There has been a lot of discussion on the differences between CEP/ESP and rules engines, specifically regarding RETE implementations. I particularly like this entry from Tim Bass on the subject.
In my opinion, the very fact that we are comparing CEP/ESP and rules engines is a sign that these subjects are not yet very well understood.
Part of the reason for the confusion is that CEP/ESP, rules engines, and RETE are at different levels of abstraction.
CEP/ESP is a technology. It is a collection of techniques, methodologies, and implementation procedures intended to solve a common set of problems of a domain; in the case of CEP/ESP, the goal is (loosely speaking) the online processing of streaming data (i.e. events) to support, among other things, real-time business decisions.
Although CEP and ESP have different definitions, for the purpose of this article I will use both terms interchangeably.
There are several aspects to CEP: visualization of events, storage of events, programming model for the development of CEP applications, rule (or query) governance, the runtime processing engine, etc.
With respect to the runtime processing engine, there are several implementation options. However, before considering an engine implementation, one must first consider what language, or what language paradigm, should be surfaced to the user so that the user can program the engine.
One option is to take the well-known imperative programming paradigm and use one of the popular procedural or object-oriented languages, such as Java, as the event processing language surfaced to the user. For example, this is the approach taken when one implements a CEP application with rules hard-coded directly in Java.
Another option is to use a more data-driven approach. Considering that one is dealing with events, which are a type or extension of data, this seems an appropriate choice.
Within the realm of data-driven approaches, one option is to use SQL. SQL, needless to say, has proven to be a very convenient and pervasive domain-specific language for dealing with data, which allows one to leverage existing skill-sets. Another common option is to use logic programming. Note that there is an equally intriguing discussion around CEP and SQL currently going on.
Logic programming is a very interesting, if lengthy, subject. In a logic programming language, programs consist of a set of statements that specify what is true about a desired goal. This differs from imperative languages (e.g. Java), where programs specify the set of statements that must be executed to achieve the desired goal.
More specifically, in logic programming, programs contain only logical statements, that is, statements that are either true or false. Logical statements that are held to be always true are called axioms. The role of a logic program is to infer new logical statements, called theorems, from the axioms through some automated algorithm; the rules this algorithm applies are called inference rules. Theorems are the goals that represent the desired results. For example, from the axioms 'Socrates is a man' and 'all men are mortal', the engine can infer the theorem 'Socrates is mortal'.
Pure logic programming is very powerful indeed. In pure logic programming, at no time does one specify how to derive the goals; that is, there are no control statements (e.g. if-then-else). Pure logic programming is also very hard to implement. Hence people came up with simpler models that support sub-sets of the full logic programming paradigm. One of these is First Order Predicate Calculus.
In First Order Predicate Calculus, logical statements are formed of constants, variables and their quantifiers (e.g. for all values, for some value), predicates, and connectives. Much of the expressiveness and power of First Order Predicate Calculus comes from the connectives. Common connectives are the logic operators AND, OR, and NOT. Additional connectives are implication and equivalence. To say that A implies B means that B is true whenever A is true. For example, the statement 'for all X, man(X) implies mortal(X)' combines a quantifier, a predicate, and the implication connective.
It turns out that even the restricted sub-set that is First Order Predicate Calculus is hard to implement. Thus people have further simplified it in attempts to facilitate implementations; Horn Clauses are the most commonly used sub-set of First Order Predicate Calculus. Horn Clauses do not support explicit variable quantifiers nor the equivalence operator, among other limitations. Prolog, probably the most widely used logic programming language, is built upon Horn Clauses: a Prolog rule of the form head :- body1, body2 is a Horn Clause.
Loosely speaking, a rule-based system or rules engine is an implementation of a logic programming language, usually based upon Horn Clauses, with a few procedural extensions. These systems are sometimes called production systems. In a production system, logical statements that simply state facts (i.e. are held to be unconditionally true) are called assertions, and they compose what’s called the working memory. Logical statements that make use of the implication connective, that is, that follow an if-then structure, are called productions, and they make up the production memory.
A rules engine executes by following a recognize-act cycle: as new logical statements are made available (i.e. input by the user), the engine matches (i.e. recognizes) the left-hand side of all productions against the contents of the working memory. This process results in a set of production instantiations. Next, the engine executes (i.e. acts on) the right-hand sides, called the actions, of the matched production instantiations. The actions may place new content in the working memory, and changes to the working memory cause the engine to re-enter the recognize-act cycle, which continues until no more matches occur.
One procedural extension in rules engines is the ability to directly manipulate the content of the working memory. This is done in the right-hand side (i.e. the actions) of the productions, where one may create, delete, and modify assertions present in the working memory.
Matching thousands of productions against thousands of assertions can be quite expensive: a naive approach costs O(productions * assertions), and matching can take around 90% of the processing time of a rules engine.
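To make this cost concrete, below is a minimal sketch of the naive recognize-act cycle in Java; the Fact and Production types are hypothetical illustrations of mine, not any particular engine’s API. Note the nested loops that produce the O(productions * assertions) behavior:

import java.util.*;
import java.util.function.*;

// A minimal, hypothetical production system illustrating the
// recognize-act cycle; real engines are far more sophisticated.
class NaiveEngine {
    // An assertion: a named tuple of attribute values.
    record Fact(String name, Map<String, Object> attrs) {}

    // A production: a left-hand side to recognize and a
    // right-hand side (the action) that may derive new facts.
    record Production(Predicate<Fact> lhs, Function<Fact, List<Fact>> rhs) {}

    static void run(List<Production> productions, Set<Fact> workingMemory) {
        boolean changed = true;
        while (changed) {                         // re-enter the cycle on changes
            changed = false;
            List<Fact> derived = new ArrayList<>();
            for (Production p : productions)      // recognize: every production...
                for (Fact f : workingMemory)      // ...against every assertion
                    if (p.lhs().test(f))
                        derived.addAll(p.rhs().apply(f));   // act
            for (Fact f : derived)
                changed |= workingMemory.add(f);  // only new facts re-trigger the cycle
        }
    }
}

Every change to the working memory re-runs the full match here, which is precisely the expense a smarter matching scheme needs to avoid.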
So let’s consider a way of optimizing this. Essentially, what the engine is trying to do is find out which assertions match the left-hand side of the productions; in other words, the end goal is to find a set of productions. What if we create a graph of nodes, where the nodes represent the conditions we are trying to match, so that as we traverse the graph matching these nodes we eventually reach a terminal node representing the production whose conditions apply?
For example, consider the following four productions P1, P2, P3, P4:
P1: if attr1 = 'value1' and attr2 = 'value2' then action1
P2: if attr1 = 'value1' and attr2 = 'value3' then action2
P3: if attr1 = 'value4' and attr2 = 'value5' then action3
P4: if attr1 = 'value6' and attr2 = 'value7' then action4
The corresponding matching graph has a root node that branches on attr1: value1 leads to a shared child node n2 (common to P1 and P2), while value4 and value6 lead to children of their own. Each child node then tests attr2, leading to the terminal nodes P1 through P4.
Now consider that the following assertion A1 is input:
A1(attr1 = 'value1', attr2 = 'value2')
The first step is to match attr1, which takes the engine to the child node n2; the next step is to match attr2, which takes it to the terminal node P1, our goal. This takes a total of 4 comparisons in the worst case, whereas a (very) naive implementation would take 8 comparisons in the worst case (2 conditions for each of the 4 productions).
Although over-simplified, this is the general idea of the RETE matching algorithm (rete is Latin for network) invented by Dr. Charles Forgy. RETE also includes optimizations for handling the binding of variables and joins, and techniques for keeping the network of matching nodes up-to-date, among other things.
RETE is a very efficient algorithm for performing matching in production systems. Does it look generic enough, in its entirety or in parts, to be used in different contexts? Yes, it certainly does to me. In fact, we use some of its ideas, in combination with hash functions, in the implementation of some aspects of our CEP engine, which is based upon SQL and not on a logic programming language.
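As a purely illustrative sketch of that idea, the two-level graph for P1 through P4 can be encoded in Java with nested hash maps (the class and method names below are hypothetical), so that each level dispatches in constant time on average instead of comparing branch by branch:

import java.util.*;

// A hypothetical two-level discrimination network for P1..P4.
// Each level branches on one attribute via a hash lookup.
class DiscriminationNetwork {
    // attr1 value -> (attr2 value -> production name)
    private final Map<String, Map<String, String>> root = new HashMap<>();

    void addProduction(String attr1, String attr2, String production) {
        root.computeIfAbsent(attr1, k -> new HashMap<>()).put(attr2, production);
    }

    // Returns the matched production, or null when no path exists.
    String match(String attr1, String attr2) {
        Map<String, String> next = root.get(attr1);    // first level
        return next == null ? null : next.get(attr2);  // second level
    }

    public static void main(String[] args) {
        DiscriminationNetwork net = new DiscriminationNetwork();
        net.addProduction("value1", "value2", "P1");
        net.addProduction("value1", "value3", "P2");   // shares the attr1 node with P1
        net.addProduction("value4", "value5", "P3");
        net.addProduction("value6", "value7", "P4");
        // Assertion A1(attr1 = 'value1', attr2 = 'value2') reaches terminal P1:
        System.out.println(net.match("value1", "value2"));  // prints P1
    }
}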
To put all of this in perspective, let’s look at an example. Say you want to find out whether a certain stock has gone up more than 20% compared to a previous value. Consider that a stock is defined as:
Stock: SYMBOL, PRICE, TIMESTAMP
First, let’s start with a logic programming language. I will use Prolog in this case:
stockUp(X) :- stock(X, Price1, Time1), stock(X, Price2, Time2),
Time2 > Time1, Price2 > (Price1 * 1.20).
If we feed the following assertions into Prolog’s working memory:
stock(aaa, 10, 0).
stock(bbb, 15, 1).
stock(ccc, 10, 2).
stock(aaa, 12, 3).
stock(bbb, 14, 4).
stock(ccc, 13, 5).
We will get the following result:
stockUp(X = ccc)
This means that the last stock assertion (i.e. stock(ccc, 13, 5)) triggers the creation of the new assertion stockUp for the symbol ccc.
Next, let’s try with BEA EPL, which mostly follows the SQL idiom:
INSERT INTO StockUp
SELECT a.symbol FROM Stock a, Stock b RETAIN ALL EVENTS
WHERE a.symbol = b.symbol and (a.price > b.price * 1.20)
This EPL statement provides the same result. However, in this case, we will be feeding the engine with Java Bean events rather than assertions. So, instead of creating the assertion:
stock(aaa, 10, 0)
We will create the Java Bean:
new Stock("aaa", 10.0)
It is interesting how events and assertions map seamlessly to each other, at least for this simple scenario. Intuitively, this seems to make sense to me, as both are conceptually immutable and represent some known aspect of the problem domain.
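For reference, here is one hypothetical shape for such a bean; the exact event class a given engine expects may differ. Keeping the fields final mirrors the conceptual immutability of assertions:

// A hypothetical, minimal event bean for the Stock events above.
// A timestamp field could be added when stream order cannot be assumed.
public class Stock {
    private final String symbol;
    private final double price;

    public Stock(String symbol, double price) {
        this.symbol = symbol;
        this.price = price;
    }

    public String getSymbol() { return symbol; }
    public double getPrice() { return price; }
}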
Note how we specify RETAIN ALL EVENTS in the EPL query. Using rule-based systems as an analogy, the RETAIN clause provides a declarative way of managing the CEP engine’s working memory. In this example it does exactly the same thing as the Prolog program, that is, it keeps all assertions forever, which is probably not a realistic situation.
Consider this next EPL statement:
INSERT INTO StockUp
SELECT symbol FROM Stock
RETAIN 10 EVENTS WITH LARGEST timestamp PARTITION BY symbol
WHERE price > (prev(1,price) * 1.20)
In the first EPL query, we assumed that the stock events arrive in non-decreasing time order within the stream. This is a fair assumption considering that we are dealing with streams flowing through an underlying stream infrastructure, whereas we couldn’t really make this assumption for logical statements, hence the need to explicitly include the timestamp in the Prolog clauses. Nevertheless, for the sake of comparison, let’s assume that the events are not ordered, which means that the EPL query itself has to take the ordering into account. This is done with the WITH LARGEST timestamp clause, which keeps the stock stream in ascending order of timestamp; in my opinion, WITH LARGEST timestamp is more declarative than its logical counterpart Time2 > Time1.
Likewise, note how the second EPL statement does not need to join on symbol, which is done explicitly both by the logical statement stock(X, Price1, Time1), stock(X, Price2, Time2) and by the first EPL statement’s a.symbol = b.symbol. Instead, the clause PARTITION BY symbol groups the stocks by symbol and allows one to compare (in this case using the prev function) within just that group.
Needless to say, this second EPL query provides the same result as the first one, but in a more declarative manner. The one semantic difference between the two is that the second query only considers 10 events at a time (i.e. RETAIN 10 EVENTS), rather than all events (i.e. RETAIN ALL EVENTS). In other words, we are declaratively restricting the working memory to 10 events.
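Conceptually, RETAIN 10 EVENTS combined with PARTITION BY symbol amounts to keeping one small bounded window of events per symbol. Here is a hypothetical Java sketch of that bookkeeping, reusing the Stock bean sketched above and ignoring the WITH LARGEST timestamp ordering for simplicity; this is in no way BEA’s actual implementation:

import java.util.*;

// Hypothetical sketch of RETAIN 10 EVENTS ... PARTITION BY symbol:
// one bounded window per partition key, oldest events evicted first.
class PartitionedWindow {
    private static final int RETAIN = 10;
    private final Map<String, Deque<Stock>> windows = new HashMap<>();

    // Insert an event; evict the oldest once the partition exceeds RETAIN.
    void insert(Stock event) {
        Deque<Stock> window =
            windows.computeIfAbsent(event.getSymbol(), k -> new ArrayDeque<>());
        window.addLast(event);
        if (window.size() > RETAIN)
            window.removeFirst();   // the event leaves the working memory
    }

    // prev(1, price): the price of the event just before the latest one.
    OptionalDouble prevPrice(String symbol) {
        Deque<Stock> window = windows.get(symbol);
        if (window == null || window.size() < 2) return OptionalDouble.empty();
        Iterator<Stock> newestFirst = window.descendingIterator();
        newestFirst.next();         // skip the latest event
        return OptionalDouble.of(newestFirst.next().getPrice());
    }
}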
Ok, so one may argue that rule-based systems are better suited to handle inference, which my simple example fails to demonstrate.
That is a valid point, so let’s consider the scenario where stocks are part of a cluster that moves together, that is, if several other stocks within the same cluster go up, it is likely that the remaining ones within the cluster will also go up in the near future.
A simple production rule for this scenario is:
stockUp(aaa) :- stockUp(bbb), stockUp(ccc).
If one creates the following new assertion:
stock(bbb, 25, 6).
One will get the following result, which is expected:
stockUp(X = bbb)
And, in addition, the engine would infer that:
stockUp(X = aaa)
Can we do something similar with EPL? Certainly, here is the query:
INSERT INTO StockUp
SELECT 'aaa' AS symbol
FROM StockUp a, StockUp b RETAIN 10 MINUTES
WHERE a.symbol = 'bbb' AND b.symbol = 'ccc'
Again, it is expressed in terms of SQL joins, which arguably is a less natural form of expressing inference, but nonetheless correct.
Thankfully, EPL also supports a pattern-matching idiom, in addition to the SQL idiom, which, in my opinion, provides better expressiveness for describing inference:
INSERT INTO StockUp
SELECT 'aaa' AS symbol
MATCHING StockUp(symbol = 'bbb') AND StockUp(symbol = 'ccc')
RETAIN 10 MINUTES
So which language is better? SQL-based languages? Rule-based languages?
In my opinion, the discussion around CEP technology is larger than the discussion around which language to use for its engine, and definitely larger than discussing which matching algorithm, RETE or otherwise, this engine should be built upon.
When discussing the CEP technology, I find it important to focus on use-cases.
Regarding programming languages, there is a place for more than one language in CEP, and, equally, one does not want to be restricted by any one language in particular.
“Even though I am biased towards rule engines, one thing is clear to me. There is no single language that will satisfy everyone. From the perspective of writing rule engines, as long as the engine supports enough FOL concepts, compiling a rule in some language to the execution form should be straightforward. With the extensions to RETE I propose in my blog, I think RETE can easily support CEP/ESP use cases. In many ways, CEP/ESP only covers a fraction of what Temporal Logic covers, so the key is choosing which temporal logic bits a rule engine should support. My hope is that both industries will continue to mature and learn from each other. If that happens, the technology “should” improve and developers’ lives will get easier. That might be a pipe dream. peter lin”
Posted by: woolfel on August 14, 2007 at 7:01 PM
“Hi Alex,
Thanks for calling out my blog post on CEP and rules-engines. Peter Lin makes this comment above, “I think RETE can easily support CEP/ESP use cases.”
This is simply not the case, in my opinion (and blog post). CEP must support a wide range of analytics, not just rules. These analytics should be plug-and-play and easily integrated into a larger event processing model. Important analytics for CEP include Bayesian Networks, Dempster-Shafer, Markov, Neural Nets, Kalman Filters, to name a few that come to mind.
So, it is not simply a question of “language”. It is a question of the best analytic for the problem at hand. The question of language should be considered after the choice of the best analytics for the job.
Keep up the good work!
Yours faithfully, Tim
The Complex Event Processing Blog”
Posted by: thecepblog on August 19, 2007 at 6:18 PM
“Hi Tim, I don’t understand the statement “best analytic for the problem at hand.” Most mature production rule systems support First Order Logic, which means any kind of analytics can be expressed. Having worked on real-time systems in the financial sector, I know first hand that a mature RETE rule engine like JESS, CLIPS, or JRules can express any analytics you want. The question for me is, “how much work is it to express it and how much customization is required?” Take CLIPS for example: it doesn’t provide any analytic functions out of the box, so the user has to do some extra work. In some cases, it could mean a lot of custom functions, if the analytics are complex. Expressing things like aggregations is easy to do with the CLIPS language, if the developer has the skills. The real problem here is that not many people have a deep understanding of RETE rule engines or production rule languages. By adding built-in functions for analytics and adding extensions to RETE, expressing analytics becomes easier. What a rule engine gives you is an optimized query plan (a la RETE), which handles incremental changes efficiently. I agree there are many other forms of discrimination networks and algorithms that may be a good fit. I know there are existing financial systems that use ART, OPSJ and CLIPS for real-time event processing. Those outside the rule industry aren’t aware of this, but it is well established. peter lin”
Posted by: woolfel on August 22, 2007 at 1:55 PM
“Hi Peter, We subsequently mashed this up over in my neighborhood, in this thread: Rules Engines and Bayes’ Theorem Cheers! Yours faithfully, Tim”
Posted by: thecepblog on September 7, 2007 at 6:01 PM
“Hi, Alex, I know that RETE maintains a memory of the facts for each alpha node; if we have a huge domain, it must maintain the whole RETE graph in memory, and so all the facts in the domain must stay in memory. Is that true?”
Posted by: geniodella on March 31, 2008 at 5:26 AM
“I missed this post at the appropriate time. There are a few points I’d like to make, though.

First, you say that “loosely speaking, a rule-based system or rule-based engine is…usually based on Horn Clauses”. The emphasis must be on the term ‘loosely’! Production rules lack rigorous formalism and are not constrained to Horn Clauses. For example, they commonly combine multiple conclusions. The mapping of production rules onto formal logical constructs tends to be rather imprecise.

Your assertion that the RHS of production rules is procedural is problematic. The reality is rather more complex. The RHS will always exhibit a degree of declarativeness. It is responsible for asserting, modifying and retracting facts, and by the very nature of how the engine works, the order in which these actions are done, relative to each other, is not ‘officially’ relevant. However, production systems inevitably suffer from side effects. Side effects are the enemy of all that is good in the declarative world. You can only meaningfully modify (retract and reassert) a fact after you have changed the values of attributes associated with that fact, and this implies a procedural approach. The side effect, here, is the change to working memory. It is also possible that the order in which you undertake operations on working memory may result in ‘arbitrary’ side effects deep within the engine which affect engine performance and the reasoning strategy.

Engines exhibit a range of different implementation approaches, but must generally ensure, in any circumstances, that all actions on a fired rule are executed, and not short-circuited by the forward-chaining cycle. A simple approach is to internally re-order individual action statements to ensure that each assertion and retraction is deferred until all other actions have been completed.

Issues concerning the RHS of production rules provide a good example of how these constructs lack high formalism. In the project I am currently working on, we have two main rule sets. In both, we limit the RHS to use pre-defined ‘vocabulary’ definitions (in essence, we layer a DSL on top of our rule sets). For one vocabulary, used in one of the rule sets, the RHS of each rule is completely declarative – the order in which we declare the actions (each of which represents a policy assertion) has no logical effect on the outcome. In the other, the RHS of each rule is largely (though not entirely) procedural. We have to undertake certain actions in a strict order in order to cater for side effects in an underlying object model which our actions operate upon. Production rules are pragmatic in nature, and loose enough to cater for a wide variety of different processing requirements in real-world applications.

Matching can easily account for a good deal more than 90% of the processing time! However, this highlights a point that is often lost when considering production systems. Relative costs are a function, primarily, of the rule set logic, rather than the underlying engine. This is why no single benchmark can realistically illustrate the true performance characteristics of a Rete engine. Generalised statements about costs in a Rete engine inevitably apply only to a subset of rule sets. For example, it is true that memory usage can be very high. This is because Rete trades space- against time-complexity when undertaking highly combinatorial processing. It uses a lot of memory in order to maximise performance across multiple data sets when undertaking forward-chaining. However, if a rule set does not perform joins, the memory usage is minimal (no partial match retention), and if the conditions are fairly simple, and the engine is well optimised, you may even get O(1) across arbitrarily-sized rule sets. The techniques for achieving such high performance and scalability in a Rete engine are well-understood, and clearly ideal for many event stream processing scenarios.

Sorry if this reads as a list of disagreements. In fact, I strongly agree with your conclusions. As you say, CEP encompasses a much larger realm than discussions about the merits/demerits of different pattern matching algorithms. I would make one plea, though, which I think is in line with your conclusions. The CEP community should always avoid the temptation of casually dismissing the experience of three decades of work on production systems. The Rete crowd understand a thing or two about industrial-strength pattern matching.”
Posted by: CharlesYoung on June 13, 2008 at 7:18 AM
“…sorry. I put line breaks in before posting, but they seem to have been stripped out.”
Posted by: CharlesYoung on June 13, 2008 at 7:22 AM
“Hi Charles, Thanks for the comment, very illustrative and educational. I don’t disagree with you; I do think that one of the roles of commercial systems is to bend the underlying formalism and achieve better results for certain cases. In an (already not so) succinct blog post, though, I do feel the formalism helps with the understanding. Either way, I do agree production systems have come a long way and there is much to be learned from them. Do you have more documentation on your design/system elsewhere? Cheers,”
Posted by: aalves on June 25, 2008