Language requirements for CEP and DEBS 2009

July 28, 2009

This year on DEBS (Distributed Event-Based Systems), there is an excellent tutorial around event processing languages (EPLs). I particularly like their categorization of the different styles of EPLs: “inference rules, ECA rules, agent oriented, SQL extensions, state oriented, imperative/script based”.

I was, however, a bit lost on the ‘why‘. Why is that there are so many different styles? Why is that we need a different language for event processing to begin with? Or, more importantly, why the recent re-energization of event processing?

If I may try to address this, I believe the spike in interest around event technologies like CEP is in a large extent a reflection of two recent developments in the IT world:

  • The increase in order-of magnitude of the volume of events that are generated and thus need to be processed today.
  • The desire to process these high volume of events in real-time.

The first item is an obvious side effect of increasing bandwidth, connectivity, and computational resources in today’s world.

The second item is less obvious. As markets become global, and competition is increased, business and their processes need to adapt and change ever more quickly and efficiently, thus forcing IT to move from a offline mode of data analysis where data can be stored and then analyzed to a online mode where data in the form events need to be processed in real-time as they occur.

These two business requirements dictate, presented here in a simplistic form, the following language requirements on an EPL:

  1. To be able to cope with possibly infinite sequences of events, the programming language must provide facilities to bind the event sequence in a structure that is workable, for example, by creating windows on top of streams.
  2. To facilitate the reduction of the volume of events, the programming language must provide facilities to transform several events into a single summary event, that is, to aggregate simple events into complex events (and hence the term ‘complex event processing’)
  3. As processing is executed in real-time, “time” and temporal constraints must be a first-class citizen of the programming language.
  4. Event processing with the intent of driving business processes hardly can be done in isolation without contextual data; hence the programming language must facilitate the handling of both events and the retrieval of (static) data and easy integration of the two.
  5. Along side the previous item, finding the relationship amongst events is important; the programming language must allow the matching of complex relationship patterns, such as: conjunctions, disjunctions, and negations (e.g. Events A and B, A or B, not A); non-presence of events; temporal matching (e.g. Events A before B); and correlation. In some aspect, this is another example where several simple events are synthesized into fewer complex events.

Undoubtedly, I have left out several other important language requirements, such as expressiveness and computability, however our focus is on CEP, and hence we will ignore these, for which there is extensive literature elsewhere.

Well, having established and hopefully agreed upon the language requirements, the next step is to map these into a real language implementation, which I following attempt to do using CQL.

Requirement 1 can be achieved in CQL through the definition of windows:

SELECT * FROM stream [RANGE 10 seconds]

In this example, we bind a window of 10 seconds to an previously unbound stream of events.

Requirement 2 is supported through different ways, one of which is just the aggregation of events into a new event, as following:

SELECT avg(price) FROM stock-stream [RANGE 10 seconds] GROUP BY symbol

Per requirement 3, time is a fundamental piece of CQL, which supports both application time and system time. As one can noticed in the first example, the window of events is defined in terms of time. There are other operators that take time as input.

In requirement 4, there is a need to join the stream with an external table that provides the contextual data:

SELECT symbol, full-name FROM stock-stream [RANGE 10 seconds] as event, stock-table as data

WHERE event.symbol = data.symbol

The stock-table entity may live in a RDBMS, in a distributed cache, or in any other data-provider implementation.

Finally, requirement 5 is achieved through the match_recognize operator. The following example detects that a customer order has not been followed by a shipment within 10 seconds:

SELECT “DELAYED” as alertType, orders.orderId,
FROM salestream MATCH_RECOGNIZE (
PARTITION BY orderId

MEASURES
CustOrder.orderId AS orderId

PATTERN (CustOrder NotTheShipment*) DURATION 10 SECONDS

DEFINE
CustOrder AS (type = ‘ORDER’),
NotTheShipment AS ((NOT (eventType = ‘ SHIPMENT’)))

) AS orders

Overall, this is a simple and somewhat naive attempt to illustrate the point, but, nevertheless, I hope it provides some structure to the requirements of an event processing language.


Oracle CEP 11gR1 – official support for CQL

July 1, 2009

Today Oracle announced the release of the product Oracle CEP, version 11gR1, which I am gladly part of the team.

Oracle CEP 11gR1 is the next release after 10.3, which is the re-brand of BEA’s Weblogic Event Server.

There are several new features in this release, but the flag-stone is the inclusion of CQL (Continuous Query Language) as the default event processing language.

CQL is important in several aspects:

  • It brings us closer to converging towards a standard language for event processing
  • It is based on a solid theoretical foundation, leveraging relational calculus and extending it to include the concept of stream of events
  • Full support for pattern matching, a new stream-only operator
  • Solid engineering, including several query plan optimizations, for example, a unbounded stream that uses the insert stream operator is converted to a ‘NOW’ window using the relation-stream operator
  • Implementation abstractions, for example, a relation can be bound to either a RDBMS table or to a Coherence cache, without any query re-write.

Other features worth-mentioning are:

And more coming soon…


CEP glossary

September 11, 2008

The Event Processing Technical Society (EPTS) has updated its glossary of CEP terms.

Having a common language is indeed one of the first steps towards being able to discuss a technology.

Fortunately enough, several of the terms are already well-accepted and used. For instances, Oracle CEP (i.e. former BEA Event Server) treats the following terms as first-class citizens in its programming model: Event Source, Event Sink, Stream, Processor (a.k.a Event Processing Agent), Event Type, Event Processing Network, Event Processing Language, and Rule. This allows the user to author an event processing application by leveraging existing event processing design patterns: define the event types, define the sources of events, define the sinks of events, define the intermediate processors, connect these forming a EPN, configure the processors with EPL rules. It becomes a simple enough task if one understands the glossary.

Nevertheless, I would like to suggest one additional term: a relation. To understand relations, one has to understand streams first.

The glossary defines stream as “a linearly ordered sequence of events”. At first glance, one may think of a stream as the physical pipe, or connection, between sources and sinks. However, that is not entirely correct. Rather, an event channel, which is defined as “a conduit in which events are transmitted…” represents the idea of a connection; a stream extends this idea and adds processing semantic to it, precisely it states that a stream not only contains events, but that these events must be totally ordered.

Let’s consider an example. Consider an event type consisting of two attributes, a integer, and a string.

The following sequence is a stream ordered by the first attribute: {1, “a”}, {2, “b”}, {3, “a”}.
Conversely, the following sequence is NOT a stream: {1, “a”}, {2, “b”}, {1, “a”}. In this latter case, this sequence of events is termed a event cloud. Streams are a type of event clouds.

This may seem rather odd and arbitrary. But this additional semantic is very helpful. For one thing, it allows CEP engines to optimize. Because streams are generally unbounded, i.e. they never end, one has to define a window of time (or length) on top of a stream to be able to do any interesting computation. Having a ordered sequence allows the engine to progress the computation for a window without having to keep old values forever.

Considering the example at hand, let’s say one wants to calculate the number (i.e. count) of similar strings in the last 3 seconds, and that the first attribute provides the time-stamp of the event in seconds.

If the events were not ordered in time, then how would the engine know when it is safe enough to output the correct result?

Consider that time starts at t = 0 and the following input: {2, “a”}, {3, “a”}.

Should we wait for event {1,?} before outputting the result? Was this event lost or delayed? What if we don’t wait, and output the result, and then the event {1,?} ends up arriving afterward, should we output the result again?

As it can be noted, the conceptual model gets very complicated. Better to keep the conceptual model simple, and move this complexity elsewhere, such as an adapter that might be re-ordering events using some time-out strategy.

A “relation” is another type of event cloud. It is also a totally ordered set of events. In addition, it has an attribute that denotes whether the event is an insert, delete, or update event. These three different kind of events allows one to model a table, or more precisely, a instantaneous finite bag of tuples at some instant of time.

Consider a table whose rows contain a single column of type string.

The sequence of events {+, 1, “a”}, {+, 2, “b”} creates the table { {“a”}, {“b”} } at time t = 2.

Then the following event arrives: {-, 3, “a”}. This will result into the table being updated to { {“b”} } at time t = 3.

Keep in mind that a relation, in the context of event processing, and similarly to a stream, is still a sequence of streaming events, which pass through some event channel. However, differently than a stream, these events have additional semantic into them to allow one to represent actual finite tables.

Why is this needed? It is very common for events to be enriched with additional data. Sometimes this data is very dynamic, in which case this is modeled as a join between two streams; sometimes this data is somewhat static, changing less often, this case is better modeled as the join between a stream and a relation. If you think of it in this terms, it almost seems like a stream, seen as containing only insert events, is a sub-type of relation.

Why called it “relation”? Mainly because this is the term used by CQL, the foundation work for data stream management.

The CEP glossary does an excellent job of setting up the base model, and extending it to event patterns. With “relation“, we complete the model by embracing the data stream management piece of it.