This year at DEBS (Distributed Event-Based Systems), there is an excellent tutorial on event processing languages (EPLs). I particularly like its categorization of the different styles of EPLs: “inference rules, ECA rules, agent oriented, SQL extensions, state oriented, imperative/script based”.
I was, however, a bit lost on the ‘why’. Why is it that there are so many different styles? Why is it that we need a different language for event processing to begin with? Or, more importantly, why the recent re-energization of event processing?
If I may try to address this, I believe the spike in interest around event technologies like CEP is to a large extent a reflection of two recent developments in the IT world:
- The order-of-magnitude increase in the volume of events that are generated and thus need to be processed today.
- The desire to process this high volume of events in real-time.
The first item is an obvious side effect of increasing bandwidth, connectivity, and computational resources in today’s world.
The second item is less obvious. As markets become global and competition increases, businesses and their processes need to adapt and change ever more quickly and efficiently. This forces IT to move from an offline mode of data analysis, where data can be stored and then analyzed, to an online mode, where data in the form of events needs to be processed in real-time as it occurs.
Presented here in simplified form, these two business requirements dictate the following language requirements for an EPL:
- To be able to cope with possibly infinite sequences of events, the programming language must provide facilities to bound the event sequence into a workable structure, for example, by creating windows on top of streams.
- To facilitate the reduction of the volume of events, the programming language must provide facilities to transform several events into a single summary event, that is, to aggregate simple events into complex events (hence the term ‘complex event processing’).
- As processing is executed in real-time, “time” and temporal constraints must be first-class citizens of the programming language.
- Event processing intended to drive business processes can hardly be done in isolation, without contextual data; hence the programming language must facilitate handling events, retrieving (static) data, and easily integrating the two.
- Alongside the previous item, finding relationships among events is important; the programming language must allow the matching of complex relationship patterns, such as conjunctions, disjunctions, and negations (e.g., events A and B, A or B, not A); the non-occurrence of events; temporal matching (e.g., event A before B); and correlation. In a sense, this is another case where several simple events are synthesized into fewer complex events.
Undoubtedly, I have left out several other important language requirements, such as expressiveness and computability. However, our focus is on CEP, so we will set these aside; there is extensive literature on them elsewhere.
Having established, and hopefully agreed upon, the language requirements, the next step is to map them onto a real language implementation, which I attempt to do below using CQL.
Requirement 1 can be achieved in CQL through the definition of windows:
SELECT * FROM stream [RANGE 10 seconds]
In this example, we apply a window of 10 seconds to a previously unbounded stream of events.
Requirement 2 is supported in several ways, one of which is simply the aggregation of events into a new event, as follows:
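To make the window semantics concrete, here is a minimal Python sketch (not CQL) of a sliding time window. The class name `TimeWindow`, the in-order arrival of events, and the choice to evict an event once it is exactly `range_seconds` old are all assumptions made for illustration; a real engine may treat the boundary differently.

```python
from collections import deque


class TimeWindow:
    """Keeps only the events that arrived within the last `range_seconds`."""

    def __init__(self, range_seconds):
        self.range_seconds = range_seconds
        self.events = deque()  # (timestamp, payload) pairs, oldest first

    def insert(self, timestamp, payload):
        # Assumption: events arrive in non-decreasing timestamp order.
        self.events.append((timestamp, payload))
        # Evict every event that has fallen out of the window.
        cutoff = timestamp - self.range_seconds
        while self.events and self.events[0][0] <= cutoff:
            self.events.popleft()
        return [p for _, p in self.events]


window = TimeWindow(range_seconds=10)
window.insert(0, "a")
window.insert(4, "b")
contents = window.insert(12, "c")
print(contents)  # ['b', 'c']: the event from t=0 has expired
```

The point is that the unbounded stream never has to be held in memory; only the current window does.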
SELECT symbol, avg(price) FROM stock_stream [RANGE 10 seconds] GROUP BY symbol
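As a rough illustration of what this aggregation computes, the following Python sketch (not CQL) reduces the events in a 10-second window to one average per symbol. The function name `average_by_symbol`, the tuple layout, and the sample ticks are hypothetical, and the half-open window boundary is an assumption.

```python
from collections import defaultdict


def average_by_symbol(events, now, range_seconds=10):
    """events: iterable of (timestamp, symbol, price) tuples."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for timestamp, symbol, price in events:
        # Keep only events inside the window (now - range, now].
        if now - range_seconds < timestamp <= now:
            sums[symbol] += price
            counts[symbol] += 1
    # One summary value per symbol replaces many simple events.
    return {symbol: sums[symbol] / counts[symbol] for symbol in sums}


ticks = [
    (1, "ORCL", 20.0),   # too old at now=12, ignored
    (5, "ORCL", 22.0),
    (7, "IBM", 100.0),
    (12, "ORCL", 24.0),
]
summary = average_by_symbol(ticks, now=12)
print(summary)  # {'ORCL': 23.0, 'IBM': 100.0}
```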
Per requirement 3, time is a fundamental part of CQL, which supports both application time and system time. As one can see in the first example, the window of events is defined in terms of time. Other operators take time as input as well.
For requirement 4, we need to join the stream with an external table that provides the contextual data:
SELECT event.symbol, data.full_name FROM stock_stream [RANGE 10 seconds] AS event, stock_table AS data
WHERE event.symbol = data.symbol
The stock_table entity may live in an RDBMS, in a distributed cache, or in any other data-provider implementation.
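The join above can be pictured with a short Python sketch (not CQL), in which a dictionary stands in for the external table. The function name `enrich`, the field names, and the sample rows are assumptions for illustration only; I also assume inner-join behaviour, so events without a matching row are dropped.

```python
def enrich(window_events, stock_table):
    """Join each windowed event with its reference row, matching on symbol."""
    joined = []
    for event in window_events:
        row = stock_table.get(event["symbol"])
        if row is not None:  # inner join: unmatched events are dropped
            joined.append({"symbol": event["symbol"],
                           "full_name": row["full_name"]})
    return joined


# Hypothetical reference data; in practice this could be an RDBMS or a cache.
stock_table = {
    "ORCL": {"full_name": "Oracle Corporation"},
    "IBM": {"full_name": "International Business Machines"},
}
window_events = [
    {"symbol": "ORCL", "price": 22.0},
    {"symbol": "XYZ", "price": 1.0},  # no reference row, so no output
]
enriched = enrich(window_events, stock_table)
print(enriched)  # [{'symbol': 'ORCL', 'full_name': 'Oracle Corporation'}]
```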
Finally, requirement 5 is achieved through the match_recognize operator. The following example detects that a customer order has not been followed by a shipment within 10 seconds:
SELECT 'DELAYED' AS alertType, orders.orderId
FROM salestream MATCH_RECOGNIZE (
  PARTITION BY orderId
  MEASURES
    CustOrder.orderId AS orderId
  PATTERN (CustOrder NotTheShipment*) DURATION 10 SECONDS
  DEFINE
    CustOrder AS (eventType = 'ORDER'),
    NotTheShipment AS (NOT (eventType = 'SHIPMENT'))
) AS orders
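For readers less familiar with pattern matching over streams, here is a hedged Python sketch (not CQL) of the same idea: raise an alert for every order that has not been followed by a shipment within 10 seconds. It processes a finished list of events rather than a live stream, and the function name `delayed_orders`, the tuple layout, the sample events, and the choice to count a shipment arriving exactly at the deadline as on time are all my assumptions.

```python
def delayed_orders(events, now, timeout=10):
    """events: (timestamp, order_id, event_type) tuples seen up to `now`."""
    ordered_at = {}
    shipped_at = {}
    for timestamp, order_id, event_type in events:
        if event_type == "ORDER":
            ordered_at[order_id] = timestamp
        elif event_type == "SHIPMENT":
            # Remember only the first shipment for each order.
            shipped_at.setdefault(order_id, timestamp)

    alerts = []
    for order_id, t_order in ordered_at.items():
        deadline = t_order + timeout
        t_ship = shipped_at.get(order_id)
        shipped_in_time = t_ship is not None and t_order <= t_ship <= deadline
        # Alert only once the deadline has passed without a timely shipment.
        if now >= deadline and not shipped_in_time:
            alerts.append({"alertType": "DELAYED", "orderId": order_id})
    return alerts


events = [
    (0, "o1", "ORDER"),
    (3, "o1", "SHIPMENT"),   # shipped in time
    (5, "o2", "ORDER"),
    (8, "o3", "ORDER"),      # never shipped
    (17, "o2", "SHIPMENT"),  # shipped, but after the 10-second deadline
]
alerts = delayed_orders(events, now=20)
print([a["orderId"] for a in alerts])  # ['o2', 'o3']
```

Note that the absence of an event can only be detected by letting time pass, which is why time has to be a first-class concept in the language.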
Overall, this is a simple and somewhat naive attempt to illustrate the point, but, nevertheless, I hope it provides some structure to the requirements of an event processing language.