Event Processing Reference Architecture at DEBS 2010

July 29, 2010

Recently, the EPTS architecture working group, of which I am a member, presented its reference architecture for event processing at DEBS 2010, held in Cambridge on July 12th. The presentation can be found on SlideShare.

The presentation first highlights general concepts around event processing and reference architecture models, the latter based upon IEEE. This is needed so that we can normalize the architectures of the different vendors and players into a cohesive set. Next, individual architectures were presented by the University of Trento (Themis Palpanas), TIBCO (Paul Vincent), Oracle (myself), and IBM (Catherine Moxey).

Below, I include the functional view for Oracle’s EP reference architecture:

Each architecture presentation followed the same pattern of describing different views of the system, in particular a conceptual view, a logical view, a functional view, and a deployment view. Finally, a common event-processing use-case, specifically the Fast Flower Delivery use-case, was selected to be mapped to each architecture, thus showing how each architecture models and solves the same problem.

Having presented the different architectures, we then consolidated them all into a single reference model, which became the EPTS reference architecture for event processing.

What are the next steps?

We need to select further event processing use-cases and continue applying them to the reference architecture, hopefully fine-tuning and expanding it. In particular, I feel we should tackle some distributed CEP scenarios, in an attempt to improve our deployment models and to validate the logical and functional views.

Furthermore, I should mention that at Oracle we are also working on a general EDA architecture that collaborates with SOA. More on this subject later.


CEP glossary

September 11, 2008

The Event Processing Technical Society (EPTS) has updated its glossary of CEP terms.

Having a common language is indeed one of the first steps towards being able to discuss a technology.

Fortunately enough, several of the terms are already well accepted and used. For instance, Oracle CEP (i.e. the former BEA Event Server) treats the following terms as first-class citizens in its programming model: Event Source, Event Sink, Stream, Processor (a.k.a. Event Processing Agent), Event Type, Event Processing Network, Event Processing Language, and Rule. This allows the user to author an event processing application by leveraging existing event processing design patterns: define the event types, define the sources of events, define the sinks of events, define the intermediate processors, connect these to form an EPN, and configure the processors with EPL rules. It becomes a simple enough task if one understands the glossary.
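To make these terms concrete, here is a minimal sketch, in plain Java, of how they might map to code. This is not Oracle CEP’s actual API; all names and signatures are hypothetical and only illustrate the roles the glossary defines.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch of the glossary terms; not the Oracle CEP API.
public class EpnSketch {

    // Event Type: a plain record of attributes.
    record StockTick(String symbol, double price) {}

    // Event Sink: anything that consumes events.
    interface EventSink<E> { void onEvent(E event); }

    // Event Source: anything that emits events to downstream sinks.
    static class EventSource<E> {
        private final List<EventSink<E>> sinks = new ArrayList<>();
        void connect(EventSink<E> sink) { sinks.add(sink); }
        void emit(E event) { sinks.forEach(s -> s.onEvent(event)); }
    }

    // Processor (Event Processing Agent): both a sink and a source.
    static class FilterProcessor<E> extends EventSource<E> implements EventSink<E> {
        private final Predicate<E> rule; // stands in for an EPL rule
        FilterProcessor(Predicate<E> rule) { this.rule = rule; }
        public void onEvent(E event) { if (rule.test(event)) emit(event); }
    }

    public static void main(String[] args) {
        // Event Processing Network: source -> processor -> sink.
        EventSource<StockTick> source = new EventSource<>();
        FilterProcessor<StockTick> processor =
            new FilterProcessor<>(t -> t.price() > 100.0);
        source.connect(processor);
        processor.connect(t -> System.out.println("High tick: " + t));

        source.emit(new StockTick("ORCL", 99.0));   // filtered out by the rule
        source.emit(new StockTick("ORCL", 101.5));  // passes the rule
    }
}
```

The point is simply that the terms compose naturally: a processor is simultaneously a sink and a source, and wiring sources, processors, and sinks together yields the EPN.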

Nevertheless, I would like to suggest one additional term: relation. To understand relations, one has to understand streams first.

The glossary defines a stream as “a linearly ordered sequence of events”. At first glance, one may think of a stream as the physical pipe, or connection, between sources and sinks. However, that is not entirely correct. Rather, an event channel, which is defined as “a conduit in which events are transmitted…”, represents the idea of a connection; a stream extends this idea and adds processing semantics to it: precisely, it states that a stream not only contains events, but that these events must be totally ordered.

Let’s consider an example: an event type consisting of two attributes, an integer and a string.

The following sequence is a stream ordered by the first attribute: {1, “a”}, {2, “b”}, {3, “a”}.
Conversely, the following sequence is NOT a stream: {1, “a”}, {2, “b”}, {1, “a”}. In this latter case, the sequence of events is termed an event cloud. Streams are a type of event cloud.
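The distinction is mechanical enough to check in code. A small sketch (hypothetical names), treating the integer attribute as the ordering key:

```java
import java.util.List;

public class StreamCheck {
    record Event(int ts, String value) {}

    // A sequence is a stream with respect to the first attribute
    // iff the attribute never decreases along the sequence.
    static boolean isStream(List<Event> events) {
        for (int i = 1; i < events.size(); i++) {
            if (events.get(i).ts() < events.get(i - 1).ts()) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isStream(List.of(
            new Event(1, "a"), new Event(2, "b"), new Event(3, "a")))); // true: a stream
        System.out.println(isStream(List.of(
            new Event(1, "a"), new Event(2, "b"), new Event(1, "a")))); // false: just an event cloud
    }
}
```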

This may seem rather odd and arbitrary, but this additional semantic constraint is very helpful. For one thing, it allows CEP engines to optimize. Because streams are generally unbounded, i.e. they never end, one has to define a window of time (or length) on top of a stream to be able to do any interesting computation. Having an ordered sequence allows the engine to progress the computation for a window without having to keep old values forever.

Considering the example at hand, let’s say one wants to calculate the number (i.e. the count) of similar strings in the last 3 seconds, and that the first attribute provides the timestamp of the event in seconds.
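Here is a minimal sketch of how an engine might exploit the ordering for exactly this count. The eviction logic is the interesting part: because the input is ordered, anything that has fallen out of the window can be discarded for good. Names and structure are illustrative, not any engine’s actual internals.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

public class WindowedCount {
    record Event(int ts, String value) {}

    static final int WINDOW = 3; // window size in seconds

    private final Deque<Event> window = new ArrayDeque<>();
    private final Map<String, Integer> counts = new HashMap<>();

    // Because input is ordered by ts, any event with ts <= (current - WINDOW)
    // can be evicted permanently; old values never need to be kept around.
    void onEvent(Event e) {
        while (!window.isEmpty() && window.peekFirst().ts() <= e.ts() - WINDOW) {
            Event old = window.pollFirst();
            counts.merge(old.value(), -1, Integer::sum);
            if (counts.get(old.value()) == 0) counts.remove(old.value());
        }
        window.addLast(e);
        counts.merge(e.value(), 1, Integer::sum);
        System.out.println("t=" + e.ts() + " counts=" + counts);
    }

    public static void main(String[] args) {
        WindowedCount wc = new WindowedCount();
        wc.onEvent(new Event(1, "a"));
        wc.onEvent(new Event(2, "b"));
        wc.onEvent(new Event(3, "a"));
        wc.onEvent(new Event(5, "a")); // evicts {1,"a"} and {2,"b"} for good
    }
}
```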

If the events were not ordered in time, then how would the engine know when it is safe to output the correct result?

Consider that time starts at t = 0 and the following input: {2, “a”}, {3, “a”}.

Should we wait for event {1, ?} before outputting the result? Was this event lost, or merely delayed? What if we don’t wait and output the result, and then the event {1, ?} ends up arriving afterward; should we output the result again?

As can be noted, the conceptual model gets very complicated. Better to keep the conceptual model simple, and move this complexity elsewhere, such as into an adapter that might be re-ordering events using some time-out strategy.
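For illustration, such an adapter might look something like the sketch below: it buffers events and releases them in timestamp order once enough time has passed that stragglers are no longer expected. Here, as a simplification, the passage of time is measured by the largest timestamp seen so far; the strategy and all names are hypothetical.

```java
import java.util.Comparator;
import java.util.PriorityQueue;
import java.util.function.Consumer;

public class ReorderingAdapter {
    record Event(int ts, String value) {}

    private final PriorityQueue<Event> buffer =
        new PriorityQueue<>(Comparator.comparingInt(Event::ts));
    private final int timeout;             // how long to wait for stragglers
    private final Consumer<Event> downstream;
    private int maxSeen = Integer.MIN_VALUE;

    ReorderingAdapter(int timeout, Consumer<Event> downstream) {
        this.timeout = timeout;
        this.downstream = downstream;
    }

    // Buffer incoming events; release only those old enough that a straggler
    // with a smaller timestamp is no longer expected.
    void onEvent(Event e) {
        buffer.add(e);
        maxSeen = Math.max(maxSeen, e.ts());
        while (!buffer.isEmpty() && buffer.peek().ts() <= maxSeen - timeout) {
            downstream.accept(buffer.poll());
        }
    }

    public static void main(String[] args) {
        ReorderingAdapter adapter =
            new ReorderingAdapter(2, e -> System.out.println("released " + e));
        adapter.onEvent(new Event(2, "a"));
        adapter.onEvent(new Event(3, "a"));
        adapter.onEvent(new Event(1, "a")); // arrives late, but is released in order
        adapter.onEvent(new Event(5, "b")); // advances time, releasing 2 and 3
    }
}
```

The downstream sink now sees a proper stream, and the engine’s conceptual model stays simple.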

A “relation” is another type of event cloud. It is also a totally ordered set of events. In addition, each event has an attribute that denotes whether it is an insert, delete, or update event. These three different kinds of events allow one to model a table, or, more precisely, an instantaneous finite bag of tuples at some instant of time.

Consider a table whose rows contain a single column of type string.

The sequence of events {+, 1, “a”}, {+, 2, “b”} creates the table { {“a”}, {“b”} } at time t = 2.

Then the following event arrives: {-, 3, “a”}. This results in the table being updated to { {“b”} } at time t = 3.
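A sketch of this mechanism, maintaining the instantaneous bag of tuples from insert and delete events (update is omitted for brevity; names are illustrative):

```java
import java.util.HashMap;
import java.util.Map;

public class RelationSketch {
    enum Kind { INSERT, DELETE }   // update omitted for brevity

    record RelationEvent(Kind kind, int ts, String value) {}

    // The instantaneous bag of tuples: value -> multiplicity.
    private final Map<String, Integer> table = new HashMap<>();

    void onEvent(RelationEvent e) {
        switch (e.kind()) {
            case INSERT -> table.merge(e.value(), 1, Integer::sum);
            case DELETE -> table.computeIfPresent(e.value(),
                               (k, n) -> n == 1 ? null : n - 1);
        }
        System.out.println("t=" + e.ts() + " table=" + table);
    }

    public static void main(String[] args) {
        RelationSketch r = new RelationSketch();
        r.onEvent(new RelationEvent(Kind.INSERT, 1, "a")); // { {"a"} }
        r.onEvent(new RelationEvent(Kind.INSERT, 2, "b")); // { {"a"}, {"b"} }
        r.onEvent(new RelationEvent(Kind.DELETE, 3, "a")); // { {"b"} }
    }
}
```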

Keep in mind that a relation, in the context of event processing, and similarly to a stream, is still a sequence of streaming events, which pass through some event channel. However, differently from a stream, these events carry additional semantics that allow one to represent actual finite tables.

Why is this needed? It is very common for events to be enriched with additional data. Sometimes this data is very dynamic, in which case the enrichment is modeled as a join between two streams; sometimes this data is somewhat static, changing less often, in which case it is better modeled as a join between a stream and a relation. If you think of it in these terms, it almost seems as though a stream, seen as containing only insert events, is a sub-type of relation.
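As an illustration of the second case, here is a hypothetical sketch of a stream being enriched by a slowly changing relation; the relation is maintained by occasional insert and delete events, and is consulted at the instant each stream event arrives:

```java
import java.util.HashMap;
import java.util.Map;

public class StreamRelationJoin {
    record Tick(int ts, String symbol, double price) {}

    // The relation: maintained by occasional insert/delete events,
    // queried in its current state as each stream event arrives.
    private final Map<String, String> companyNames = new HashMap<>();

    void onRelationInsert(String symbol, String name) { companyNames.put(symbol, name); }
    void onRelationDelete(String symbol) { companyNames.remove(symbol); }

    // The join: each tick is enriched with the relation's current state.
    void onTick(Tick t) {
        String name = companyNames.getOrDefault(t.symbol(), "<unknown>");
        System.out.println(name + " traded at " + t.price());
    }

    public static void main(String[] args) {
        StreamRelationJoin join = new StreamRelationJoin();
        join.onRelationInsert("ORCL", "Oracle Corporation");
        join.onTick(new Tick(1, "ORCL", 22.5));   // enriched from the relation
        join.onTick(new Tick(2, "BEAS", 18.0));   // no match in the relation
    }
}
```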

Why call it a “relation”? Mainly because this is the term used by CQL, the foundational work on data stream management.

The CEP glossary does an excellent job of setting up the base model and extending it to event patterns. With “relation”, we complete the model by embracing the data stream management piece of it.