Time can be a subtle thing to understand in CEP systems.
The following scenario yields different results depending on how one interprets time.
First, let’s define the use-case:
Consider a stream of events that has a rate of one event every 500 milliseconds. Each event contains a single value property of type int. We want to calculate the average of this value property over the last 1 second of events. Note how simple the use-case is.
Is the specification of this use-case complete? Not yet. So far we have described the input and the event processing (EP) function, but we have not specified when to output. Let’s do so: we want to output every time the calculated average value changes.
As an illustration, the following CQL implements this EP function:
ISTREAM( SELECT AVG(value) AS average FROM stream [RANGE 1 second] )
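To make the semantics of this query concrete, here is a minimal Python sketch of a 1-second range window that emits the average only when it changes. It is an illustration, not how any CEP engine is implemented: the class and method names (RangeWindowAverage, insert, advance) are hypothetical, and treating the window as inclusive of events exactly 1 second old is an assumption.

```python
from collections import deque


class RangeWindowAverage:
    """Average of `value` over events within the last `range_ms` milliseconds."""

    def __init__(self, range_ms=1000):
        self.range_ms = range_ms
        self.window = deque()    # (timestamp_ms, value) pairs, oldest first
        self.last_output = None  # last average emitted, used to detect changes

    def advance(self, now_ms):
        """Move time to `now_ms`, expire old events, return the average if it changed."""
        # Assumption: an event exactly `range_ms` old is still inside the window.
        while self.window and self.window[0][0] < now_ms - self.range_ms:
            self.window.popleft()
        if not self.window:
            # Simplification: an empty window produces no output here.
            self.last_output = None
            return None
        average = sum(value for _, value in self.window) / len(self.window)
        if average == self.last_output:
            return None          # unchanged average: nothing to emit
        self.last_output = average
        return average

    def insert(self, timestamp_ms, value):
        """Add an event, then evaluate the window at the event's timestamp."""
        self.window.append((timestamp_ms, value))
        return self.advance(timestamp_ms)
```

Note that the sketch separates inserting an event from advancing time. Who gets to call advance, and when, is exactly what a timing model decides.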
The final question is: how should we interpret time? Say we let the CEP system timestamp the events as they arrive using the wall clock (i.e. CPU clock) time. We shall call this the system-timestamped timing model.
Table 1 shows the output of this use-case for a set of input events when applying the system-timestamped model:
What’s particularly interesting in this scenario is the output event o4. A CEP system that supports the system-timestamped model can progress time as the wall clock progresses. Let’s say that our system has a heart-beat of 300 milliseconds. This means that at time 1300 milliseconds (i.e. i3 + heart-beat) the CEP system is able to update the stream window on its own, without receiving any new input, by expiring the event i1, which by then is more than 1 second old. Expiring i1 changes the average, hence the output o4. Note that this only happens when the stream window is full and thus events can be expired.
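The heart-beat behaviour can be sketched as a small simulation. This is a rough model under stated assumptions: the arrival times (0, 500, 1000 and 1500 milliseconds) follow from the 500 millisecond rate, the values are made up for illustration and are not taken from Table 1, the heart-beat is modelled as firing 300 milliseconds after an arrival if no newer event has arrived, and the run stops at the last input event. All function names are hypothetical.

```python
RANGE_MS = 1000      # window range from the query
HEARTBEAT_MS = 300   # heart-beat used in the text

# Hypothetical input: (arrival time in ms, value). Values are illustrative only.
ARRIVALS = [(0, 10), (500, 20), (1000, 30), (1500, 40)]


def evaluation_times(arrivals, heartbeat_ms):
    """Times at which the engine looks at the window: every arrival, plus a
    heart-beat after an arrival when no newer event has shown up by then."""
    times = []
    for index, (arrived, _) in enumerate(arrivals):
        times.append(arrived)
        beat = arrived + heartbeat_ms
        is_last = index == len(arrivals) - 1
        if not is_last and beat < arrivals[index + 1][0]:
            times.append(beat)  # idle gap: the wall clock still moves time forward
    return times


def run_system_timestamped(arrivals, range_ms=RANGE_MS, heartbeat_ms=HEARTBEAT_MS):
    """Return (time, average) pairs, one for each change of the average."""
    outputs, last = [], None
    for now in evaluation_times(arrivals, heartbeat_ms):
        # Events that have arrived and are at most `range_ms` old at time `now`.
        live = [value for ts, value in arrivals if now - range_ms <= ts <= now]
        if not live:
            continue
        average = sum(live) / len(live)
        if average != last:  # output only when the average changes
            outputs.append((now, average))
            last = average
    return outputs


print(run_system_timestamped(ARRIVALS))
```

With these made-up values the run produces five outputs, and the fourth one is emitted at 1300 milliseconds, when no input event arrived at all.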
Next, let’s assume that the time is defined by the application itself, and this is done by including a timestamp property in the event. Let’s look at what happens when we input the same set of events at exactly the same time as if we were using the wall clock time:
Interestingly enough, we now get only four output events; that is, event o4 is missing. When time is defined by the application, the CEP system itself does not know how to progress time. In other words, even though the wall clock may have progressed several minutes, the application time may not have changed at all. This means that the CEP system can only determine that time has moved when it receives a new input event with an updated timestamp property. Hence we don’t get a situation like the previous one, where the CEP system itself was able to expire events automatically.
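As a rough sketch of the application-timestamped case, the loop below has no clock of its own and only re-evaluates the window when an event arrives. The timestamps follow the 500 millisecond rate, the values are invented for illustration and do not come from the article's tables, and the function name is hypothetical.

```python
RANGE_MS = 1000  # window range from the query

# Hypothetical events: (application timestamp in ms, value).
EVENTS = [(0, 10), (500, 20), (1000, 30), (1500, 40)]


def run_application_timestamped(events, range_ms=RANGE_MS):
    """Return (timestamp, average) pairs, one for each change of the average."""
    window, outputs, last = [], [], None
    for timestamp, value in events:
        # The event's own timestamp is the only clock the engine has.
        window.append((timestamp, value))
        # Expiry can only happen here, when a new timestamp is observed.
        window = [(ts, v) for ts, v in window if ts >= timestamp - range_ms]
        average = sum(v for _, v in window) / len(window)
        if average != last:  # output only when the average changes
            outputs.append((timestamp, average))
            last = average
    return outputs


print(run_application_timestamped(EVENTS))
```

With these values the run yields four outputs. The intermediate average that would result from expiring the first event on its own (25.0 here) is never observed, because by the time the engine learns that time has moved, the next event is already in the window.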
In summary, in this simple example we went through two different timing models: system-timestamped and application-timestamped. Timing models are very important in CEP, as they allow great flexibility; however, you must be aware of the caveats.