CEP is commonly referenced as a continuous query engine.
What exactly is a continuous query engine? How is it different from a non-continuous query engine, such as a database query engine?
First and foremost, CEP is not passively waiting for a customer request to execute work. Instead, the customer registers CEP queries into the engine and the engine executes these queries, even when there are no input events to be processed. Contrast this with the case of a database system, which waits for a DBA to send a query before processing the input data. This is the reason why CEP is sometimes called an inverted database:
In a database, the queries change often, and the input (data) changes less often. In CEP, the input (events) changes often, and the queries change less often.
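To make the inversion concrete, here is a minimal sketch in Python. The `TinyEngine` class, its `register` and `push` methods, and the sample stock events are all hypothetical names invented for illustration; they do not correspond to any real CEP product's API. The query is registered once and stays in place, while the events flow past it.

```python
from typing import Any, Callable, Dict, List, Tuple

Event = Dict[str, Any]
Predicate = Callable[[Event], bool]
Listener = Callable[[Event], None]


class TinyEngine:
    """Toy engine (hypothetical): queries are long-lived, events are transient."""

    def __init__(self) -> None:
        self._queries: List[Tuple[Predicate, Listener]] = []

    def register(self, predicate: Predicate, listener: Listener) -> None:
        # A standing query: registered once, then left in place.
        self._queries.append((predicate, listener))

    def push(self, event: Event) -> None:
        # Every arriving event is evaluated against every standing query.
        for predicate, listener in self._queries:
            if predicate(event):
                listener(event)


matched: List[Event] = []
engine = TinyEngine()
engine.register(lambda e: e["price"] > 100, matched.append)

# The input changes constantly; the query does not.
for event in [
    {"symbol": "A", "price": 90},
    {"symbol": "B", "price": 120},
    {"symbol": "C", "price": 150},
]:
    engine.push(event)
```

A database would hold the three rows and wait for someone to ask which ones cost more than 100. Here the question is stored and the rows are what come and go.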
Let’s consider different CEP scenarios, where in each one “continuous” has a slightly different meaning:
- Filtering
In the case of queries performing plain filtering over each single input event, the CEP engine is executing work only when a new event arrives, and otherwise it is idle. If the flow of input events is high, one has the impression that the engine is continuously running.
- Time-windows
Consider a query that uses a time-window, for example one that calculates the average price of the stocks received in the last 10 seconds. Here the CEP engine must purge events from the time-window as time progresses, so the engine is doing work even when there are no input events.
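The purging behavior can be sketched as follows. This is only an illustration under stated assumptions: the `TimeWindowAverage` class and its `insert` and `advance` methods are hypothetical, timestamps are plain numbers in seconds supplied by the caller, and an event is assumed to expire once it is at least the window length old.

```python
from collections import deque
from typing import Deque, Optional, Tuple


class TimeWindowAverage:
    """Average price over the last `length_seconds` (hypothetical sketch)."""

    def __init__(self, length_seconds: float) -> None:
        self.length = length_seconds
        self._events: Deque[Tuple[float, float]] = deque()  # (timestamp, price)

    def insert(self, timestamp: float, price: float) -> Optional[float]:
        # A new event arrives: add it, then re-evaluate the window.
        self._events.append((timestamp, price))
        return self.advance(timestamp)

    def advance(self, now: float) -> Optional[float]:
        # Time progresses: purge expired events even if nothing new arrived.
        while self._events and self._events[0][0] <= now - self.length:
            self._events.popleft()
        if not self._events:
            return None  # empty window, no average to report
        return sum(price for _, price in self._events) / len(self._events)


window = TimeWindowAverage(10.0)
avg_after_first = window.insert(0.0, 10.0)
avg_after_second = window.insert(5.0, 20.0)
avg_at_12s = window.advance(12.0)  # no new event, yet the result changes
avg_at_20s = window.advance(20.0)  # the window has drained completely
```

The two `advance` calls carry no input event at all, yet each one changes the result of the query. That is the work the engine does while the stream is silent.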
- User-defined functions
Consider a query that makes use of a function that is not deterministic, that is, one that can return a different value each time it is called. An example is a function that returns the current time:
SELECT getCurrentTime() as time FROM stream
This is an interesting case. Should the CEP engine output a new event only when a new input event arrives in the stream, or should it output a new event every time getCurrentTime() returns a new value? If we are true to the continuous aspect of a continuous query engine, then it seems the CEP engine should do the latter, that is, output a new event every time the clock advances.
Note that in this query the output event does not depend on the input event, unlike in the following example, where it does:
SELECT getCurrentTime(), stream.property FROM stream [NOW]
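The two possible readings of the first query can be compared with a small simulation. This is a sketch, not a description of how any particular engine behaves: the `run_query` function is hypothetical, and the clock is assumed to tick in whole integer steps, with the loop variable `now` standing in for `getCurrentTime()`.

```python
from typing import Iterable, List


def run_query(event_arrivals: Iterable[int], ticks: int, time_driven: bool) -> List[int]:
    """Simulate SELECT getCurrentTime() FROM stream over clock ticks 0..ticks-1.

    Returns the time values emitted as output events.
    """
    arrivals = set(event_arrivals)
    output: List[int] = []
    for now in range(ticks):  # simulated clock (assumption: integer ticks)
        # Event-driven: emit only on the ticks where an input event arrives.
        # Time-driven: emit on every tick, because the function's value changed.
        if time_driven or now in arrivals:
            output.append(now)
    return output


per_event = run_query([1, 4], ticks=6, time_driven=False)
per_tick = run_query([1, 4], ticks=6, time_driven=True)
```

With input events arriving at ticks 1 and 4, the event-driven reading emits two output events, while the time-driven reading emits one per tick regardless of the input.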
- State-space
Finally, let’s consider the state-space used by CEP engines, or, for that matter, by any continuous query engine. The state-space is the set of variables used internally in a system. If the processing is continuous, then one would assume that the variables used while processing the input values are also of a continuous nature, that is, composed of a vector of real (or complex) numbers and calculated through differential equations. This ends up not being the case: the state-space is more likely than not a set of discrete numbers, like integers.
So, as it stands today, most continuous query engines have discrete state-spaces instead of continuous state-spaces.