A guide to rules engines for IoT: Stream Processing Engines

What are stream processing engines?

Stream processing is the processing of data in motion: in other words, computing on data directly as it is produced or received (as opposed to batch frameworks such as Hadoop MapReduce, which process data at rest).

Before stream processing emerged as a standard for processing continuous datasets, these data streams were often stored in a database, a file system, or some other form of mass storage. Applications would then query or compute over the stored data as needed. One notable downside of this approach, broadly referred to as batch processing, is the latency between the creation of the data and its use for analysis or action.

In most stream processing engines, users write code that defines operators and wires them together into a dataflow graph. The engine then executes that graph in parallel.
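
The idea of operators wired into a graph and run in parallel can be illustrated with a minimal Python sketch. This is not the API of Storm, Flink or Samza. The operators (`parse_celsius`, `only_hot`), the `wire` and `run_parallel` helpers and the event fields are all hypothetical. Threads stand in for the engine's parallel task slots. Events are partitioned by device key, so each partition runs its own copy of the graph.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Iterator

Event = dict
Operator = Callable[[Iterable[Event]], Iterator[Event]]


def parse_celsius(events: Iterable[Event]) -> Iterator[Event]:
    """Map operator (hypothetical): convert Fahrenheit readings to Celsius."""
    for e in events:
        yield {**e, "temp_c": round((e["temp_f"] - 32) * 5 / 9, 1)}


def only_hot(events: Iterable[Event]) -> Iterator[Event]:
    """Filter operator (hypothetical): keep readings above 30 degrees C."""
    for e in events:
        if e["temp_c"] > 30:
            yield e


def wire(*operators: Operator) -> Operator:
    """Chain operators into a linear graph: each output feeds the next operator."""
    def graph(events: Iterable[Event]) -> Iterator[Event]:
        stream: Iterable[Event] = events
        for op in operators:
            stream = op(stream)
        return iter(stream)
    return graph


def run_parallel(graph: Operator, events: list[Event], parallelism: int) -> list[Event]:
    """Partition events by device key and run one graph instance per partition."""
    partitions: list[list[Event]] = [[] for _ in range(parallelism)]
    for e in events:
        # Deterministic hash so a device always lands in the same partition.
        partitions[zlib.crc32(e["device"].encode()) % parallelism].append(e)
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        results = pool.map(lambda part: list(graph(part)), partitions)
    return [e for part in results for e in part]


readings = [
    {"device": "sensor-1", "temp_f": 100.0},
    {"device": "sensor-2", "temp_f": 68.0},
    {"device": "sensor-3", "temp_f": 95.0},
]
pipeline = wire(parse_celsius, only_hot)
hot = run_parallel(pipeline, readings, parallelism=2)
```

The output order depends on the partitioning. The set of hot readings does not: sensor-1 at 37.8 and sensor-3 at 35.0. Partitioning by key is one common way to parallelise a graph while keeping all events of one device on the same instance.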

What are some examples of stream processing engines used in the IoT domain?

Stream processing engines play a narrow role in IoT: the runtime processing of IoT data streams. They are not designed as generic rules engines and, for example, cannot actuate devices directly.

Some of the most common stream processing engines are Apache Storm, Apache Flink and Apache Samza.

Upon receiving an event from a data stream, a stream processing application reacts to the event immediately. The application might trigger an action, update an aggregate, or “remember” the event for future use. Stream processing computations can also handle multiple data streams jointly, and each computation over the event data stream may produce other event data streams.
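
The reactions described above can be sketched as a small event handler, assuming a simple temperature use case. `TemperatureApp`, its 30-degree threshold and the tuple layout `(timestamp, device, temperature)` are hypothetical. The handler does four things on each event:

- triggers an action (an alarm) immediately;
- updates a running aggregate incrementally;
- "remembers" the latest reading per device;
- emits a derived alarm stream.

Two input streams are processed jointly by merging them in timestamp order with the standard library's `heapq.merge`.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class TemperatureApp:
    """Hypothetical stream application that reacts to each event as it arrives."""
    threshold: float = 30.0
    count: int = 0        # running aggregate: number of events seen
    total: float = 0.0    # running aggregate: sum of temperatures
    last_seen: dict[str, float] = field(default_factory=dict)  # remembered state
    alarms: list[str] = field(default_factory=list)            # derived output stream

    def on_event(self, device: str, temp: float) -> None:
        # Update the aggregate incrementally instead of re-querying stored data.
        self.count += 1
        self.total += temp
        # Trigger an action immediately when the event crosses the threshold.
        if temp > self.threshold:
            self.alarms.append(f"ALARM {device}: {temp}")
        # Remember the latest reading per device for use with future events.
        self.last_seen[device] = temp

    @property
    def average(self) -> float:
        return self.total / self.count if self.count else 0.0


# Two input streams of (timestamp, device, temperature), handled jointly.
stream_a = [(1, "sensor-1", 25.0), (3, "sensor-1", 32.5)]
stream_b = [(2, "sensor-2", 28.0), (4, "sensor-2", 31.0)]

app = TemperatureApp()
for ts, device, temp in heapq.merge(stream_a, stream_b):  # ordered by timestamp
    app.on_event(device, temp)
```

After the four events, `app.alarms` holds alarms for sensor-1 (32.5) and sensor-2 (31.0). The running average is 29.125.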

Stream processing rules engines are typically used for applications such as algorithmic trading, market data analytics, network monitoring, surveillance, e-fraud detection and prevention, clickstream analytics and real-time compliance (anti-money laundering).

Can you model complex logic with stream processing engines?

No higher-order logic constructs (combining multiple non-binary outcomes, majority voting, conditional execution) are possible with stream processing engines. However, developers can run StreamSQL queries on top of the data streams. For some use cases, simple thresholds combined with aggregation across all streams, or across certain subsets of streams, can bring great value.
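
A threshold combined with aggregation over a subset of streams (here, per region) can be sketched in plain Python. The comment shows the rough shape such a query might take in a generic SQL dialect. The exact StreamSQL syntax differs between engines, so treat that comment as an assumption rather than any engine's actual grammar. The readings, regions and threshold are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical readings tagged with the region each device belongs to.
readings = [
    {"device": "s1", "region": "north", "temp": 31.0},
    {"device": "s2", "region": "north", "temp": 27.0},
    {"device": "s3", "region": "south", "temp": 22.0},
    {"device": "s4", "region": "south", "temp": 24.0},
]

THRESHOLD = 28.0


def regions_over_threshold(events, threshold):
    """Aggregate per stream subset, then apply one threshold to each aggregate.

    Roughly: SELECT region, AVG(temp) FROM readings
             GROUP BY region HAVING AVG(temp) > threshold   (generic SQL shape)
    """
    by_region = defaultdict(list)
    for e in events:
        by_region[e["region"]].append(e["temp"])
    averages = {region: mean(temps) for region, temps in by_region.items()}
    return {region: avg for region, avg in averages.items() if avg > threshold}


# Aggregation across all streams at once.
global_avg = mean(e["temp"] for e in readings)
```

Here only the north region (average 29.0) crosses the threshold. The global average of 26.0 does not.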

How well can stream processing engines deal with the time dimension?

Stream processing engines cannot handle synchronous and asynchronous events in the same rule. This means that a rule cannot intercept the stream data and, at the same moment, call an external API service. Stream processing engines are designed for high-throughput stream execution. An API call with a long round-trip delay for a given event would simply break the processing pipeline.
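
A back-of-the-envelope calculation shows why a blocking external call is so costly. The model below is deliberately simplified and is an assumption, not a property of any specific engine. Each worker handles one event at a time and waits for the full API round trip, and local processing time is ignored. The 200 ms latency and the 10,000 events/s load are hypothetical figures.

```python
def max_events_per_second(round_trip_ms: float, in_flight: int = 1) -> float:
    """Upper bound on throughput when every event waits on an external API call.

    Simplified model: `in_flight` workers each block for `round_trip_ms`
    per event; time spent on local processing is ignored.
    """
    return in_flight * 1000.0 / round_trip_ms


# Hypothetical: a 200 ms API round trip versus a fleet producing 10,000 events/s.
single = max_events_per_second(200)                 # one blocking worker
four_workers = max_events_per_second(200, in_flight=4)
needed_in_flight = 10_000 / single                  # concurrent calls required
```

With one blocking worker, the pipeline tops out at 5 events per second. Keeping up with 10,000 events per second would require about 2,000 calls in flight at once.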

Still, stream processing engines offer a very powerful query language, StreamSQL. StreamSQL queries over streams are generally “continuous”: they execute for long periods of time and return incremental results. The supported operations include selecting from a stream, stream-relation joins, union and merge, and windowing and aggregation.
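
Two of these operations, a stream-relation join and a windowed aggregation, can be sketched as a continuous computation in Python. This is not StreamSQL itself. The device table, the 10-second tumbling window and the event tuples are hypothetical. The sketch also assumes events arrive in timestamp order, so it ignores late data and the watermarks real engines use for it. Results are emitted incrementally each time a window closes, rather than once at the end.

```python
from collections import defaultdict

# Relation: static reference table (hypothetical device metadata).
DEVICE_REGION = {"s1": "north", "s2": "north", "s3": "south"}

# Stream of (timestamp in seconds, device id, temperature), in timestamp order.
events = [
    (0, "s1", 20.0), (4, "s2", 22.0), (7, "s3", 18.0),
    (11, "s1", 26.0), (12, "s9", 99.0), (14, "s3", 19.0),
]


def tumbling_avg_by_region(stream, window_s):
    """Join each event with its region, then average per region per window."""
    current = None
    acc = defaultdict(lambda: [0.0, 0])  # region -> [sum, count]
    for ts, device, temp in stream:
        region = DEVICE_REGION.get(device)
        if region is None:  # inner join: drop events without metadata
            continue
        window_start = (ts // window_s) * window_s
        if current is not None and window_start != current:
            # Window closed: emit its results incrementally, then reset state.
            for key, (total, n) in sorted(acc.items()):
                yield current, key, total / n
            acc.clear()
        current = window_start
        acc[region][0] += temp
        acc[region][1] += 1
    # End of input: flush the last open window.
    if current is not None:
        for key, (total, n) in sorted(acc.items()):
            yield current, key, total / n


results = list(tumbling_avg_by_region(events, window_s=10))
```

The event from the unknown device s9 is dropped by the join. Each 10-second window produces one average per region.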

Are stream processing engines explainable?

Unless you are a developer familiar with StreamSQL, it is practically impossible to understand the behaviour of any particular rule. The same can be argued for any typical SQL-based solution.

Are stream processing engines adaptable?

API extensions and overall flexibility are weak points of these rules engines. Stream processing engines are data processing pipelines, not meant to be directly integrated with third-party API systems.

How easy is it to operate stream processing engines?

In many IoT use cases, stream processing is used for global threshold crossings (e.g. raise an alarm if the temperature in any event exceeds a threshold) or for aggregations (e.g. the average temperature in a given region). Anything more complicated, such as per-device threshold crossings, is extremely hard to achieve. This is why templating, updating rules per device and rolling out rule version updates are very difficult.
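
The contrast between a global threshold and per-device thresholds can be sketched in Python. All device names and threshold values are hypothetical. The global rule is a single condition applied to every event. The per-device rule depends on an extra threshold table that has to be joined into the stream. That table must be kept complete and versioned alongside the query; devices missing from it are silently skipped here.

```python
GLOBAL_THRESHOLD = 30.0

# Hypothetical per-device thresholds: extra state that must be joined into the
# stream and kept in sync every time a device's rule changes.
PER_DEVICE_THRESHOLD = {"fridge-1": 5.0, "oven-1": 250.0}


def global_alarms(events):
    """One condition for every event: a single, simple stream query."""
    return [dev for dev, temp in events if temp > GLOBAL_THRESHOLD]


def per_device_alarms(events, thresholds):
    """Check each event against its own device's threshold.

    Devices without an entry in the table are skipped (no rule defined).
    """
    return [dev for dev, temp in events
            if dev in thresholds and temp > thresholds[dev]]


readings = [("fridge-1", 7.0), ("oven-1", 180.0), ("heater-1", 45.0)]
```

The global rule flags oven-1 and heater-1 but misses the warm fridge. The per-device rule flags only fridge-1 and has no rule at all for heater-1.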

Are stream processing engines scalable?

When it comes to real-time processing of large data volumes, nothing can beat stream processing engines: they are the most scalable engines available today.

This is an excerpt from our latest Benchmark Report on Rules Engines used in the IoT domain. You can have a look at a summary of the results or download the full report over here.

What’s next?

Autonomous service operations are getting supercharged by the advent of smart synthetic software agents powered by Large Language Models (LLMs). These synthetic agents will assist human service agents to increase capacity and reduce tedious manual work, such as root cause analysis of asset performance issues or updating work plans to deal with impending asset shutdowns.

LLM technologies have matured enough to couple automated asset health monitoring with autonomous field job scheduling to improve asset uptime and Service Level Agreement adherence. Waylay’s analytics and orchestration platform can serve various agentic LLM applications for autonomous service operations. These applications leverage the FLS VISITOUR scheduling engine to optimize the field force load and reduce wasted travel hours. The result is faster preventive asset maintenance, less human error during scheduling and an overall better end customer experience.
