What are flow processing engines?
Flow based programming is a programming paradigm that defines applications as networks of “black box” processes. These processes (in effect, functions) are represented as nodes that exchange data across predefined connections by message passing. The nodes can be rewired endlessly to form different applications without having to change their associated functions.
Flow based programming (FBP) is thus naturally “component-oriented”. Some of the benefits of FBP are:
- Change of connection wiring without rewriting components.
- Inherently concurrent – suited for the multi-core CPU world.
What are some examples of flow processing engines?
Yahoo! Pipes and Node-RED are two examples of rules engines built using flow based programming. Flow based programming has become even more popular with the introduction of “serverless” computing, where cloud applications can be built by chaining functions.
IBM’s OpenWhisk is an example of flow based programming by chaining cloud functions (which IBM calls actions). Other serverless orchestration approaches, such as AWS Step Functions, are instead based on finite state machine rules engines.
Can flow processing engines model complex logic?
Flow based programming has no notion of states and state transitions. Combining multiple non-binary outcomes of functions (observations) in a rule is possible, but must be coded in every function where it is applied. That also implies that you have to branch at every function where you need to model a multiple-choice outcome. This leads to extremely busy flow graphs that are hard to follow, especially since logic is expressed both in the functions themselves and in their “connectors” – the execution paths. These connectors encode not only the information flow but also the decisions that are being taken.
Similar to decision trees, such an approach to modelling suffers from exponential growth in the number of nodes as the complexity of the logic increases. What makes matters even worse is that, unlike in decision trees, we cannot track the function outcomes as states. There is no better illustration of this drawback than to look at a slightly more complex flow implemented in Node-RED and count the number of nodes and connectors. It is not unusual for simple use cases designed in Node-RED to end up with 30 or 40 nodes and connectors, which can hardly even fit on one screen.
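To make the branching problem concrete, here is a minimal sketch of a Node-RED-style function node with multiple outputs. The threshold values and payload field are hypothetical; the point is that every node producing a non-binary outcome must route messages itself, and every downstream node needing the same distinction must repeat the branching.

```javascript
// A Node-RED-style function node with three outputs. The message is
// placed at the index of the output wire it should travel on; the
// other outputs receive null. (Thresholds and fields are hypothetical.)
function classifyTemperature(msg) {
  if (msg.payload.temperature > 80) {
    return [msg, null, null];   // output 1: "critical" branch
  } else if (msg.payload.temperature > 60) {
    return [null, msg, null];   // output 2: "warning" branch
  }
  return [null, null, msg];     // output 3: "normal" branch
}

// Since the outcome is not tracked as a state, any later node that
// needs this three-way distinction must branch all over again.
const routed = classifyTemperature({ payload: { temperature: 72 } });
console.log(routed.findIndex(m => m !== null)); // index of the active wire
```

Each extra non-binary outcome multiplies the wiring in this way, which is exactly where the exponential node growth comes from.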
Majority voting in flow engines is possible only if we introduce the concept of merging the outputs of different nodes into a separate merge node. Even so, it’s still problematic, as it requires coding the majority rule inside the function of that merge node.
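A minimal sketch of such a merge node, assuming three upstream voters identified by `msg.topic` (the node names and vote values below are hypothetical), might look like this. Note that the voting logic lives entirely inside the node’s code and cannot be read from the wiring:

```javascript
// Sketch of a "merge" function node implementing majority voting over
// three upstream nodes. (Topic names and payload values are hypothetical.)
const votes = {};                  // latest vote per upstream node

function mergeNode(msg) {
  votes[msg.topic] = msg.payload;  // msg.topic identifies the sender
  const received = Object.values(votes);
  if (received.length < 3) return null;   // wait for all three voters

  // Count occurrences of each outcome and emit the majority, if any.
  const counts = {};
  for (const v of received) counts[v] = (counts[v] || 0) + 1;
  const [winner, count] = Object.entries(counts)
    .sort((a, b) => b[1] - a[1])[0];
  return count >= 2 ? { payload: winner } : null;
}

mergeNode({ topic: "sensorA", payload: "alarm" });
mergeNode({ topic: "sensorB", payload: "ok" });
const result = mergeNode({ topic: "sensorC", payload: "alarm" });
console.log(result); // { payload: 'alarm' }
```

The merge node is also stateful (it must remember earlier votes), which sits uneasily with FBP’s otherwise stateless model.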
Can flow processing engines model time?
Flow engines can barely deal with any aspect of the time dimension, since FBP is by design a stateless rules engine. In some limited use cases (which hardly scale) you can merge streams within a time window.
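As a sketch, merging two streams within a time window requires exactly the kind of hand-rolled state that FBP otherwise avoids. The stream names and the window length below are hypothetical:

```javascript
// Minimal time-windowed join: buffer the latest event per stream and
// emit a merged payload only when the peer arrived within `windowMs`.
// (Stream names and window length are hypothetical.)
const windowMs = 5000;
const buffer = {};   // latest event per stream: { value, at }

function joinWithinWindow(stream, value, now) {
  buffer[stream] = { value, at: now };
  const other = stream === "streamA" ? "streamB" : "streamA";
  const peer = buffer[other];
  if (peer && now - peer.at <= windowMs) {
    return { [stream]: value, [other]: peer.value };  // merged payload
  }
  return null;  // peer missing or outside the window
}

joinWithinWindow("streamA", 21.5, 1000);
console.log(joinWithinWindow("streamB", 0.93, 3000));
// → { streamB: 0.93, streamA: 21.5 }
console.log(joinWithinWindow("streamA", 22.1, 20000)); // → null (window expired)
```

Even this toy version must track per-stream state and timestamps, which is why time-window merges in flow engines rarely scale beyond simple cases.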
Are flow processing engines explainable?
Some of the things that make a rule engine explainable are:
- The intent of the rule should be clear to all users, developers and business owners alike
- There should be a compact representation of logic
- The engine should have simulation and debugging capabilities both during design time and at runtime
For simple use cases, a flow based data stream representation feels natural, at least from the perspective of the information flow. But any attempt to create complex logic using FBP makes validating the intended logic very difficult.
Indeed, understanding which decisions are taken by looking at the flow graph is a very difficult task. The main reason for this is that the logic representation is not compact, and validating the rules often requires streaming test data, followed by validation of the function logs across all pipelines.
The logic is split between the flow pathways (as data travels between processing nodes) and the payload processing in each node, which might lead to different paths being taken after that processing node. Hence debugging and rules validation become a very tedious and error prone process. Moreover, we are never sure that all corner cases (the outputs as decisions from different inputs) are covered by a particular rule expressed using FBP – it almost looks as if FBP-based rules validation is an NP-hard problem.
Are flow processing engines adaptable?
Flow based programming engines have reusable black box nodes (functions). However, a partial update of any particular rule is nevertheless difficult and risky, because it usually implies major changes to the graph and revalidation of the rules. In a way, the main reason for this is that for most rules engines, and for FBP in particular, there is a high correlation between explainability and flexibility. On the positive side, flow based rules engines are easy to extend with third-party services, and extensibility is achieved in an elegant way.
Are flow processing engines easy to operate?
Templating is very difficult to achieve, since special care needs to be taken when handling payload transformations that happen as payloads are passed between different processing nodes. Also, thresholds and branching logic are part of the same payload processing flow, making it very hard to abstract this logic. It’s for this same reason that bulk upgrades are error prone and risky.
Are flow processing engines scalable?
Flow based programming engines are inherently concurrent since they have to distribute functional computations. They are also stateless, which means that the rules engine only needs to keep track of the current execution and further actions that need to be executed. On the other hand, if merging multiple outputs of different nodes is required in one rule, or when decision branching is introduced with different path executions, the rules engine will need to keep the snapshot (scope) of the rules execution somewhere.
Using Node-RED for IoT application development
Node-RED is today very popular in the maker community and the de-facto tool in the gateways of many industrial vendors. This has to do with its creators’ decision to let different protocol streams come directly into nodes as input data events. This was done deliberately in order to simplify protocol termination and to allow payload normalization to be performed within Node-RED. But it’s a decision that acts as a double-edged sword.
On the one hand, this means that protocol-dependent data streams can be implemented by any third party and immediately used within the Node-RED environment. And as protocol transformation and payload normalization are very important in IoT deployments, Node-RED can be very valuable for edge deployments.
But on the other hand, it also makes Node-RED suffer from operability issues that are even bigger than those of other flow based programming engines. It makes templating, for example, an order of magnitude more difficult: protocol transformation and payload normalization need to be part of the Node-RED template, together with threshold definitions and branching.
Though a good fit for edge deployments, an off-the-shelf Node-RED instance is not scalable for the cloud. Some vendors provide cloud solutions with sharding implemented on top of Node-RED and by externalizing protocol termination in a separate component. However, when taking such an approach they might as well switch back to the more generic FBP engines.
This is an excerpt from our latest Benchmark Report on Rules Engines used in the IoT domain. You can have a look at a summary of the results or download the full report over here.