In this blog post, I will define the three key requirements that every rule engine must satisfy in order to fully unleash the power of serverless computing.
I will also show how functional programming (FP), together with flow processing engines, has given developers a restrictive tunnel vision when it comes to creating a good automation rule engine.
In a follow-up blog post, I will cover serverless automation with the Waylay engine in more detail.
Lambda calculus
Every time developers chain functions, in whatever language they happen to do it, they are actually using function composition (follow this link if you like mathematics, or this link if you are a developer).
For instance, the functions f : X → Y and g : Y → Z can be composed to yield a function which maps x in X to g(f(x)) in Z. Intuitively, if z is a function of y, and y is a function of x, then z is a function of x. The resulting composite function is denoted g ∘ f : X → Z, defined by (g ∘ f )(x) = g(f(x)) for all x in X.
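If you prefer code over notation, here is a minimal TypeScript sketch of the same idea; the functions f and g are invented purely for illustration:

```typescript
// Generic composition: (g ∘ f)(x) = g(f(x))
const compose = <X, Y, Z>(g: (y: Y) => Z, f: (x: X) => Y) =>
  (x: X): Z => g(f(x));

// f maps a raw reading to a number, g maps that number to a label
const f = (raw: { value: number }): number => raw.value;
const g = (celsius: number): string => (celsius > 27 ? "HOT" : "OK");

const label = compose(g, f);       // maps { value: number } to a string
console.log(label({ value: 30 })); // "HOT"
```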
Other things you may hear on the subject refer to monoids, transformations, group theory or lambda calculus, and these are all basically properties, or rather results, of function chaining. Functional programming comes with a very robust application style that in many cases works like a charm. Stream processing is one of the examples where this approach particularly shines, due to its ability to split the work, transform it and merge the results to the extreme.
How to build a perfect rule engine?
So if Spark and similar applications (flow engines) work great, what, then, is the problem with the function-chaining approach when trying to build a rule engine?
Before answering, let me explain the three requirements that a good rule engine needs to satisfy:
- A rule engine should allow an independent information flow. The order in which information is gathered should not matter – whether information is pushed to the rule engine as an event or requested via an external call. For instance, as we walk down the road, we observe different things using our senses, and this information flow is not guided by any particular control function.
- A rule engine must have an independent control flow. In some cases, you might be interested in a certain type of information, only if some other type of information is presented to you. Either way, this requirement is orthogonal to the first one, and we should not confuse one with the other. In computer science, control flow is the order in which individual statements, instructions or function calls of an imperative program are executed or evaluated.
- Decisions (rules) should be decoupled from both information flow and control flow – that is to say, decisions should be invoked as soon as new information is available. The process by which a conclusion is inferred from multiple observations is called inductive reasoning. In the field of Artificial Intelligence, the inference engine is the component of the system that applies logical rules to the knowledge base to deduce new information. (A small sketch of this separation follows right after this list.)
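As a purely illustrative sketch (these types and interfaces are not any particular engine's API), the three requirements can be read as three separate concerns: facts arrive on their own, a controller decides what to ask for and when, and rules only consume whatever facts happen to be known:

```typescript
// Hypothetical types, only to illustrate the separation of concerns.
type Fact = { name: string; value: unknown; observedAt: number };

// 1. Information flow: facts arrive in any order, pushed or pulled.
interface InformationSource {
  onFact(listener: (fact: Fact) => void): void;
}

// 2. Control flow: decides when to ask for more information.
interface Controller {
  shouldPoll(factName: string, known: Map<string, Fact>): boolean;
}

// 3. Decisions: rules are evaluated whenever new facts arrive,
//    independent of how or in which order the facts were gathered.
interface Rule {
  evaluate(known: Map<string, Fact>): "FIRE_ACTUATOR" | "WAIT";
}
```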
Orchestration based on serverless
So let’s first see how a typical flow engine behaves with these requirements in mind.
If we use lambda calculus notation, this is what one particular rule would look like:
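As a rough stand-in for the diagram, one such chained rule boils down to a single composition of a data-gathering step f, a decision step g and an action act (the names are purely illustrative):

rule = act ∘ g ∘ f, so that rule(x) = act(g(f(x))) for every incoming event x.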
This is, not surprisingly, similar to the AWS Lambda picture showing IoT integration using lambdas.
Now, let’s try a scenario that is a little bit more complicated, with three input variables (x, y, z), where x and y arrive as events and z is the result of polling a REST API endpoint:
The first thing we can note is that branching (decisions) makes this graph grow exponentially, but that is the problem with decision tables, as we discussed before. On top of this, we see how the lambda “flow”-based calculus gets us into all sorts of trouble. The arrows in this picture stand not only for information flow; at the same time they represent the control and decision flow.
Merging data from different streams at different time slots becomes a daunting task. The main reason for this complexity is that information flow, control flow and decisions are all coupled together!
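To make the coupling concrete, here is a hedged sketch of what such a chained “rule” tends to look like in code – the endpoint, thresholds and helper names are all invented for illustration:

```typescript
// Hypothetical chained "rule": information flow, control flow and the
// final decision are all baked into one pipeline.
async function rule(xEvent: number, yEvent: number): Promise<void> {
  // Control flow is hard-coded: z is only fetched after x and y have arrived...
  if (xEvent > 10 && yEvent > 5) {
    // ...and the information flow (polling a REST endpoint) is entangled
    // with the one branch that happens to need it.
    const z = await fetch("https://api.example.com/z").then((r) => r.json());

    // The decision sits buried at the end of the chain.
    if (z.value > 100) {
      await notify("threshold exceeded");
    }
  }
  // Merging a late y event, or a z value that was polled earlier,
  // means rewriting this whole chain.
}

async function notify(message: string): Promise<void> {
  console.log(message); // placeholder action
}
```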
Funny observation: every time we train developers on the Waylay rule engine, the first thing we need to do is get them to unlearn thinking of a rule as a left → right flow of information, with control and decisions bundled together!
(IBM) OpenWhisk – Composites
To illustrate this approach in OpenWhisk, there is no better way than to just look inside the code, where decision, data inputs and controls are all composed at once:
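I won’t reproduce the original snippet here, but a minimal composition in the spirit of OpenWhisk Composer looks roughly like the sketch below; the action names are illustrative, and the exact package name and combinator signatures may differ between Composer versions:

```typescript
// Composition file, deployed as a single OpenWhisk "conductor" action.
// Note how the data input, the decision and the control branches are all
// declared as one composed expression.
const composer = require('openwhisk-composer');

module.exports = composer.sequence(
  'fetch-temperature',      // data input: an action that gathers information
  composer.if(
    'is-above-threshold',   // decision: an action whose result selects the branch
    'send-alert',           // control: branch taken when the condition holds
    'log-ok'                // control: branch taken otherwise
  )
);
```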
That might work fine for certain types of applications, but in general, as discussed earlier, this gets us into all sorts of problems.
AWS Step Functions
In order to solve the drawbacks of function chaining, AWS came up with Step Functions.
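For readers who haven’t seen one, a Step Functions workflow is described in the Amazon States Language. Here is a minimal sketch, written as a TypeScript object for readability; the ARNs and state names are placeholders:

```typescript
// Minimal Amazon States Language definition, expressed as a TypeScript object.
// Each Lambda is wrapped in a Task "step"; a Choice state carries the branching.
const stateMachine = {
  StartAt: "ReadSensor",
  States: {
    ReadSensor: {
      Type: "Task",
      Resource: "arn:aws:lambda:REGION:ACCOUNT:function:read-sensor", // placeholder ARN
      Next: "CheckThreshold",
    },
    CheckThreshold: {
      Type: "Choice",
      Choices: [
        { Variable: "$.temperature", NumericGreaterThan: 27, Next: "SendAlert" },
      ],
      Default: "Done",
    },
    SendAlert: {
      Type: "Task",
      Resource: "arn:aws:lambda:REGION:ACCOUNT:function:send-alert", // placeholder ARN
      End: true,
    },
    Done: { Type: "Succeed" },
  },
};
```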
As we can see, the lambda function is wrapped in a “step” entity, and that way we can at least have a better understanding of what is going on by combining them together. Moreover, we can easily merge information from different sources, which doesn’t come as a surprise, as this is what BPM engines are made for. But how do we add data streams here? And how can we “inject” new information at any moment in time? The answer is that we can’t.
Why is that? The main reason is still that information flow, control flow and decisions are all coupled together, but this time in a different way than with flows!
That is why neither flow engines nor BPM engines can solve the problem of merging the IoT world of continuous real-time sensor data with the API world of cloud and enterprise software data. Moreover, none of the existing rule engines out there today satisfies the principle that information flow, control flow and final decisions should be independent of each other.
Azure Functions
Yochay Kiriaty (@yochayk) gives a great overview of common serverless patterns in this presentation (Function Chaining, Function Chaining with Rollback (transaction), Async HTTP (HTTP 202), Fan-out (Parallel), Fan-out + Fan-in, …). I would highly recommend that everyone interested in serverless patterns go over these slides, if nothing else to fully understand the complexity of building automation using serverless.
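To make at least one of those patterns concrete, here is a generic fan-out/fan-in sketch in plain TypeScript – it is not tied to any particular Azure Functions binding, and the endpoint is a placeholder:

```typescript
// Fan-out: start all calls in parallel; fan-in: wait for and merge the results.
async function averageReading(deviceIds: string[]): Promise<number> {
  // Hypothetical per-device call (e.g. an HTTP-triggered function).
  const readDevice = async (id: string): Promise<number> => {
    const res = await fetch(`https://api.example.com/devices/${id}/reading`);
    const body = await res.json();
    return body.value as number;
  };

  const readings = await Promise.all(deviceIds.map(readDevice)); // fan-out + fan-in
  return readings.reduce((sum, v) => sum + v, 0) / readings.length; // merge step
}
```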
Update (November 2020): an extended form of this presentation was given at the Serverless Architecture Conference in Berlin in 2020: “Solving the weak spots of serverless with Directed Acyclic Graph Model”.
Should we use a rule engine?
Some of you might recognise this title from this great blog post by Martin Fowler:
“An important property of rule engines is chaining – where the action part of one rule changes the state of the system in such a way that it alters the value of the condition part of other rules. Chaining sounds appealing, since it supports more complex behaviors, but can easily end up being very hard to reason about and debug.”
And a little bit further ….
“I’ve run into a few cases where people have made use of rules engine products, and each time things don’t seem to have worked out well (disclaimer: I’m not a statistically valid sample). Often the central pitch for a rules engine is that it will allow the business people to specify the rules themselves, so they can build the rules without involving programmers. As so often, this can sound plausible but rarely works out in practice.”
Once we see that none of the approaches work, we can always go back to this:
Which is to say, “I give up on rule engines! Just let me do things my way”, which is exactly what we see happening more and more. A typical example is this blog post:
“Functional facades” – where you build applications calling either local or remote functions (I think I’ve heard of these things before 😃). So, in the end, that would actually mean moving a local monolith to a remote monolith: push your application to Heroku and scale it up by calling remote lambda functions. Async and promises to the rescue, of course, and don’t get me wrong, this can work, but in this blog I argue for building applications using a cloud rule engine – the right one! So let’s get back to the topic.
Waylay rule engine as seen from the perspective of serverless
The Waylay engine was built from day one with the idea of separating information, control and decision flow, using the smart agent concept: where sensors, logic and actuators are separate entities of the rule engine.
Waylay lambda functions are defined as either sensors or actuators. Sensors are “typed” lambda functions, which return a state, data or both.
The Waylay rule engine is an inference engine: any time sensors come back with information (in the form of sensor data, sensor state or both), these results are fed back into the engine, which fires actuators (other lambda functions) if conditions are satisfied. A condition can be a particular state of a sensor, a state transition, or a combination of the states of many sensors put together. Please check this blog post, which shows how powerful this rule expression is compared to decision trees.
Before any sensor or actuator is invoked, the engine makes a copy of the rule context, providing, if required, results and data from all sensors executed at that moment in time to the calling function.
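To make the sensor idea concrete, here is a hypothetical sketch of a sensor-style function – not the actual Waylay SDK – that receives a read-only copy of the rule context and returns a state, data, or both:

```typescript
// Hypothetical shapes, for illustration only – this is not the actual Waylay SDK.
type SensorResult = {
  state?: string;                  // e.g. "ABOVE" | "BELOW" | "UNKNOWN"
  data?: Record<string, unknown>;  // raw observations, available to later sensors/actuators
};

type RuleContext = Readonly<Record<string, SensorResult>>; // copy of prior sensor results

// A "typed" lambda function acting as a temperature sensor.
async function temperatureSensor(ctx: RuleContext): Promise<SensorResult> {
  const res = await fetch("https://api.example.com/room/temperature"); // placeholder endpoint
  const { celsius } = await res.json();
  return {
    state: celsius > 27 ? "ABOVE" : "BELOW",
    data: { celsius },
  };
}
```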
Sensors can be invoked:
- via a polling frequency, a cron expression or as a one-time invocation
- as the result of other function calls (sensors)
- as the outcome of multiple function executions (via logical gates)
- on new data arriving
And of course, with all different conditions combined if needed!
The rule designer can even define how long each of the sensor’s “claims” should stay valid. What this means is that if polling a particular web site or API endpoint fails for a longer period, a sensor can effectively say to the engine: “I don’t know any more!”. That is also an elegant way of merging different event streams where information is valid only for a short period of time – a very important aspect to take into consideration when making decisions, as explained here.
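A hedged illustration of the claim-validity idea (again, not Waylay syntax): a sensor state carries a time-to-live, and once it expires the engine treats the claim as unknown:

```typescript
// Hypothetical claim with a validity window.
type Claim = { state: string; observedAt: number; ttlMs: number };

// After the TTL has elapsed the claim no longer counts as evidence.
function effectiveState(claim: Claim, now: number): string {
  return now - claim.observedAt <= claim.ttlMs ? claim.state : "UNKNOWN";
}

// Example: a reading observed 10 minutes ago with a 5-minute TTL.
const claim: Claim = { state: "ABOVE", observedAt: Date.now() - 600_000, ttlMs: 300_000 };
console.log(effectiveState(claim, Date.now())); // "UNKNOWN" – the engine "doesn't know any more"
```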
Waylay engine, a new era of building cloud applications
In the picture below, we can see the Waylay platform, where end users can build rules that combine all three principles described before:
*“Check if the room temperature is above 27, while at the same time the outside temperature is 4 degrees less. If that is the case, we assume it is a fire and send a notification. Also, if the NEST smoke detector detects a fire, we send the notification.
Features shown here: 1. conditional sensor execution (weather sensor only executes if the office temperature is above a threshold). 2. Combo gate used for multiple choice (either it is a NEST smoke detector or it is a temperature condition)”*
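Expressed as plain boolean logic rather than the visual designer (a sketch, not Waylay syntax), the rule combines the two branches with an OR gate:

```typescript
// Hypothetical states produced by three sensors.
interface States {
  smokeDetected: boolean;   // NEST smoke detector sensor
  roomTempC: number;        // office temperature sensor
  outsideTempC: number;     // weather sensor (only evaluated conditionally)
}

// OR( smoke detected , AND( room above 27 , room at least 4 degrees warmer than outside ) )
function shouldNotify(s: States): boolean {
  const temperatureBranch = s.roomTempC > 27 && s.roomTempC - s.outsideTempC >= 4;
  return s.smokeDetected || temperatureBranch;
}
```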
To find out more about the Waylay engine and its internals, go to our documentation page or read the following blogs on the same subject:
- Serverless automation with Waylay engine
- The Waylay engine, Part 1: One rules engine to rule them all
- The Waylay engine, Part 2: Bayesian inference-based programming using smart agents
- The curse of dimensionality in decision trees – the branching problem
- Rule patterns
- Creating applications with cloud functions – how to manage rules and orchestration in serverless architectures
- AI and IoT, Part 1: Challenges of applying Artificial Intelligence in IoT using Deep Learning
- AI and IoT, Part 2: Deep Learning and Bayesian Modelling, building the automation of the future
- AI and IoT, Part 3: How to apply AI techniques to IoT solutions – a smart care example