Background

The distinction between Large Language Models (LLMs) and specialized deep networks lies in their design, purpose, and the nature of their training data. LLM training focuses on textual data, with adaptations and extensions required for multimodal capabilities. In contrast, specialized models (often deep neural networks of a similar architecture) are trained on rich, annotated datasets curated for a particular task, and their feature extraction and domain space go well beyond encoding words. It is therefore not surprising to see, in the picture below, that LLM-based prediction/forecasting and decision intelligence score low in that regard.

If we look at this table, there is nothing strange to observe in the context of GenAI capabilities; it is more a matter of managing expectations about the extent to which general LLM models can match more specialized, purpose-built models in a single domain, or serve as a general reasoning engine (they score low on both, but for different reasons).

On forecasts and predictions, it is simply the fact that better architectures with better input datasets and better feature extraction (not text only) perform better. On decisions and reasoning in LLMs, it is becoming obvious that without causal understanding these models will be limited (smarter people than us have reached the same conclusion).

On the "hallucination" problem.

When we talk about hallucination in LLMs, we often mix different things. The first is reasoning error, a general drawback of the GenAI reasoning process. The second is the fact that an LLM is designed on purpose to "hallucinate"; otherwise it would always give the same answer to the same question. Neither is great for a given domain space. When others say they are trying to solve the hallucination issue, they are talking either about fixing the LLM with some form of causal reasoning (one of many papers), about better RAG encodings, about fine-tuning, or about adjusting the "hallucination factor" - the temperature - of the existing LLM.
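To make the temperature point concrete, here is a minimal sketch of temperature-scaled sampling over next-token logits. It is plain Python, not tied to any particular LLM stack, and the logit values are invented purely for illustration:

```python
import math
import random

def sample_with_temperature(logits, temperature):
    """Sample an index from logits after temperature scaling.

    Temperature near 0 approaches greedy decoding (always the same answer);
    higher temperature flattens the distribution and increases variety.
    """
    scaled = [l / temperature for l in logits]
    # Softmax with max-subtraction for numerical stability
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Draw one index according to the resulting distribution
    return random.choices(range(len(probs)), weights=probs, k=1)[0]

# Illustrative logits for four candidate tokens
logits = [2.0, 1.0, 0.5, 0.1]
print(sample_with_temperature(logits, temperature=0.1))  # almost always index 0
print(sample_with_temperature(logits, temperature=1.5))  # much more varied
```

With temperature near zero the model collapses to the single most likely answer; raising it restores the variety (and the risk) described above.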

How does Waylay solve this problem?

Before we answer, let's see how Waylay is used for Industrials. In this context, the Waylay platform is often deployed for asset monitoring and after-sales support - what can be described as troubleshooting processes. To draw an analogy, it is similar to how a doctor diagnoses and treats a patient: both scenarios involve identifying symptoms, diagnosing the issue, and implementing a solution.

Since we know that GenAI is not good at diagnosing the problem (see the table above), we do not use it for diagnostics. For that we use the Waylay causal rules engine, which enables experts to model issues based on data, "classical" ML models and expert heuristics.

For instance, suppose we want to detect an HVAC air filter that is clogged or dirty. We know that a dirty filter obstructs airflow and reduces efficiency, so the central air conditioning system stops cooling effectively. Experts would set a rule to monitor for conditions where the temperature is higher than the set point and the airflow seems weaker, with increased static pressure in the ductwork, all resulting in increased energy usage. So in Waylay we model exactly these conditions and eventually fire an alarm that the HVAC filter needs replacement.
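This is not Waylay's actual rule syntax; the following is a minimal plain-Python sketch, with hypothetical sensor names and made-up thresholds, of the kind of condition an expert would model:

```python
from dataclasses import dataclass

@dataclass
class HvacReading:
    # Hypothetical sensor fields; names, units and values are illustrative only
    temperature_c: float        # measured room temperature
    setpoint_c: float           # thermostat set point
    airflow_m3h: float          # measured airflow
    static_pressure_pa: float   # static pressure in the ductwork
    energy_kwh: float           # recent energy usage

# Illustrative thresholds an expert might choose
AIRFLOW_MIN = 800.0
PRESSURE_MAX = 250.0
ENERGY_MAX = 4.0

def dirty_filter_rule(r: HvacReading) -> bool:
    """Fire when all symptom conditions of a clogged filter hold."""
    return (
        r.temperature_c > r.setpoint_c            # not reaching the set point
        and r.airflow_m3h < AIRFLOW_MIN           # weaker airflow
        and r.static_pressure_pa > PRESSURE_MAX   # increased static pressure
        and r.energy_kwh > ENERGY_MAX             # increased energy usage
    )

reading = HvacReading(26.5, 22.0, 650.0, 310.0, 5.2)
if dirty_filter_rule(reading):
    print("ALARM: HVAC filter likely clogged - schedule replacement")
```

The point is that every condition is an explicit, explainable check chosen by an expert; nothing here is guessed by a generative model.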

So where is GenAI here so far? Nowhere - yet. Still, we did something in parallel to this: we trained LLM models to "understand" the Waylay rules engine (based on a training set that included rules, machine data and outcomes).

In that sense, the "Waylay fine-tuned LLM" is able to interpret the intent of the expert who modeled the problem. That way we have a clear, error-free interpretation of the diagnosis. The reason it works is simple: we don't use GenAI to find a problem, only to explain in clear text what has been found - it is like having an expert whispering in the ear of a support engineer what the issue is for a given asset.
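A minimal sketch of that interpretation step, assuming a generic chat-completion client; `llm_complete` is a hypothetical placeholder for whatever fine-tuned model is actually deployed, not a Waylay API:

```python
import json

def llm_complete(prompt: str) -> str:
    """Hypothetical stand-in for a call to a fine-tuned chat model."""
    raise NotImplementedError("wire this to your LLM client of choice")

def explain_alarm(rule: dict, readings: dict, outcome: str) -> str:
    """Ask the model to translate a fired rule into plain text.

    The model never decides WHETHER there is a problem; the rules
    engine already did. It only explains the rule, data and outcome.
    """
    prompt = (
        "You are assisting a support engineer. A monitoring rule fired.\n"
        f"Rule definition: {json.dumps(rule)}\n"
        f"Sensor readings: {json.dumps(readings)}\n"
        f"Outcome: {outcome}\n"
        "Explain in two sentences what the issue is and why the rule fired."
    )
    return llm_complete(prompt)
```

Note the division of labor: the rules engine decides whether something is wrong; the model only puts the rule, the data and the outcome into words.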

After this phase, we use GenAI to its full potential, and that is knowledge retrieval. In the context of industrials, it is about the repair process.

So you should think of it as a three-stage process: a) troubleshooting use cases are modeled by experts in Waylay in an easy way; b) when an alarm hits the CRM system, we get an explanation of what the issue is (thanks to GenAI interpreting the data, rule and outcome); and c) remedies are searched via a combination of LLM and RAG, which is by now a well-established example of where using GenAI makes the most sense. A side note: in this context, some "hallucinations" can actually be helpful (as tested with a customer) - see this blog post.
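To illustrate stage c), here is a toy retrieval sketch. Real deployments would use learned embeddings and a vector database; the bag-of-words cosine similarity and the manual snippets below are stand-ins chosen only to keep the example self-contained:

```python
import math
import re
from collections import Counter

# Hypothetical snippets from repair manuals, invented for illustration
MANUAL_CHUNKS = [
    "To replace the HVAC air filter, power down the unit, open the "
    "return-air panel and slide out the old filter.",
    "Low refrigerant is diagnosed by measuring suction line pressure.",
    "Clogged condensate drains cause water to pool under the unit.",
]

def bow(text: str) -> Counter:
    # Crude tokenizer: lowercase words only, punctuation stripped
    return Counter(re.findall(r"[a-z0-9-]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, k: int = 1) -> list[str]:
    """Return the k manual chunks most similar to the query."""
    q = bow(query)
    ranked = sorted(MANUAL_CHUNKS, key=lambda c: cosine(q, bow(c)), reverse=True)
    return ranked[:k]

# The explanation from stage b) becomes the retrieval query for stage c)
issue = "HVAC filter clogged, needs replacement"
for chunk in retrieve(issue):
    print(chunk)  # passed to the LLM as grounding context for repair steps
```

The explanation produced in stage b) becomes the retrieval query, and the retrieved chunks ground the LLM's repair suggestions.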

Conclusion

Waylay is not solving the GenAI hallucination problem, nor is it trying to improve existing LLM reasoning architectures. Rather, we are smartly combining the power of causal modeling (which is explainable) with the power of LLMs to interpret these models correctly, and then mixing that with the awesome power of GenAI for knowledge summaries and search - which in our context is used for finding the right repair manuals - all of which results in this incredible ROI.