When discussing the multi-agent concept in the context of Generative AI (GenAI), it’s crucial to distinguish between two different paradigms: multi-agent reinforcement learning (MARL) and multi-agent orchestration, which is the approach we focus on at Waylay.
In MARL, multiple agents learn to make decisions through trial and error within an environment, aiming to maximize cumulative rewards over time. Each agent is built around several components: an observation space (representing the information it perceives), an action space (the set of possible actions it can take), a reward signal (feedback on its actions), memory (to store past experiences), and a learning algorithm (which updates its policy based on experiences to optimize its performance).
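To make this structure concrete, here is a minimal sketch of such an agent in Python; the class and method names are illustrative and do not refer to any particular MARL library.

```python
from dataclasses import dataclass, field

# Illustrative sketch of a single MARL agent and its components (names are hypothetical).
@dataclass
class MARLAgent:
    memory: list = field(default_factory=list)   # past (observation, reward) experiences
    last_observation: object = None

    def observe(self, observation):
        """Receive information from the environment (the observation space)."""
        self.last_observation = observation

    def act(self):
        """Pick one of the possible actions (the action space) using the current policy."""
        return self.policy(self.last_observation)

    def policy(self, observation):
        # Placeholder: a real agent would use a learned mapping from observations to actions.
        return 0

    def learn(self, reward):
        """Update the policy from the reward signal and stored experiences."""
        self.memory.append((self.last_observation, reward))
        # A learning algorithm (e.g. Q-learning or a policy gradient) would update parameters here.
```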
In contrast, our focus at Waylay is on multi-agent orchestration, where a large language model (LLM) decides which agents to activate, when to activate them, what input arguments to use, and how to transfer outputs between agents. The LLM also provides a user-friendly interface to facilitate this orchestration. These decisions are guided by a combination of user intent, agent capabilities, and the LLM system prompt, tailoring the solution to specific use cases. In a demo later in this blog post, I will dive deeper into this approach.
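Conceptually, the orchestration loop looks something like the sketch below. The `llm.chat()`, `descriptor`, and `invoke()` interfaces are hypothetical placeholders rather than the Waylay API; the point is that the model, not the developer, picks the agent, its arguments, and where its output goes next.

```python
# Minimal, hypothetical orchestration loop: the LLM decides which agent to call,
# with which arguments, and how outputs feed the next step.
def orchestrate(llm, agents, user_intent, system_prompt):
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_intent},
    ]
    while True:
        # The LLM sees every agent's descriptor and decides what to do next.
        reply = llm.chat(messages, tools=[agent.descriptor for agent in agents])
        if not reply.tool_calls:
            return reply.content                      # final, user-facing answer
        messages.append({"role": "assistant", "tool_calls": reply.tool_calls})
        for call in reply.tool_calls:
            agent = next(a for a in agents if a.name == call.name)
            result = agent.invoke(**call.arguments)   # the LLM chose these input arguments
            # The agent's output is fed back so the LLM can reason over it
            # or pass it on as input to the next agent.
            messages.append({"role": "tool", "name": call.name, "content": str(result)})
```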
This distinction is important because it clarifies whether we are discussing MARL - where agents learn through interaction - or the use of LLMs to orchestrate agents in a manner similar to robotic process automation (RPA), but with greater flexibility and intelligence.
LLM Orchestration: Transforming API Endpoints into Waylay LLM Agents in no time!
To leverage LLM orchestration effectively, the first step is to expose various resources—such as API endpoints, emails, SMS, CRM databases, or any SQL or data queries—as agents for the orchestrator. Essentially, this involves transforming any existing API endpoint, which is traditionally used as a building block for RPA, into an agent interface that can be utilized by LLM orchestrators.
At Waylay, we use sensors to expose any third-party API within our automation platform—the Waylay rules engine. Our sensors act as "typed lambdas," a concept I will explain shortly. This unique approach allows the Waylay rules engine to seamlessly connect logical components in an intuitive way. It also establishes a clear contract specifying the required input for each function and, equally importantly, the output that each function (or API, in this case) generates.
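As a quick sketch of what "typed lambda" means in practice, consider a sensor that sends an email: the signature is the contract, declaring the required inputs, and the return value declares the data and states it produces. The Python shape below is illustrative, not the exact Waylay SDK.

```python
from typing import TypedDict

class SendEmailResult(TypedDict):
    messageId: str   # data produced by the sensor
    state: str       # e.g. "SENT" or "FAILED"

# A sensor as a "typed lambda": required inputs are typed arguments,
# optional inputs get defaults, and the output contract is the return type.
def send_email(to: str, subject: str, message: str, cc: str = "") -> SendEmailResult:
    # Placeholder for the actual mail integration behind this sensor.
    print(f"Sending '{subject}' to {to}")
    return {"messageId": "msg-001", "state": "SENT"}
```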
For example, querying a database to find a policy associated with a user might require inputs such as the user's first name, last name, and a unique identifier. The output would be the user's policy, represented as an object containing all relevant policy records. Similarly, sending an email might require a user email address, subject, and message, with other inputs potentially being optional.
Additionally, through binding relationships, we can seamlessly connect the outcome of one API call to another in a declarative manner.
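For instance, the policy lookup and email sensors from the example above could be wired together with bindings along these lines (sensor names, property paths, and syntax are illustrative, not the exact Waylay notation):

```python
# Hypothetical declarative bindings: the output of one sensor feeds the input of the next.
workflow_bindings = [
    {
        "from": {"sensor": "findPolicy", "output": "policy.holderEmail"},
        "to":   {"sensor": "sendEmail",  "input":  "to"},
    },
    {
        "from": {"sensor": "findPolicy", "output": "policy.coverage"},
        "to":   {"sensor": "sendEmail",  "input":  "message"},
    },
]
```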
Another advantage of this approach is that every endpoint in our framework can be utilized as either an API or an LLM agent without any additional coding.
How is this possible? Since each API is defined as a sensor, with its description, required inputs, and outputs (data and states), the Waylay framework injects this information into the LLM as function agent descriptors before invoking the LLM. This ensures that the LLM has all the necessary information to utilize the function in real-time, including its capabilities, required inputs, and how to use the output from one agent for reasoning or as input to other agents during execution.
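In other words, the sensor's metadata becomes the kind of function descriptor that function-calling LLMs understand. Here is a sketch, assuming a JSON-schema style format similar to what most LLM APIs expect (the exact wire format Waylay uses is not shown here):

```python
# Sketch of a sensor surfaced to the LLM as a function agent descriptor.
find_policy_descriptor = {
    "name": "findPolicy",
    "description": "Look up the insurance policy associated with a user.",
    "parameters": {
        "type": "object",
        "properties": {
            "firstName": {"type": "string"},
            "lastName":  {"type": "string"},
            "userId":    {"type": "string"},
        },
        "required": ["firstName", "lastName", "userId"],
    },
    # Describing the output lets the LLM reason about how to reuse the result
    # as input to other agents during execution.
    "returns": "Policy object with coverage type, benefits and related records; states: FOUND / NOT_FOUND.",
}
```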
Furthermore, we can combine multiple APIs into a cohesive workflow and expose it as a "super agent function." This super agent clearly defines its required inputs and outputs, just like any other sensor in our system.
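For example, a policy lookup, a service center search, and an email notification could be wrapped into a single super agent along these lines (the name and fields are illustrative):

```python
# Sketch of a "super agent": several sensors combined into one workflow that is
# itself exposed with a single input/output contract.
roadside_assistance_descriptor = {
    "name": "roadsideAssistance",
    "description": "Handle a breakdown: find the driver's policy, locate the nearest "
                   "covered service center, and email a full report to the driver.",
    "parameters": {
        "type": "object",
        "properties": {
            "userId":      {"type": "string"},
            "carLocation": {"type": "string"},
        },
        "required": ["userId", "carLocation"],
    },
    "returns": "Service center details, coverage status, and confirmation that the report was emailed.",
}
```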
Putting it all together
In this example, we have an API call that searches through customer records to find the appropriate coverage policy, including details such as coverage type, benefits, and more. This information is then used in a scenario where, if a car breaks down, the driver can ask a bot for assistance. The bot can provide the location of the nearest service center that can offer the necessary help, along with information about whether the service is covered by the driver's policy. Additionally, a full report of the incident, the policy details, and the location of the nearest service center can be sent to the driver's email address, which is also stored in the database.
Now, let's expose this API alongside others—such as those for sending emails, locating the nearest tow, rental, or repair shop, and accessing the car's location service. Notice that in this context the Waylay orchestrator is used as nothing more than an API and web flow wrapper: the designer of the bot's multi-agent interface didn't need to code anything. All they had to provide was an LLM system prompt, while everything else is injected directly into the bot.
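To give a feel for what that system prompt could look like, here is a purely illustrative example (not the actual prompt used in the demo):

```python
# Purely illustrative system prompt for the roadside-assistance bot.
SYSTEM_PROMPT = """You are a roadside assistance agent.
When a driver reports a breakdown, use the available agents to:
1. look up the driver's policy and check what is covered,
2. find the nearest tow, rental, or repair service using the car's location,
3. email the driver a full report with the policy details and the service center location.
Only call an agent once you have its required inputs; otherwise, ask the driver."""
```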
The multi-agent interface itself is represented by the Waylay template, which is exposed as an API. The question input comes from the user interface (in this case, a bot), while the messages input maintains the context of the conversation over time, ensuring a consistent user flow. It is the responsibility of the application using this template to update the messages variable as the conversation progresses.
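A minimal sketch of how an application might call such a template, assuming a hypothetical endpoint URL, payload shape, and response field (not the documented Waylay API):

```python
import requests

TEMPLATE_URL = "https://example.waylay.io/api/templates/roadside-bot/run"  # hypothetical endpoint

def ask_bot(question: str, messages: list) -> tuple[str, list]:
    # "question" is the user's latest message; "messages" carries the conversation context.
    payload = {"question": question, "messages": messages}
    answer = requests.post(TEMPLATE_URL, json=payload).json()["answer"]
    # The calling application is responsible for keeping the context up to date.
    messages = messages + [
        {"role": "user", "content": question},
        {"role": "assistant", "content": answer},
    ]
    return answer, messages

# Usage: the bot UI threads the growing message history through every call.
history: list = []
reply, history = ask_bot("My car broke down on the highway, can you help?", history)
```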
Finally, here is the demo!
If you're keen on implementing multi-agent applications within your own organization, don't hesitate to reach out to Waylay at sales@waylay.io