In the world of technology, the rate of progress is practically exponential, whether it's higher-refresh-rate AMOLED displays or smaller, more efficient silicon chips. It's been a while since the field of RPA has seen such an upheaval. Those unaware of RPA might be wondering what it is and why it even matters.
Here’s a brief introduction to what RPA is:
Imagine you spend a large chunk of your day copying and pasting data between different spreadsheets or filling out repetitive forms with the same information. This is exactly the kind of tedious task that Robotic Process Automation (RPA) tackles.
RPA is like having a tireless software assistant that can mimic your actions on the computer. You can “teach” the RPA tool the steps involved in a repetitive task, and then it can automate that process for you. The process is visualized as a workflow in the tool, and each step in the process is called an activity.
An example of an RPA tool (in this case, our own NeuralFlow Studio):
An example of a workflow sequence, containing activities:
For example, let's say you process customer orders every day. This might involve copying information from an order email and entering it into a spreadsheet, application, or ERP of choice. An RPA tool could be programmed to:
- Open the email containing the order details.
- Copy the customer name, product details, and quantity.
- Paste the copied information into the appropriate fields of the chosen spreadsheet, web app, or ERP.
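The steps above can be sketched in plain code. This is a rough illustration, not any particular RPA tool's API: the email format, field names, and CSV "spreadsheet" are all assumptions made for the example. It parses the order details out of an email body with regular expressions and appends them as a row to a CSV file:

```python
import csv
import re

# Hypothetical order email body; a real RPA workflow would pull this
# from an email client rather than a string literal.
email_body = """\
Customer: Jane Doe
Product: Widget Pro
Quantity: 3
"""

def extract_order(body: str) -> dict:
    """Copy the customer name, product details, and quantity out of the email text."""
    fields = {}
    for key in ("Customer", "Product", "Quantity"):
        match = re.search(rf"^{key}:\s*(.+)$", body, re.MULTILINE)
        fields[key] = match.group(1).strip() if match else ""
    return fields

def append_to_spreadsheet(order: dict, path: str) -> None:
    """Paste the extracted fields into the next row of a CSV 'spreadsheet'."""
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow([order["Customer"], order["Product"], order["Quantity"]])

order = extract_order(email_body)
append_to_spreadsheet(order, "orders.csv")
```

In an RPA tool, each of these steps would be a separate activity in the workflow rather than a function call, but the shape of the automation is the same.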
These are mostly industry-specific tasks, but RPA can be used to perform literally any repetitive task that you want to stop doing.
If RPA is so great, why do we expect any advancements at all?
Well, that’s the cool part!
See, for the better part of two decades, we have been able to “record” our workflows through RPA tools. The software would create a workflow that mimicked the exact set of actions you performed: which software you accessed, what you typed, down to the time delay between each action. It is an insanely useful way to automate the task of automating your tasks. (Yes, the irony is not lost on me.)
However, the “Record” functionality these tools tout is rigid and inflexible, capable of executing tasks only in the EXACT way you performed them. If anything changes anywhere in that workflow, whether it's a millisecond of delay from a sluggish network connection or a UI update to the software or website you were accessing, the smooth functioning of your automation can break. While modern versions of these tools do let you tweak the variables associated with each action, it is still a manual process of trial and error. You may also have to manually add or remove activities from the workflow to compensate for any of the issues.
What could possibly be the solution?
For that, we need to look back to late 2022, when the mass audience was introduced to the idea of Large Language Models, or LLMs. The general idea is that LLMs are machine learning models trained on a huge corpus of textual content: books, articles, forums, and so on. These models can then respond to natural-language prompts with detailed answers.
OpenAI ChatGPT (using GPT-4) example:
Google Gemini (previously Bard) example:
Lately, there's a new kid on the block: Large Action Models, or LAMs. They are a kind of evolution of LLMs. Both understand natural-language prompts, but LLMs are limited to responding with text or visuals, whereas LAMs are built to take actions.
LAMs, on the other hand, are specifically trained to be action-oriented: instead of replying with long paragraphs (“As an AI Language Model…”), they draw on a training corpus of actions, gestures, and workflows that the model then makes sense of. A LAM's strength lies in its ability to execute a sequence of actions defined by the user and the environment the user is in (environment here referring to the application or interface the LAM has to navigate), and, more importantly, in its ability to dynamically adapt to the variables involved.
In simple words, an LLM is a sayer while a LAM is a doer.
But how are LAMs relevant?
Instead of just explaining, let me use an example.
At the Consumer Electronics Show 2024 (CES 2024), a small startup called Rabbit Inc. launched their first product, the Rabbit R1. I won’t go into details about that device since that’s not the point of this article. However, what I do want to focus on for a bit is how it works.
See, the Rabbit R1 is powered by a LAM, and is currently the only known public implementation of one. Their explanation of how it works is fascinating, so I do recommend giving it a read, or you can watch the video someone shared of their explanation, here.
For those who just want the short answer, this is what they say,
train each workflow once… trains directly on the user interface and adapts…
Now, after discussing how limited the “Record” functionality in RPA tools is and how much manual intervention it requires, I think you can see where I'm going with this.
LAM + RPA = Giant Leap
If we were to combine the capabilities of RPA with the adaptability of LAMs, we could set up complex, multi-variable workflows in a single run-through. And since LAMs can also harness the abilities of LLMs, we (the consumers) wouldn't have to train them ourselves. We would finally have the virtual assistant we were promised almost a decade ago. I could say,
“Check my Outlook and NeuralStream Gmail inbox for any important emails. Also, if there are any meeting invites for tomorrow between 10am and 4pm, accept them, but make sure there is at least a half-hour gap between two meetings. If there are more meetings than the calendar can fit, decline the invites and say my calendar was booked.”
and my virtual assistant would do exactly what I asked.
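Under the hood, the scheduling half of that request is just a constraint-filtering problem. As a minimal sketch (the invite list, dates, and tuple format are assumptions for the example; the actual Outlook/Gmail integration is out of scope), accepting invites inside the 10am-4pm window while enforcing the half-hour gap might look like:

```python
from datetime import datetime, timedelta

# Hypothetical invites for tomorrow: (title, start, end), sorted by start time.
invites = [
    ("Standup",     datetime(2024, 3, 1,  9, 30), datetime(2024, 3, 1, 10,  0)),
    ("Design sync", datetime(2024, 3, 1, 10, 30), datetime(2024, 3, 1, 11,  0)),
    ("1:1",         datetime(2024, 3, 1, 11, 15), datetime(2024, 3, 1, 11, 45)),
    ("Retro",       datetime(2024, 3, 1, 12,  0), datetime(2024, 3, 1, 13,  0)),
]

WINDOW_START = datetime(2024, 3, 1, 10, 0)
WINDOW_END = datetime(2024, 3, 1, 16, 0)
MIN_GAP = timedelta(minutes=30)

def triage(invites):
    """Accept invites inside the 10am-4pm window with at least a 30-minute
    gap after the previously accepted meeting; decline everything else."""
    accepted, declined = [], []
    last_end = None
    for title, start, end in invites:
        in_window = start >= WINDOW_START and end <= WINDOW_END
        gap_ok = last_end is None or start - last_end >= MIN_GAP
        if in_window and gap_ok:
            accepted.append(title)
            last_end = end
        else:
            declined.append(title)  # reply: "my calendar was booked"
    return accepted, declined

accepted, declined = triage(invites)
```

The point of a LAM is that nobody writes this logic by hand: the model derives the window, the gap constraint, and the decline rule from the natural-language request itself.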
Now, I can already hear people asking the important question: “What about the people this technology will displace?” And honestly, this type of assistant will displace a very large number of jobs at this level. But that's the thing, right? Technology has always progressed in ways that make our lives easier, freeing us from the repetitive, mundane, downright boring tasks and allowing us to embrace the higher echelons of skill and creativity.
Bad actors exist, though. Many companies have already replaced humans with the severely limited LLMs of today: entire teams of writers laid off because management thought LLMs were the way forward. They saved a paisa in the short term, and will make their own lives incredibly difficult when they realise that the quality of results holds up only when a human who properly understands the field is in charge of it.
Here’s what I think.
What we can look forward to is the democratization of these technologies; just as LLMs became open-sourced, so will LAMs. And when that day comes, we can use these technologies to facilitate our progress while also ensuring that we aren’t expendable.

