From ChatGPT to autonomous AI agents: the real power of LLMs

The possibilities of LLMs extend well beyond generating great copy, stories, essays, and programs; they can be framed as powerful general problem solvers.

Theo Tortorici

ChatGPT: The most revolutionary innovation since the internet?

ChatGPT, ChatGPT, ChatGPT… its rise has been nothing short of astonishing. It took a mere two months for ChatGPT to amass 100 million users, and some experts predict that 90% of startups will incorporate LLM features into their products by the end of 2023.

The time it took selected popular online services to reach 1M users

And indeed, this technology is captivating both the general public and innovators. The potential applications seem boundless, and everyone is eager to see where it leads. (Read: the world is still inflating a new bubble of technological expectations, and it will take some time before it pops.)

But what makes ChatGPT so groundbreaking? I believe it's because it's the best example of broad and deep artificial intelligence we have today.

Performance: LLMs, especially newer ones like GPT-4, are astoundingly effective. They can produce relevant and well-structured text, answer queries, craft creative pieces, assist in coding, and do so much more with remarkable precision.

Versatility: While performance showcases AI's prowess (remember the shock when Deep Blue defeated Kasparov?), it's versatility that's truly transformative. LLMs can handle diverse tasks without specialized training, from translations and tutoring to coding. This adaptability is driving their widespread appeal and use.

What does ChatGPT mean for our digital world? Conversational… everything?

LLMs like ChatGPT are changing the way we work and think

These models are reshaping how we produce and consume information. Creating content has become almost free, but does that mean we're truly gaining more knowledge? (Or is ChatGPT merely a "stochastic parrot"?) While it's easier than ever to share information, the big questions remain open. For example, even if an LLM knew everything humanity knew in 1915, would it have come up with Einstein's theory of relativity? Probably not.

LLMs are not just about human language. They're also good at programming tasks. This is great news for the tech world. Making software is becoming cheaper, developers are working faster, and it's easier to start coding. Soon, anyone might be able to "speak" in code! Is natural language the last abstraction layer of programming languages?

ChatGPT isn't just about helping developers. It's changing how software itself works. Old software followed strict rules, with specific buttons and designs. Now, with LLMs, software is becoming more flexible and chat-based.

But there's more. We think the real potential of LLMs isn't just chat. It's something even bigger!

The groundbreaking innovation: LLM-powered autonomous agents

ChatGPT is one amazing application, but the name hides its full potential! While "chat" suggests a primary use in communication, ChatGPT's true strengths lie in its reasoning and creativity. We've all experienced the limitations of earlier chatbots, which lacked genuine intelligence.

These first-generation chat agents lacked both breadth and depth of understanding. Their grasp of natural language was basic, and they had no real reasoning ability: they depended on fixed databases and rule-based systems, often needed specific prompts to respond, and struggled with ambiguous queries.

LLM-powered chat agents, on the other hand, are trained on extensive internet data, allowing them to produce dynamic, context-sensitive replies. While older chatbots felt robotic and rigid, LLM agents offer more organic interactions and can handle a far wider array of inputs. Traditional models required detailed rule-setting and constant updates; LLM agents are easier to deploy, although they may need tuning for specific roles.

The real advancement with LLM-powered agents isn't just their improved chatbot capabilities, but their potential for autonomy. When given the freedom, these agents use LLMs to think through tasks, essentially letting the LLM serve as their brain. Autonomous LLM-powered agents stand out for their ability to independently execute tasks once they understand the goal. They can combine multiple commands or prompts, interact with their surroundings, and even engage with external tools. This is all made possible through their advanced reasoning, memory storage, API integration, continuous learning, and self-reflective capabilities.

Overview of an Autonomous Agent Powered by an LLM

1. Environment: The agent's operational setting, which could be a digital platform, a real-world robot, a chat interface, a software application, or any other context where it carries out tasks.

2. Sensors: Equip the agent to perceive its surroundings. In a digital context, this might include reading texts, monitoring server activities, or tracking user actions. For physical robots, this might mean using cameras, microphones, or tactile sensors.

3. Actuators: These enable the agent to act within its environment. For instance, a chatbot might send a message, software might execute commands, or a robot might move a part.
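To make this split concrete, here is a minimal sketch of the environment/sensor/actuator decomposition in Python. All class and method names are illustrative, invented for this post, not from any real agent framework:

```python
# Toy environment: a message queue the agent can perceive and act upon.
# Everything here is an illustrative sketch, not a real framework API.

class ChatEnvironment:
    """The agent's operational setting: here, a simple chat inbox/outbox."""
    def __init__(self):
        self.inbox = []    # messages the agent can perceive
        self.outbox = []   # replies the agent has produced

    def post(self, message: str):
        self.inbox.append(message)

class Sensor:
    """Perceives the environment: here, drains pending messages."""
    def perceive(self, env: ChatEnvironment) -> list:
        messages, env.inbox = env.inbox, []
        return messages

class Actuator:
    """Acts on the environment: here, sends a reply."""
    def act(self, env: ChatEnvironment, reply: str):
        env.outbox.append(reply)

# One perceive-act cycle
env = ChatEnvironment()
env.post("What is the status of order 42?")
observed = Sensor().perceive(env)
Actuator().act(env, f"Received {len(observed)} message(s); looking into it.")
```

For a physical robot, the same interfaces would wrap cameras and motors instead of message queues; the perceive-decide-act cycle is unchanged.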

4. Reasoning and Decision-Making Using LLMs:

Overview of the reasoning and decision-making module of an LLM-powered autonomous agent

a) Input Interpretation: The agent employs the LLM to decipher natural language inputs, understanding user instructions, questions, or signals from the environment.

b) Memory: This pertains to retaining and recalling past interactions or data. Crucial for context, it ensures consistent and apt responses. Key memory types include:

Short-term Memory: Retains recent interactions, such as the current conversation, along with other immediately relevant information, to maintain continuity.

Long-term Memory: Refers to the vast dataset the LLM was trained on, serving as its foundational knowledge base.

Attention Mechanisms: Within transformer architectures such as GPT-3, these mechanisms act as a form of working memory, allowing the model to prioritize specific parts of the input when formulating a response.
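To illustrate the short-term side, here is a toy conversation buffer that drops the oldest turns once a rough context budget is exceeded. The class and its character budget are invented for this sketch; real agents count tokens, not characters:

```python
# Illustrative short-term memory: a bounded buffer of conversation turns.
# Oldest turns are evicted once the (rough) context budget is exceeded.

class ShortTermMemory:
    def __init__(self, max_chars: int = 200):
        self.max_chars = max_chars
        self.turns = []

    def add(self, turn: str):
        self.turns.append(turn)
        # Trim oldest turns while over budget, always keeping the newest one.
        while sum(len(t) for t in self.turns) > self.max_chars and len(self.turns) > 1:
            self.turns.pop(0)

    def context(self) -> str:
        """The text an agent would prepend to its next LLM prompt."""
        return "\n".join(self.turns)

mem = ShortTermMemory(max_chars=40)
mem.add("user: hello")
mem.add("agent: hi, how can I help?")
mem.add("user: summarize our chat")
# The two oldest turns no longer fit the budget and have been dropped.
```

Long-term memory needs no such structure here: in this framing it is simply the knowledge baked into the model's weights during training.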

c) Task Planning and Execution: After interpreting input and considering its goals, the agent plans its actions. This could involve seeking LLM advice, finding solutions, or gathering more data.

Task Breakdown: The agent divides big tasks into smaller, easier steps, making complex tasks more manageable. (See Chain of Thoughts: CoT; Wei et al. 2022)

Self-reflection and Improvement: The agent reviews its past actions, learns from errors, and adjusts to achieve better results in the future. (See ReAct, Yao et al. 2023; Reflexion, Shinn & Labash 2023)
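The decomposition and reflection steps above can be sketched as a toy plan-act-reflect loop. The "reasoning" here is a hard-coded stub standing in for an LLM call, and every name is invented for illustration:

```python
# Toy plan-act-reflect loop in the spirit of CoT-style task breakdown.
# plan_step is a stub standing in for an LLM call that picks the next subtask.

def plan_step(goal: str, done: list) -> str:
    """Stubbed 'reasoning': return the next subtask not yet completed."""
    subtasks = ["gather data", "analyze data", "write report"]  # CoT-style breakdown
    for task in subtasks:
        if task not in done:
            return task
    return "finish"

def run_agent(goal: str, max_steps: int = 10) -> list:
    done = []
    for _ in range(max_steps):
        action = plan_step(goal, done)
        if action == "finish":
            break
        # Act, then "reflect": record the outcome so planning can adapt.
        done.append(action)
    return done

log = run_agent("produce a market report")
```

In a real agent, both the breakdown and the reflection would come from prompting the LLM with the goal and the action log, rather than from a fixed list.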

d) Tool and Program Integration: For intricate tasks, the agent might use external tools or programs. These could be more apt for certain tasks, or provide information the LLM lacks.

The "Tool Augmented Language Models" (TALM) approach enhances agent performance and adaptability. (See TALM; Parisi et al. 2022)

The "Program-Aided Language Models" (PAL) strategy has the LLM translate a problem into code, then uses a Python interpreter to compute the answer. With this strategy, smaller models can surpass much larger ones, especially on mathematical and logical tasks. (See PAL; Gao et al. 2022)
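Here is a minimal sketch of the PAL idea, with the "generated" code hard-coded for illustration; a real system would receive it from the model and sandbox its execution:

```python
# PAL sketch: the LLM writes code for a word problem; Python runs it.
# The generated_code string is hard-coded here for illustration only.

generated_code = """
apples = 23          # Alice starts with 23 apples
apples -= 20         # she uses 20 for lunch
apples += 6          # then buys 6 more
result = apples
"""

namespace = {}
exec(generated_code, namespace)   # the interpreter, not the LLM, does the arithmetic
answer = namespace["result"]      # 9
```

The division of labor is the point: the LLM handles language understanding and problem decomposition, while the interpreter guarantees the arithmetic is exact.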

e) Output Generation: For communicative tasks, the agent uses the LLM to craft natural language outputs, share information, pose questions, or give feedback.

5. Goals: Pre-set objectives the agent seeks to fulfill. Leveraging LLM abilities, these objectives can be expressed in natural language, leading to more adaptable and varied task outlines.

6. Learning and Adaptation: Though many autonomous agents use diverse machine learning strategies to evolve, an LLM-based agent mainly draws from the LLM's extensive knowledge and adaptability. Nonetheless, it can merge with other learning methods for further refinement.
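Putting the six components together, a toy agent loop might look like the following. The brain function is a stub standing in for the LLM; every name here is invented for illustration:

```python
# End-to-end toy agent loop: sensors feed observations, the LLM-as-brain
# (stubbed here) decides, memory records, actuators carry out actions.

def brain(goal: str, observation: str, memory: list) -> str:
    """Stand-in for the LLM: decide the next action from goal + context."""
    if "question" in observation:
        return "answer the question"
    return "wait"

def agent_loop(goal: str, observations: list) -> list:
    memory = []                      # short-term memory of this run
    actions = []
    for obs in observations:         # sensors: one observation per tick
        action = brain(goal, obs, memory)
        memory.append(f"saw: {obs}; did: {action}")
        if action != "wait":
            actions.append(action)   # actuators: take the chosen action
    return actions

acts = agent_loop("be helpful", ["silence", "incoming question", "silence"])
```

Swapping the stubbed brain for a real LLM call, and the string observations for real sensors and tools, is essentially what frameworks like AutoGPT do.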

References

[1] https://lilianweng.github.io/posts/2023-06-23-agent/

[2] https://lilianweng.github.io/posts/2023-03-15-prompt-engineering/

Wei et al. "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models." NeurIPS 2022.

Yao et al. "ReAct: Synergizing Reasoning and Acting in Language Models." ICLR 2023.

Shinn & Labash. "Reflexion: An Autonomous Agent with Dynamic Memory and Self-Reflection." arXiv preprint arXiv:2303.11366 (2023).

Parisi et al. "TALM: Tool Augmented Language Models." arXiv preprint arXiv:2205.12255 (2022).

Gao et al. "PAL: Program-Aided Language Models." arXiv preprint arXiv:2211.10435 (2022).

Li et al. "API-Bank: A Benchmark for Tool-Augmented LLMs." arXiv preprint arXiv:2304.08244 (2023).

Park et al. "Generative Agents: Interactive Simulacra of Human Behavior." arXiv preprint arXiv:2304.03442 (2023).

AutoGPT. https://github.com/Significant-Gravitas/Auto-GPT
