How We’re Scaling Our Chatbot Using an AI Agent Framework

As chatbots evolve to meet increasingly complex user demands, scaling isn’t just about adding more servers—it’s about rethinking how your system works under the hood. We recently overhauled our chatbot by moving to an AI agent-based framework, and it’s already paying dividends in speed, accuracy, and flexibility.

Here’s a behind-the-scenes look at what prompted the change, how our new architecture works, and what we’ve gained from the switch.

The Challenge: A Rigid, Slowing System

Before the redesign, our chatbot was hitting serious performance and scalability limits. Responses slowed down as API calls stacked up sequentially. Every new integration made the system more brittle. Even something as routine as tracking multi-turn conversations across chat history started to feel like a patchwork of quick fixes.

The core issues boiled down to:

  • Performance bottlenecks due to sequential API processing
  • Scalability challenges as we integrated more data sources
  • Rigid architecture that made new features hard to implement
  • Weak context handling, especially across longer conversations

We needed something more modular. More dynamic. More... agentic.

The Shift: Introducing the Agent Framework

Enter LangGraph, a graph-based orchestration framework designed for multi-agent systems. Instead of running everything through a rigid pipeline, LangGraph allowed us to build a flexible network of agents—each with its own role, able to operate in parallel and evolve context as the user conversation unfolds.

Now, instead of a single linear path, our chatbot behaves like a smart task force. Agents collaborate, specialize, and share information in real time, producing fast and deeply contextual responses.

User Prompt
    |
    v
Query Modifier Agent
    |
    v
Data Source Decider Agent
    |
    v
Supervisor Agent
   /     |     \
Node 1  Node 2  Node 3   ← (e.g., Token Data, Stock Info, Wallet Queries)
   \     |     /
    v    v    v
Generate Response Agent
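Here's a rough sketch of how that flow wires up in LangGraph. The state fields and node functions below are illustrative placeholders, not our production code:

from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class ChatState(TypedDict, total=False):
    prompt: str           # the original user message
    refined_prompt: str   # produced by the Query Modifier Agent
    sources: list[str]    # chosen by the Data Source Decider Agent
    results: dict         # merged output from the domain nodes
    response: str         # final user-facing answer

def query_modifier(state: ChatState) -> dict:
    # Rewrite the prompt using chat history (LLM call omitted in this sketch).
    return {"refined_prompt": state["prompt"]}

def data_source_decider(state: ChatState) -> dict:
    # Decide which APIs or search sources are relevant to the refined prompt.
    return {"sources": ["defillama", "google"]}

def supervisor(state: ChatState) -> dict:
    # Spawn domain nodes (tokens, stocks, wallets) and merge what they return.
    return {"results": {}}

def generate_response(state: ChatState) -> dict:
    # Turn the merged context into a human-readable answer.
    return {"response": ""}

graph = StateGraph(ChatState)
graph.add_node("query_modifier", query_modifier)
graph.add_node("data_source_decider", data_source_decider)
graph.add_node("supervisor", supervisor)
graph.add_node("generate_response", generate_response)
graph.add_edge(START, "query_modifier")
graph.add_edge("query_modifier", "data_source_decider")
graph.add_edge("data_source_decider", "supervisor")
graph.add_edge("supervisor", "generate_response")
graph.add_edge("generate_response", END)
chatbot = graph.compile()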

Building and Distributing Context Across Agents

The magic of our new system lies in how it builds and distributes context. Every conversation begins with several layers of input:

  • User Prompt: The initial message from the user

  • Chat History: Helps interpret follow-up questions accurately

  • Admin Context: Predefined keyword files that help guide searches

  • Azure + Google Search: For internal knowledge and public data

  • Live API Data: For dynamic queries like token prices or stock stats
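Collected into a single context object, those layers might look something like this (the field names and file names are illustrative, not our actual schema):

initial_context = {
    "user_prompt": "What are the latest top 5 crypto narratives?",
    "chat_history": [],                       # prior turns, used to resolve follow-ups
    "admin_context": ["token_keywords.txt"],  # hypothetical keyword file
    "search_results": {},                     # filled by Azure + Google Search
    "live_api_data": {},                      # filled by live API nodes (prices, stats)
}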

A Supervisor Agent assesses the user’s intent and decides what’s needed. It dynamically spawns domain-specific nodes—like one for tokens, another for wallets or stocks—and these nodes fetch and return relevant data, all while working in parallel.

Each agent contributes its slice of insight, and by the time we generate a final response, the context has been enriched and stitched together like a well-briefed report.
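In code, that fan-out can be as simple as gathering the selected node coroutines concurrently. A minimal sketch, assuming each node is an async function that returns its own slice of the context (these node functions are placeholders):

import asyncio

async def token_node(ctx: dict) -> dict:
    # Fetch token data for the refined query (API call omitted in this sketch).
    return {"token_data": {}}

async def wallet_node(ctx: dict) -> dict:
    # Fetch wallet data for the refined query (API call omitted in this sketch).
    return {"wallet_data": {}}

async def run_nodes(ctx: dict, nodes) -> dict:
    # Run every selected domain node in parallel; a failing node becomes an
    # exception object rather than aborting the whole batch.
    results = await asyncio.gather(*(node(ctx) for node in nodes),
                                   return_exceptions=True)
    merged: dict = {}
    for result in results:
        if isinstance(result, Exception):
            continue  # skip failed nodes; the others still contribute
        merged.update(result)
    return merged

# e.g. merged = asyncio.run(run_nodes(initial_context, [token_node, wallet_node]))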

Here’s how the context evolves using a real-world example:

Prompt: "What are the latest top 5 crypto narratives?"

Query Modifier Agent: Updates the prompt to:
"What are the top 5 trending cryptocurrency narratives from the past week?"

Data Source Decider Agent: Selects real-time APIs like DefiLlama to fetch current market data

Node Executors: Fetch and return narrative data

Merged Context Example (pseudocode):

{
  "intent": "top_crypto_narratives",
  "refined_prompt": "What are the top 5 trending cryptocurrency narratives from the past week?",
  "sources": ["defillama", "google"],
  "results": {
    "DePIN": 3.66,
    "SocialFi": 0.52,
    "NFT Marketplace": 2.29,
    ...
  },
  "final_narratives": ["SocialFi", "DeFi", "AI", "GameFi", "NFT Marketplace"]
}

Final Response Agent: Builds the human-readable output using all context layers: "The top narratives this week are SocialFi, DeFi, AI, GameFi, and NFT Marketplace..."
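That last step is essentially one more model call over the merged context. A minimal sketch using GPT-4o-mini through the OpenAI client (the prompt wording and function name are ours for illustration, not the production implementation):

import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def build_final_response(merged_context: dict) -> str:
    # Hand the enriched context to the model and ask for a user-facing answer.
    completion = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "Answer the user's question using only the supplied context."},
            {"role": "user", "content": json.dumps(merged_context)},
        ],
    )
    return completion.choices[0].message.content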

Roles and Responsibilities: Inside the Agent Team

Each agent in the system has a specific job to do—and they all contribute to building the shared context object.

  • Query Modifier Agent: Refines user input using chat history for more precise understanding and query formulation.
  • Data Source Decider: Chooses which APIs or sources (Azure, Google, etc.) are relevant to the query.
  • Supervisor Agent: Dynamically spawns nodes for each domain (e.g., tokens, stocks, wallets) and runs them in parallel.
  • Node Executors: Execute domain-specific tasks (call APIs, process data). Example: Token Node, Wallet Node (see the sketch after this list).
  • Generate Response Agent: Combines context and outputs the final response tailored to the user’s intent.
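As one concrete illustration of the Node Executor role, a fuller version of the token node stub from earlier is little more than an async API call that returns its slice of the shared context. A sketch using httpx; the endpoint, parameters, and returned field are placeholders rather than the real integration:

import httpx

async def token_node(ctx: dict) -> dict:
    # Query a token-data API for the refined prompt and return a partial
    # context update (the URL below is a placeholder, not a real endpoint).
    async with httpx.AsyncClient(timeout=10) as client:
        resp = await client.get(
            "https://example.com/api/token-metrics",
            params={"query": ctx.get("refined_prompt", "")},
        )
        resp.raise_for_status()
        data = resp.json()
    return {"token_data": data}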

Real Results: What’s Improved Since the Switch

The benefits of moving to this architecture have been immediate and tangible:

  • Faster Responses: Parallel processing slashed latency; no more waiting for one API to finish before starting the next
  • More Accurate Answers: Queries are now better structured, and only the most relevant sources are tapped
  • Resilient Error Handling: If a node fails (say, an API times out), others still contribute, and fallback logic keeps the conversation going (see the sketch after this list)
  • Easy to Scale: Want to add a new domain or capability? Just plug in a new agent or node, with no need to rebuild the system
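In practice, that fallback logic can be as simple as a per-node time budget with a safe default, roughly like this (the 5-second budget and function name are illustrative):

import asyncio

async def run_with_fallback(node, ctx: dict, fallback: dict, timeout: float = 5.0) -> dict:
    # Give each node a fixed time budget; on timeout or any error, return a
    # safe default so the remaining nodes can still shape the final answer.
    try:
        return await asyncio.wait_for(node(ctx), timeout=timeout)
    except Exception:
        return fallback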

Performance: Before vs. After

One of the most tangible wins: speed.

Without Multi-Agent (before): Average response time: 29.25 seconds

With Multi-Agent (after): Average response time: 21.16 seconds

That works out to roughly a 28% reduction in average response time.

In the new setup, API calls run in parallel, and the system activates only the agents needed for the query—saving time and resources.

Even in complex queries where multiple nodes are required, we're still seeing consistently better response times than in the old single-threaded flow.

The Tech Stack: What Powers It All

Here’s what’s running under the hood:

  • LangGraph: Core of our multi-agent orchestration
  • FastAPI: Lightweight async server
  • asyncio / concurrent.futures: Non-blocking, real-time processing
  • GPT-4o-mini: Our go-to LLM for prompt handling and generation

We explored other frameworks like AutoGen, CrewAI, and LangChain—but LangGraph hit the sweet spot for granular control, parallelism, and real-time context sharing.
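For completeness, here's roughly how the compiled graph sits behind the FastAPI layer. The endpoint path, request model, and the agent_graph module are hypothetical, with LangGraph's async invoke doing the non-blocking work:

from fastapi import FastAPI
from pydantic import BaseModel

from agent_graph import chatbot  # hypothetical module exposing the compiled graph

api = FastAPI()

class ChatRequest(BaseModel):
    prompt: str

@api.post("/chat")
async def chat(req: ChatRequest) -> dict:
    # Run the agent graph without blocking the event loop.
    state = await chatbot.ainvoke({"prompt": req.prompt})
    return {"response": state.get("response", "")}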