Salesforce's Horizon Agent: Text-to-SQL in Slack

May 27, 2025 by Rick Radewagen
Their engineers were drowning in support requests. So they built an AI that says "I don't know" better than most humans.
Salesforce Horizon Agent

Salesforce published a deep dive into Horizon Agent in May 2025, their internal text-to-SQL tool that went from early access in August 2024 to general availability in January 2025. It's another entry in the growing library of "how FAANG built their AI data assistant" posts, joining Uber's QueryGPT, LinkedIn's SQL Bot, and Netflix's LORE.

But Horizon Agent's story stands out for one reason: they're unusually honest about what didn't work.

The Problem

At Salesforce, data scientists and engineers had become accidental gatekeepers. Non-technical teams needed answers, but getting those answers required SQL. The result was predictable:

  • Support request backlogs growing faster than they could be addressed
  • Decisions made with stale data (or worse, gut instinct)
  • Engineers spending "dozens of hours per week" writing custom queries instead of building features

Sound familiar? This is the data access gap that every growing organization eventually faces. BI dashboards help, but they never cover every question a user might have.

Their goal: let anyone ask questions in plain English and get answers instantly, right in Slack where they already work.

The Architecture

Horizon Agent combines several internal Salesforce technologies:

  • Slack Bolt (Python framework) for the conversational interface
  • Fack — an open-source tool Salesforce created for storing business context, terminology, and SQL construction guidelines
  • Horizon Data Platform (HDP) — their internal metadata layer (similar to dbt), documenting table purposes and sample queries
  • Einstein Gateway — Salesforce's internal LLM access platform

When a user asks a question, the system:

  1. Identifies the question type using an LLM
  2. Retrieves relevant business context from Fack and dataset information from HDP
  3. Enriches the user's question with all this context (classic RAG pattern)
  4. Submits to an LLM through Einstein Gateway
  5. Returns both the SQL query and an explanation of what it does

The explanation step is crucial—we'll come back to that.
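The five steps above can be sketched as a thin pipeline. Everything below is an illustrative stand-in, not Salesforce's actual code: the four callables are hypothetical proxies for the LLM classifier, Fack, HDP, and Einstein Gateway.

```python
# Illustrative sketch of the request flow described above. The callables
# (classify, retrieve_context, retrieve_tables, complete) are hypothetical
# stand-ins for the LLM classifier, Fack, HDP, and Einstein Gateway.

def answer_question(question, classify, retrieve_context, retrieve_tables, complete):
    question_type = classify(question)       # 1. identify the question type
    context = retrieve_context(question)     # 2a. business context from Fack
    tables = retrieve_tables(question)       # 2b. dataset info from HDP
    prompt = (                               # 3. enrich with context (classic RAG)
        f"Question type: {question_type}\n"
        f"Business context: {context}\n"
        f"Candidate tables: {tables}\n"
        f"User question: {question}\n"
        "Return the SQL and a plain-English explanation of what it does."
    )
    sql, explanation = complete(prompt)      # 4. submit through the LLM gateway
    return {"sql": sql, "explanation": explanation}  # 5. return both to the user
```

The point of the structure is the last line: the SQL never travels without its explanation.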

The 50% Problem

Here's where Salesforce gets refreshingly honest:

"At launch we only had the correct response ~50% of the time."

Half their queries were wrong. Most companies would hide this number. But Salesforce shares it because the story of how they improved matters more than the starting point.

Their first mistake was opacity. Early versions would often just say "I don't know how to answer that" without explanation. Conversation over. Users had no idea why it failed or how to ask better questions.

The fix was counterintuitive: be more transparent about the messy middle.

Teaching the Agent to Ask

The breakthrough came when they "loosened their guardrails." Instead of failing silently, Horizon Agent started:

  • Asking clarifying questions when queries were ambiguous
  • Explaining the SQL it generated, even to non-technical users
  • Showing its work so users could spot errors

This transparency had an unexpected side effect: users got better at asking questions. By seeing how the agent interpreted their requests, they learned what level of specificity worked.
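The behavioral change amounts to a three-way branch on retrieval confidence. A minimal sketch, with the thresholds and scoring entirely assumed (the post does not describe Salesforce's actual heuristics):

```python
# Illustrative sketch: fail loudly with a clarifying question instead of a
# silent "I don't know". Thresholds and scores are assumptions for the demo.

def respond(question: str, matches: list[tuple[str, float]]) -> str:
    """matches: candidate (dataset, relevance_score) pairs from retrieval."""
    confident = [name for name, score in matches if score >= 0.8]
    plausible = [name for name, score in matches if 0.4 <= score < 0.8]

    if confident:
        return f"Generating SQL against: {', '.join(confident)}"
    if plausible:
        # Ambiguous: ask the user to narrow it down rather than guess.
        return f"Did you mean one of these datasets? {', '.join(plausible)}"
    # No match at all: say so, and tell the user what would help.
    return "I couldn't match this to a known dataset. Try naming a metric or table."
```

The middle branch is the one that teaches: every clarifying question shows users which level of specificity the agent can act on.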

First-shot acceptance rate climbed from around 20% to above 40%. More importantly, overall efficacy improved from 50% at launch to 80% today. The system now averages just one error per day across all users, tracked via Grafana dashboards that monitor query success rates, response times, and user feedback in real-time.

The Consensus Trick

Another technical detail worth noting: Horizon Agent doesn't trust a single LLM response.

"We switched from giving the LLM one chance to generate a SQL query to 10 chances."

They generate 10 candidate queries, then use cosine similarity and Levenshtein distance algorithms to eliminate outliers and select the response representing majority consensus. They also pre-validate every query by running EXPLAIN, feeding errors back to the agent for another attempt.
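The "majority consensus" step can be approximated by picking the candidate with the smallest total edit distance to all the others (a medoid). Here's a self-contained sketch using a plain-Python Levenshtein distance; the cosine-similarity side and the EXPLAIN pre-validation are omitted for brevity:

```python
# Sketch of the consensus step: generate N candidate queries, then keep the
# one closest to all the others by edit distance. Pure-Python Levenshtein.

def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def consensus(candidates: list[str]) -> str:
    """Return the candidate with the smallest total distance to the rest."""
    return min(candidates, key=lambda q: sum(levenshtein(q, other)
                                             for other in candidates))
```

Outlier queries rack up large distances to everything else and never win; each surviving query would then still be pre-validated with EXPLAIN before reaching the user.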

This is expensive in compute but dramatically improves reliability. It's the same pattern we've seen from other enterprise teams: when accuracy matters more than speed, ensemble methods pay dividends.

Where It Lives: Meet Users in Their Workflow

One detail from the Salesforce post resonates with every AI data project we've seen:

"An early prototype of Horizon Agent was a local-only experience using Streamlit. It was a great start, but since it wasn't accessible where our users spend their time (Slack), it didn't get adopted."

They built something that worked. Nobody used it. Then they shipped an MVP to Slack—with worse responses—and adoption took off.

This is the lesson that technical teams consistently underestimate: a mediocre tool in the right place beats a perfect tool in the wrong place. When Airbnb built their AI platform, they integrated it into existing support workflows. When Uber built QueryGPT, they embedded it in their internal web platform where analysts already worked.

The pattern is clear: meet users where they are, not where you wish they were.

Solve for Agility, Not Perfection

Salesforce's most actionable insight is about knowledge base updates:

"If the Agent is confused by a user's question we can have its knowledge base updated in ~15 minutes with automated regression testing."

Fifteen minutes from confusion to fix, with automated tests ensuring the change doesn't break other queries. This is the kind of operational infrastructure that separates prototypes from production systems.
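A regression gate like this can be very small. The sketch below is an assumption about the shape of such a pipeline (the test-case format and the whitespace-normalizing match rule are illustrative, not Salesforce's):

```python
# Illustrative knowledge-base regression gate: after a KB edit, re-run
# known-good questions and report any whose generated SQL changed.

def run_regressions(generate_sql, test_cases: list[dict]) -> list[str]:
    """generate_sql: question -> SQL string (the agent under test).
    test_cases: [{"question": ..., "expected_sql": ...}, ...]
    Returns the questions that no longer produce the expected SQL."""
    def normalize(sql: str) -> str:
        # Collapse whitespace and case so formatting churn isn't "breakage".
        return " ".join(sql.split()).lower()

    return [case["question"] for case in test_cases
            if normalize(generate_sql(case["question"]))
            != normalize(case["expected_sql"])]
```

An empty return list means the KB change ships; a non-empty one names exactly which previously-working questions it broke.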

Language evolves. New acronyms emerge. Business terms shift meaning. A static knowledge base becomes stale within months. The ability to update quickly—and safely—is what makes an AI assistant trustworthy over time.

The Bottom Line

Salesforce's Horizon Agent joins a growing list of internal AI data tools at major tech companies. The consistent themes across all of them:

  1. Start with transparency. Explain what the AI is doing. Users who understand the system become better at using it.
  2. Deploy where work happens. Slack, Teams, existing platforms. Don't make users switch contexts.
  3. Build for iteration speed. The first version will be wrong. What matters is how fast you can improve.
  4. Measure obsessively. Salesforce tracks first-shot acceptance rates. Without metrics, you're flying blind.

The future of data access isn't about building perfect AI. It's about building AI that learns from its mistakes—and helps users learn alongside it.

If this excites you, we'd love to hear from you. Get in touch.

Rick Radewagen

Rick is a co-founder of Dot, on a mission to make data accessible to everyone. When he's not building AI-powered analytics, you'll find him obsessing over well-arranged pixels and surprising himself by learning new languages.