The "Fix with AI" button accounts for 80% of sessions. That tells you everything you need to know about where AI actually adds value in data workflows.
LinkedIn's data team faced a familiar problem: analysts spending too much time helping colleagues find and query data instead of doing actual analysis. Their solution, SQL Bot, is now used by over 300 weekly active users across the company. But the real story isn't the technology - it's what they learned about where AI genuinely helps versus where it's still a work in progress.
After over a year of development, LinkedIn published detailed findings about SQL Bot's performance. The numbers are refreshingly honest: 53% of responses score as correct or near-correct on internal benchmarks. That's far from the 90%+ accuracy you see in academic benchmarks - but it's also far more useful than those benchmarks suggest.
SQL Bot is built on LangChain and LangGraph, using a multi-agent architecture that routes different types of questions to specialized handlers. But the real innovation isn't the agent framework - it's the knowledge graph that feeds context to those agents.
The system constructs a semantic network with tables as its central nodes, connecting each table to its columns, documentation, historical queries, and user/product-area associations. This knowledge graph is refreshed weekly.
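As an illustration, a table-centric graph like the one described above might be modeled as follows. This is a hypothetical sketch; the class and field names are not LinkedIn's actual schema.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of a table-centric knowledge graph: tables are the
# central nodes, linked to columns, docs, example queries, and product areas.
@dataclass
class TableNode:
    name: str
    columns: list[str] = field(default_factory=list)
    docs: str = ""
    example_queries: list[str] = field(default_factory=list)
    product_areas: set[str] = field(default_factory=set)

class KnowledgeGraph:
    def __init__(self) -> None:
        self.tables: dict[str, TableNode] = {}

    def add_table(self, node: TableNode) -> None:
        self.tables[node.name] = node

    def candidates_for(self, product_area: str) -> list[TableNode]:
        """Narrow retrieval to tables associated with the asker's product area."""
        return [t for t in self.tables.values() if product_area in t.product_areas]

graph = KnowledgeGraph()
graph.add_table(TableNode(
    name="dim_member",
    columns=["member_id", "country", "signup_date"],
    product_areas={"growth"},
))
print([t.name for t in graph.candidates_for("growth")])  # ['dim_member']
```

The point of the structure is retrieval scoping: an agent asking on behalf of a "growth" analyst only sees growth-associated tables, their columns, and their example queries.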
When a user asks a question, SQL Bot runs a structured pipeline that ends in a query-fixing stage. That final stage proved crucial: the query fixer reduced references to invalid tables and columns from 23% to just 1%, and lifted compilation success from 88% to 96%.
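A minimal sketch of what such a validation pass could look like, assuming a simple known-schema check. LinkedIn's actual fixer also repairs the query via the LLM; this only detects hallucinated table references, and the regex-based extraction is deliberately naive.

```python
import re

# Assumed schema registry for illustration; a real system would pull this
# from the knowledge graph.
KNOWN_TABLES = {"dim_member", "fact_sessions"}

def invalid_tables(sql: str) -> set[str]:
    # Naive extraction: identifiers following FROM/JOIN keywords.
    referenced = set(re.findall(r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)", sql, re.I))
    return referenced - KNOWN_TABLES

sql = "SELECT country FROM dim_members JOIN fact_sessions USING (member_id)"
print(invalid_tables(sql))  # {'dim_members'} — a hallucinated table name
```

In a pipeline, any names this check surfaces would be fed back to the model along with the valid schema so it can repair the query before execution.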
LinkedIn's benchmarks on 133 internal questions reveal where the value actually comes from:
| Configuration | Table Recall | Column Recall | Score 4+ (Correct/Near-Correct) |
|---|---|---|---|
| Schemas only | 45% | 24% | 9% |
| Full system | 78% | 56% | 53% |
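Table and column recall in the benchmark above are straightforward set metrics. A sketch of how they might be computed (the exact methodology isn't spelled out in the post):

```python
# Recall: what fraction of the gold tables/columns did the system surface?
def recall(predicted: set[str], gold: set[str]) -> float:
    return len(predicted & gold) / len(gold) if gold else 1.0

gold_tables = {"dim_member", "fact_sessions"}
predicted_tables = {"dim_member", "dim_company"}
print(recall(predicted_tables, gold_tables))  # 0.5
```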
The jump from 9% to 53% correct responses comes almost entirely from the knowledge graph, not from better prompting or more sophisticated agents. Example queries, table clustering, and semantic attributes provided the largest gains. Vercel's d0 pushes the same idea to its extreme: a YAML semantic layer eliminated 80% of agent complexity.
Here's where it gets interesting. While only 53% of responses score as technically correct, 95% of users rate the query accuracy as "Passes" or above, with 40% rating it "Very Good" or "Excellent." Netflix's LORE validates this principle: explainability and trust matter more than raw accuracy for enterprise adoption.
The disconnect? Users value the process, not just the output. SQL Bot helps them discover relevant tables, understand schemas, and iterate toward correct queries - even when the first attempt isn't perfect.
The most-used feature isn't query generation at all. The "Fix with AI" button - which appears whenever a query execution fails - accounts for 80% of sessions.
This feature was described by the team as "easy to develop," yet it delivers outsized value. The lesson: identify high-ROI pain points before building ambitious text-to-SQL capabilities. Users who already know some SQL often just need help debugging, not wholesale query generation.
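A sketch of why such a feature is "easy to develop": rather than generating SQL from scratch, a fix handler hands the failed query and the engine's error message back to the model. The function name and prompt wording below are illustrative, not LinkedIn's implementation.

```python
# Hypothetical "Fix with AI" prompt builder: the error message and schema
# hint give the model a narrow, well-specified repair task.
def build_fix_prompt(sql: str, error: str, schema_hint: str) -> str:
    return (
        "The following SQL failed to execute.\n"
        f"Query:\n{sql}\n\n"
        f"Error:\n{error}\n\n"
        f"Relevant schema:\n{schema_hint}\n\n"
        "Return a corrected query that fixes the error and nothing else."
    )

prompt = build_fix_prompt(
    sql="SELECT contry FROM dim_member",
    error="Column 'contry' cannot be resolved",
    schema_hint="dim_member(member_id, country, signup_date)",
)
print(prompt.splitlines()[0])  # The following SQL failed to execute.
```

The asymmetry explains the ROI: debugging is a constrained task with the hardest context (the user's intent) already encoded in the broken query.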
SQL Bot was initially launched as a standalone chatbot. Adoption was modest. Then they integrated it directly into DARWIN, LinkedIn's existing analytics platform.
The result: 5-10x increase in adoption.
The integration gave SQL Bot contextual entry points inside the tool analysts already use, rather than a separate destination. This matches a consistent pattern we see: AI tools succeed when embedded into existing workflows, not when they require users to switch contexts.
LinkedIn built three ways for users to improve SQL Bot's performance without involving the platform team.
This self-serve customization proved essential for handling LinkedIn's diverse business verticals without requiring a centralized team to maintain context for every domain.
Uber's QueryGPT tackles the same problem at similar scale, serving approximately 1.2 million interactive queries monthly. Despite different architectural choices, the productivity gains Uber reports from its multi-agent approach mirror LinkedIn's experience, and the two systems share key patterns:
| Component | LinkedIn SQL Bot | Uber QueryGPT |
|---|---|---|
| Framework | LangGraph + LangChain | Multi-agent with specialized roles |
| Context Management | Knowledge graph + DataHub | Workspaces by business domain |
| Table Selection | LLM re-ranking + clustering | Intent Agent + Table Agent |
| Schema Handling | Column relevance tiers | Column Prune Agent |
| Validation | Query fixer + Researcher Agent | Execution + output validation |
Neither system implements row-level security dynamically within generated queries - a critical consideration for enterprises with strict data governance requirements.
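To illustrate the schema-handling row in the comparison, here is a toy column-pruning pass in the spirit of LinkedIn's column relevance tiers and Uber's Column Prune Agent. A production system would rank by embeddings or an LLM rather than this keyword overlap; the logic is purely illustrative.

```python
# Toy pruning: keep only columns whose name fragments overlap with terms
# in the user's question, shrinking the schema shown to the model.
def prune_columns(question: str, columns: list[str]) -> list[str]:
    terms = set(question.lower().split())
    return [c for c in columns if set(c.lower().split("_")) & terms]

cols = ["member_id", "country", "signup_date", "internal_flags"]
print(prune_columns("how many members per country", cols))  # ['country']
```

The motivation is the same in both systems: with 1,000+ column tables, sending full schemas to the model wastes context and invites hallucinated column references.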
Academic benchmarks like Spider show 90%+ accuracy, but these don't reflect enterprise reality. Spider 2.0, a more realistic benchmark with queries over 100 lines long on tables with 1,000+ columns, shows the best models achieving only 31% execution accuracy.
LinkedIn's 53% on internal benchmarks is actually strong performance: enterprise questions look far more like Spider 2.0's long queries over 1,000-column tables than like the original Spider's tidy academic examples.
The lesson: don't benchmark your enterprise system against Spider. Build internal evaluation sets that reflect your actual complexity.
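A minimal sketch of such an internal evaluation loop, with `generate_sql` and the scoring rubric as placeholders for your own system (the 1-5 scale with "4+ is correct/near-correct" mirrors the scoring LinkedIn reports):

```python
# Score generated queries against an in-house question set and report the
# share scoring 4 or above, per the rubric used in the benchmarks above.
def evaluate(benchmark, generate_sql, score) -> float:
    results = [score(generate_sql(q), gold) for q, gold in benchmark]
    return sum(r >= 4 for r in results) / len(results)

benchmark = [("daily active members?", "SELECT ...")]
# With a stub generator that returns the gold query and a rubric that gives
# exact matches a 5, the pass rate is 1.0:
rate = evaluate(benchmark, lambda q: "SELECT ...",
                lambda pred, gold: 5 if pred == gold else 1)
print(rate)  # 1.0
```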
If you're building an AI data agent, here's what LinkedIn's experience suggests:
The "Fix with AI" feature has 80% session usage with minimal development effort. Users who already know SQL need help with the last mile, not the first.
The jump from 9% to 53% accuracy came from knowledge graph components, not agent sophistication. Quality metadata, usage patterns, and example queries matter more than prompt engineering.
Standalone chatbots see modest adoption. Embedded tools with contextual entry points see 5-10x more usage.
Centralized teams can't maintain domain knowledge for every business vertical. Give users levers to improve performance themselves.
53% technical accuracy with 95% user satisfaction isn't a contradiction. The journey - discovering tables, understanding schemas, iterating on queries - delivers value even when the destination isn't perfect.
Query fixers that catch hallucinations and invalid references are table stakes. Budget for validation agents from the start.
LinkedIn identifies several future directions.
The most interesting thread: shifting from AI-generates-query to AI-helps-you-iterate. The "Fix with AI" success suggests the future of data AI might look less like autonomous agents and more like intelligent copilots embedded in existing tools.
SQL Bot demonstrates that enterprise text-to-SQL is a systems engineering challenge, not a model capability problem. The teams that win will be those who invest in metadata infrastructure, embed AI into existing workflows, and focus on high-ROI pain points before chasing end-to-end automation.