What LinkedIn Learned Building an AI Data Agent for 300+ Weekly Users (2024)

December 12, 2024
by
Rick Radewagen

The "Fix with AI" button accounts for 80% of sessions. That tells you everything you need to know about where AI actually adds value in data workflows.


LinkedIn's data team faced a familiar problem: analysts spending too much time helping colleagues find and query data instead of doing actual analysis. Their solution, SQL Bot, is now used by over 300 weekly active users across the company. But the real story isn't the technology - it's what they learned about where AI genuinely helps versus where it's still a work in progress.

After more than a year of development, LinkedIn published detailed findings about SQL Bot's performance. The numbers are refreshingly honest: 53% of responses score as correct or near-correct on internal benchmarks. That's far from the 90%+ accuracy you see on academic benchmarks - but the system is also far more useful than that number suggests.

The Architecture: Multi-Agent Systems Meet Knowledge Graphs

SQL Bot is built on LangChain and LangGraph, using a multi-agent architecture that routes different types of questions to specialized handlers. But the real innovation isn't the agent framework - it's the knowledge graph that feeds context to those agents.

The system constructs a semantic network connecting:

  • DataHub metadata: Table schemas, field descriptions, partition keys, top values for categorical fields
  • Query logs: Table/field popularity, common join patterns extracted from successful queries
  • Domain knowledge: Business context collected directly from users through the UI
  • Example queries: Certified notebooks from DARWIN (LinkedIn's internal analytics platform) that meet quality heuristics

This knowledge graph is refreshed weekly and organizes information around tables as central nodes, connecting them to columns, documentation, historical queries, and user/product area associations.
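
To make that concrete, here is a minimal sketch of what a table-centric node might look like. The field names are assumptions drawn from the four sources listed above, not LinkedIn's actual schema:

```python
# Hypothetical table-centric knowledge graph node; fields are illustrative.
from dataclasses import dataclass, field

@dataclass
class TableNode:
    name: str
    schema: dict[str, str]                # column name -> type (DataHub metadata)
    description: str = ""                 # table/field docs (DataHub metadata)
    popularity: float = 0.0               # derived from query logs
    common_joins: list[str] = field(default_factory=list)     # mined join patterns
    domain_notes: list[str] = field(default_factory=list)     # user-contributed context
    example_queries: list[str] = field(default_factory=list)  # certified DARWIN notebooks
```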

The Four-Stage Query Pipeline

When a user asks a question, SQL Bot follows a structured pipeline:

  1. Retrieve Context: Gathers 20 candidate tables using embedding-based retrieval filtered by access popularity
  2. Rank Context: An LLM scores table relevance and narrows to 7 tables, then identifies relevant columns in two tiers
  3. Write Query: Generates SQL with documented assumptions and explanations
  4. Fix Query: Validates syntax, checks for hallucinations, and deploys a "Researcher Agent" with table search tools when errors are detected

This last stage proved crucial. The query fixer reduced invalid tables and columns from 23% to just 1%, and compilation success improved from 88% to 96%.
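
For a sense of the shape, here is a minimal sketch of the four stages wired up as a LangGraph state machine, since LinkedIn builds on LangGraph. The node names, state fields, and stubbed logic are my assumptions, not SQL Bot's code:

```python
# Illustrative four-stage pipeline as a LangGraph state machine (all stages stubbed).
from typing import TypedDict
from langgraph.graph import StateGraph, END

class PipelineState(TypedDict):
    question: str
    candidate_tables: list[str]   # ~20 tables from embedding retrieval
    ranked_tables: list[str]      # narrowed to ~7 by LLM re-ranking
    sql: str
    errors: list[str]

def retrieve_context(state: PipelineState) -> dict:
    # Stage 1: embedding-based retrieval filtered by access popularity.
    return {"candidate_tables": ["dim_member", "fact_sessions"]}

def rank_context(state: PipelineState) -> dict:
    # Stage 2: LLM scores relevance and narrows the candidate set.
    return {"ranked_tables": state["candidate_tables"][:7]}

def write_query(state: PipelineState) -> dict:
    # Stage 3: generate SQL with documented assumptions.
    return {"sql": "SELECT ..."}

def fix_query(state: PipelineState) -> dict:
    # Stage 4: syntax checks, hallucination checks, Researcher Agent on errors.
    return {"errors": []}

graph = StateGraph(PipelineState)
graph.add_node("retrieve", retrieve_context)
graph.add_node("rank", rank_context)
graph.add_node("write", write_query)
graph.add_node("fix", fix_query)
graph.set_entry_point("retrieve")
graph.add_edge("retrieve", "rank")
graph.add_edge("rank", "write")
graph.add_edge("write", "fix")
# Loop back to the writer while validation still finds errors, else finish.
graph.add_conditional_edges("fix", lambda s: "write" if s["errors"] else END)
app = graph.compile()

result = app.invoke({"question": "Weekly active users by region?", "errors": []})
```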

The Numbers That Matter

LinkedIn's benchmarks on 133 internal questions reveal where the value actually comes from:

Configuration   Table Recall   Column Recall   Score 4+ (Correct/Near-Correct)
Schemas only    45%            24%             9%
Full system     78%            56%             53%

The jump from 9% to 53% correct responses comes almost entirely from the knowledge graph - not from better prompting or more sophisticated agents. Example queries, table clustering, and semantic attributes provided the largest gains. Vercel's d0 proves the same point at the extreme: their YAML semantic layer eliminated 80% of agent complexity.

User Satisfaction vs. Technical Accuracy

Here's where it gets interesting. While only 53% of responses score as technically correct, 95% of users rate the query accuracy as "Passes" or above, with 40% rating it "Very Good" or "Excellent." Netflix's LORE validates this principle: explainability and trust matter more than raw accuracy for enterprise adoption.

The disconnect? Users value the process, not just the output. SQL Bot helps them discover relevant tables, understand schemas, and iterate toward correct queries - even when the first attempt isn't perfect.

The "Fix with AI" Revelation

The most-used feature isn't query generation at all. The "Fix with AI" button - which appears whenever a query execution fails - accounts for 80% of sessions.

This feature was described by the team as "easy to develop," yet it delivers outsized value. The lesson: identify high-ROI pain points before building ambitious text-to-SQL capabilities. Users who already know some SQL often just need help debugging, not wholesale query generation.
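
Part of why it's cheap to build: a debugging feature like this can be a thin wrapper around one LLM call that pairs the failed SQL with the engine's error message. The sketch below uses LangChain's prompt-piping style; the prompt wording and model choice are my assumptions, not LinkedIn's implementation:

```python
# Illustrative "Fix with AI" handler: failed SQL + error message -> corrected query.
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

fix_prompt = ChatPromptTemplate.from_template(
    "The following SQL failed to execute.\n"
    "Query:\n{sql}\n\nError:\n{error}\n\n"
    "Return a corrected query and briefly explain the fix."
)

def fix_with_ai(sql: str, error: str) -> str:
    llm = ChatOpenAI(model="gpt-4o")  # any chat model works here
    return (fix_prompt | llm).invoke({"sql": sql, "error": error}).content
```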

Integration Multiplied Adoption by 5-10x

SQL Bot was initially launched as a standalone chatbot. Adoption was modest. Then they integrated it directly into DARWIN, LinkedIn's existing analytics platform.

The result: 5-10x increase in adoption.

The integration included:

  • Sidebar access within the same browser window where users write queries
  • A contextual "Fix with AI" button on failed query executions
  • Persistent chat history for continuing conversations
  • In-product feedback and custom instructions

This matches a consistent pattern we see: AI tools succeed when embedded into existing workflows, not when they require users to switch contexts.

User Customization: Three Levers

LinkedIn built three ways for users to improve SQL Bot's performance without involving the platform team:

  1. Dataset customization: Users specify relevant email groups or explicit user/dataset mappings
  2. Custom instructions: Free-form text that enriches domain knowledge or guides behavior
  3. Example queries: Users can index certified notebooks from DARWIN as few-shot examples

This self-serve customization proved essential for handling LinkedIn's diverse business verticals without requiring a centralized team to maintain context for every domain.
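
As a rough illustration, the three levers might reduce to per-user configuration along these lines; the structure, keys, and values here are hypothetical, not SQL Bot's actual settings format:

```python
# Hypothetical per-user configuration covering the three customization levers.
user_config = {
    "datasets": {
        "email_groups": ["ads-analytics@example.com"],  # hypothetical group
        "pinned_tables": ["ads.campaign_daily"],        # explicit user/dataset mapping
    },
    "custom_instructions": (
        "Revenue questions should use ads.campaign_daily; "
        "'active member' means activity in the last 28 days."
    ),
    "example_queries": ["darwin://notebooks/certified/1234"],  # hypothetical URI
}
```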

How LinkedIn Compares to Uber's QueryGPT

Uber's QueryGPT tackles the same problem at similar scale - approximately 1.2 million interactive queries monthly. Their reported metrics:

  • 300 daily active users (vs. LinkedIn's 300+ weekly)
  • 78% of users say generated queries reduce time vs. writing from scratch
  • 70% reduction in query authoring time (10 minutes to 3 minutes)
  • 50% overlap with ground truth tables on internal evaluation

Uber's productivity gains from a multi-agent approach mirror LinkedIn's experience, even though the two teams made different architectural choices. Both systems share key patterns:

Component            LinkedIn SQL Bot                 Uber QueryGPT
Framework            LangGraph + LangChain            Multi-agent with specialized roles
Context Management   Knowledge graph + DataHub        Workspaces by business domain
Table Selection      LLM re-ranking + clustering      Intent Agent + Table Agent
Schema Handling      Column relevance tiers           Column Prune Agent
Validation           Query fixer + Researcher Agent   Execution + output validation

Neither system implements row-level security dynamically within generated queries - a critical consideration for enterprises with strict data governance requirements.

The Enterprise Benchmark Gap

Academic benchmarks like Spider show 90%+ accuracy, but these don't reflect enterprise reality. Spider 2.0, a more realistic benchmark with queries over 100 lines long on tables with 1,000+ columns, shows the best models achieving only 31% execution accuracy.

LinkedIn's 53% on internal benchmarks is actually strong performance given:

  • A data lake with millions of tables
  • Popular tables with 100+ columns
  • Regular table deprecation and overlapping information
  • Company-wide scope across diverse business verticals

The lesson: don't benchmark your enterprise system against Spider. Build internal evaluation sets that reflect your actual complexity.
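
A minimal version of such an evaluation can start with the simple retrieval metrics LinkedIn reports, like table recall over a hand-labeled set of questions with known ground-truth tables. This sketch assumes exactly that kind of labeled set; the 1-5 correctness rubric is omitted:

```python
# Sketch of an internal eval: table recall per question, averaged across a set.
def table_recall(retrieved: set[str], ground_truth: set[str]) -> float:
    return len(retrieved & ground_truth) / len(ground_truth)

questions = [
    {"retrieved": {"dim_member", "fact_sessions"}, "truth": {"dim_member", "fact_jobs"}},
    {"retrieved": {"ads.campaign_daily"}, "truth": {"ads.campaign_daily"}},
]
avg = sum(table_recall(q["retrieved"], q["truth"]) for q in questions) / len(questions)
print(f"avg table recall: {avg:.0%}")  # -> 75%
```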

Practical Takeaways for Data Teams

If you're building an AI data agent, here's what LinkedIn's experience suggests:

1. Start with query debugging, not generation

The "Fix with AI" feature has 80% session usage with minimal development effort. Users who already know SQL need help with the last mile, not the first.

2. Invest in metadata before agents

The jump from 9% to 53% accuracy came from knowledge graph components, not agent sophistication. Quality metadata, usage patterns, and example queries matter more than prompt engineering.

3. Integrate into existing workflows

Standalone chatbots see modest adoption. Embedded tools with contextual entry points see 5-10x more usage.

4. Build for self-serve customization

Centralized teams can't maintain domain knowledge for every business vertical. Give users levers to improve performance themselves.

5. Accept imperfect accuracy if the experience is valuable

53% technical accuracy with 95% user satisfaction isn't a contradiction. The journey - discovering tables, understanding schemas, iterating on queries - delivers value even when the destination isn't perfect.

6. Plan for validation and self-correction

Query fixers that catch hallucinations and invalid references are table stakes. Budget for validation agents from the start.
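
A basic version of such a check parses the generated SQL and compares every referenced table and column against known metadata. This sketch assumes sqlglot for parsing and uses a stand-in catalog; it shows the general idea, not LinkedIn's fixer:

```python
# Minimal hallucination check: flag tables/columns absent from the catalog.
import sqlglot
from sqlglot import exp

CATALOG = {"member_events": {"member_id", "event_type", "event_date"}}

def find_hallucinations(sql: str) -> list[str]:
    """Return referenced tables/columns that don't exist in the catalog."""
    tree = sqlglot.parse_one(sql)
    problems = []
    for table in tree.find_all(exp.Table):
        if table.name not in CATALOG:
            problems.append(f"unknown table: {table.name}")
    known_columns = set().union(*CATALOG.values())
    for column in tree.find_all(exp.Column):
        if column.name not in known_columns:
            problems.append(f"unknown column: {column.name}")
    return problems

print(find_hallucinations("SELECT member_id, revenue FROM member_events"))
# -> ['unknown column: revenue']
```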

What's Next

LinkedIn identifies several future directions:

  • Faster response times (currently under 60 seconds)
  • In-line query revisions
  • Exposing the context SQL Bot used for transparency
  • Learning from user interactions over time
  • Identifying champions to lead self-serve context curation

The most interesting thread: shifting from AI-generates-query to AI-helps-you-iterate. The "Fix with AI" success suggests the future of data AI might look less like autonomous agents and more like intelligent copilots embedded in existing tools.


SQL Bot demonstrates that enterprise text-to-SQL is a systems engineering challenge, not a model capability problem. The teams that win will be those who invest in metadata infrastructure, embed AI into existing workflows, and focus on high-ROI pain points before chasing end-to-end automation.



Rick Radewagen
Rick is a co-founder of Dot, on a mission to make data accessible to everyone. When he's not building AI-powered analytics, you'll find him obsessing over well-arranged pixels and surprising himself by learning new languages.