
Your Data Model Is the Prompt

by Rick Radewagen · 3 min read

I've seen teams spend three weeks engineering the system prompt for their AI analytics rollout, and zero minutes renaming the column the agent misreads every day.

What does an agent actually read when it answers a question? Table names, column names, descriptions, declared relationships, a few sample values. That is the context window. Your schema is the prompt. The few hundred tokens of instructions you wrote on top are real, but they're a rounding error next to the thousands of tokens of metadata the model reads on every single question.

Which means data model quality and answer quality are the same variable. We all accepted long ago that data quality determines analytics quality. The same is now true one level up. Context quality determines what an agent can do: the names, the grain, the documentation, the relationships. A clean model with honest names beats any amount of prompt cleverness.
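To make "your schema is the prompt" concrete, here is a minimal sketch of the serialization step. It assumes a hypothetical in-memory representation of the metadata; the `Column`, `Table`, and `render_schema_context` names are illustrative and not any particular tool's API. Real agents format their context differently. The point is what an undocumented column looks like once it reaches the model: a bare name, a type, and a couple of numbers.

```python
from dataclasses import dataclass, field


@dataclass
class Column:
    name: str
    sql_type: str
    description: str = ""  # empty string means undocumented
    samples: list[str] = field(default_factory=list)


@dataclass
class Table:
    name: str
    columns: list[Column]
    description: str = ""
    # (column, "other_table.column") pairs declared as relationships
    relationships: list[tuple[str, str]] = field(default_factory=list)


def render_schema_context(tables: list[Table]) -> str:
    """Hypothetical rendering of schema metadata as plain text for an agent's context."""
    lines: list[str] = []
    for table in tables:
        lines.append(f"TABLE {table.name}: {table.description or '(no description)'}")
        for col in table.columns:
            desc = col.description or "(no description)"
            sample = f" e.g. {', '.join(col.samples)}" if col.samples else ""
            lines.append(f"  - {col.name} {col.sql_type}: {desc}{sample}")
        for col_name, target in table.relationships:
            lines.append(f"  FK {col_name} -> {target}")
    return "\n".join(lines)


orders = Table(
    name="fct_orders",
    description="One row per order.",
    columns=[
        Column("order_id", "INT", "Primary key."),
        Column("customer_id", "INT", "Customer who placed the order."),
        Column("status", "INT", samples=["1", "4"]),  # the undocumented code
    ],
    relationships=[("customer_id", "dim_customer_v2_final.customer_id")],
)
print(render_schema_context([orders]))
```

The `status` line comes out as `- status INT: (no description) e.g. 1, 4`. Nothing in the prompt above it can tell the model what 4 means.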

The new hire with no watercooler

The failures are rarely exotic. They look like this:

  • dim_customer_v2_final and dim_customer_v3 both live in prod, and only folklore knows which one is real. (It's v2_final.)
  • Three revenue columns and no hint about which one is canonical.
  • status = 4, documented nowhere.
  • Soft-deleted rows that every veteran filters out by habit and no newcomer knows about.

A human analyst routes around all of this, because they absorbed the folklore over months of Slack threads and standups. An agent gets none of that. It's the sharpest new hire you've ever had, with no access to the watercooler. It reads what's written down, takes it literally, and writes confident SQL on top of it.
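The failures in that list often leave traces a script can catch before an agent trips over them. The following is a rough sketch that assumes the metadata has been exported into a hypothetical `table -> column -> (type, description)` dictionary. The regular expressions, and the use of the word "canonical" in a description as the marker for the preferred column, are illustrative heuristics rather than any standard.

```python
import re
from collections import defaultdict

# Illustrative heuristics only; adapt the patterns to your own naming conventions.
VERSION_SUFFIX = re.compile(r"(_(v\d+|final|old|new|tmp|bak))+$")
CODED_NAME = re.compile(r"(^|_)(status|type|code)$")
SOFT_DELETE_NAME = re.compile(r"(^|_)(is_deleted|deleted_at|deleted)$")

Schema = dict[str, dict[str, tuple[str, str]]]  # table -> column -> (type, description)


def cold_read_findings(schema: Schema) -> list[str]:
    findings: list[str] = []

    # 1. Several versions of one entity in prod, e.g. *_v2_final next to *_v3.
    families: dict[str, list[str]] = defaultdict(list)
    for table in schema:
        families[VERSION_SUFFIX.sub("", table)].append(table)
    for base, members in families.items():
        if len(members) > 1:
            findings.append(f"ambiguous versions of {base}: {sorted(members)}")

    for table, columns in schema.items():
        # 2. Several revenue columns, and no description marks one as canonical.
        revenue = [c for c in columns if "revenue" in c]
        if len(revenue) > 1 and not any(
            "canonical" in columns[c][1].lower() for c in revenue
        ):
            findings.append(f"{table}: no canonical revenue column among {sorted(revenue)}")

        for col, (col_type, desc) in columns.items():
            if desc:
                continue  # documented columns are left alone
            # 3. Integer codes whose meaning is written down nowhere (status = 4).
            if col_type.upper() in {"INT", "SMALLINT", "TINYINT"} and CODED_NAME.search(col):
                findings.append(f"{table}.{col}: undocumented coded values")
            # 4. Soft-delete columns whose filtering rule lives only in folklore.
            if SOFT_DELETE_NAME.search(col):
                findings.append(f"{table}.{col}: soft-delete rule not documented")
    return findings


schema = {
    "dim_customer_v2_final": {"customer_id": ("INT", "Primary key.")},
    "dim_customer_v3": {"customer_id": ("INT", "")},
    "fct_orders": {
        "order_id": ("INT", "Primary key."),
        "gross_revenue": ("NUMERIC", ""),
        "net_revenue": ("NUMERIC", ""),
        "revenue_usd": ("NUMERIC", ""),
        "status": ("INT", ""),
        "deleted_at": ("TIMESTAMP", ""),
    },
}
for finding in cold_read_findings(schema):
    print(finding)
```

A check like this can only point at the ambiguity. Deciding that `v2_final` is the real table, and writing that down, still takes someone who knows.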

A useful test for your warehouse: could a smart analyst with no Slack access and nobody to ask answer your top twenty business questions correctly, using only what's documented? That's the agent's exact situation. Most warehouses fail the test. Not because the data is bad, but because the meaning never made it out of people's heads. OpenAI's data platform team reached a similar conclusion: schemas describe shape, but meaning lives in code and context, and you have to surface it deliberately.

Model for the reader

This changes what good modeling means. We spent a decade optimizing models for writers: DRY macros, deep staging layers, clever abstractions. That was fine when the readers were a handful of experts who could decode them. Now the heaviest reader of your warehouse is a machine that takes everything at face value, thousands of times a day.

So optimize for the reader. Explicit beats clever. Fewer, wider, well-named marts beat an elegant maze. Whether you build them with dbt, plain SQL, or something newer matters far less than whether the result reads clearly to someone, or something, encountering it cold. Vercel's d0 team showed how far this goes: their agent succeeds with two generic tools, because the semantic layer underneath is rich enough to navigate with grep.
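The internals of the d0 tools aren't described here, so the following is only a generic illustration of the principle. When definitions, caveats, and filters are written down as plain text, a plain-text search already finds them. The file paths, the YAML-like keys, and the `grep` helper are all hypothetical, and the files are kept in memory so the example is self-contained.

```python
import re

# Hypothetical semantic-layer files; in practice these would live on disk.
SEMANTIC_LAYER = {
    "metrics/revenue.yml": (
        "metric: net_revenue\n"
        "description: Canonical revenue. Gross revenue minus refunds and discounts.\n"
        "source: fct_orders.net_revenue\n"
        "filters: deleted_at IS NULL\n"
    ),
    "entities/customer.yml": (
        "entity: customer\n"
        "table: dim_customer_v2_final\n"
        "description: The production customer dimension. Use this, not dim_customer_v3.\n"
    ),
}


def grep(pattern: str, files: dict[str, str]) -> list[tuple[str, int, str]]:
    """Return (path, 1-based line number, line) for every line matching pattern."""
    regex = re.compile(pattern, re.IGNORECASE)
    hits = []
    for path, text in sorted(files.items()):
        for number, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                hits.append((path, number, line))
    return hits


for path, number, line in grep(r"canonical|customer dimension", SEMANTIC_LAYER):
    print(f"{path}:{number}: {line}")
```

Searching for `deleted_at` would land on the filter line in `metrics/revenue.yml`. That is the soft-delete rule, found by a tool that knows nothing about warehouses.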

The cheapest this work has ever been

The good news: the same models that read your schema can help fix it. Drafting the missing descriptions, flagging inconsistent names, finding the column nobody can explain. What used to be a documentation backlog nobody touched is now mostly review work. And it pays twice: the agent becomes more accurate, and the humans get a warehouse they can finally navigate without asking around.
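The "mostly review work" part can be sketched as a queue. Find the undocumented columns, ask a model for a draft, and leave every draft unapproved until a person signs off. `draft_fn` is a hypothetical stand-in for whatever model client you use. No specific API is assumed, and the example passes a stub so it runs offline.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DraftDescription:
    table: str
    column: str
    draft: str
    approved: bool = False  # a human flips this after review


def build_review_queue(
    schema: dict[str, dict[str, str]],          # table -> column -> description
    samples: dict[tuple[str, str], list[str]],  # (table, column) -> sample values
    draft_fn: Callable[[str], str],             # hypothetical wrapper around your model
) -> list[DraftDescription]:
    queue: list[DraftDescription] = []
    for table, columns in schema.items():
        for column, description in columns.items():
            if description.strip():
                continue  # already documented
            values = ", ".join(samples.get((table, column), [])) or "none available"
            prompt = (
                f"Draft a one-sentence description for column {column} "
                f"in table {table}. Sample values: {values}. "
                "Say 'unknown' rather than guess at coded values."
            )
            queue.append(DraftDescription(table, column, draft_fn(prompt)))
    return queue


schema = {"fct_orders": {"order_id": "Primary key.", "status": "", "deleted_at": ""}}
samples = {("fct_orders", "status"): ["1", "4"]}
for item in build_review_queue(schema, samples, draft_fn=lambda prompt: "unknown"):
    print(f"{item.table}.{item.column}: {item.draft} (approved={item.approved})")
```

Defaulting `approved` to false is the design choice that matters. The model drafts, and someone who knows the folklore decides what goes into the schema.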

Before you tune the prompt, read your schema the way the model does: cold, literal, no folklore. Fix what you find there first. That's the prompt engineering that matters.

If this excites you, we'd love to hear from you. Get in touch.

Rick Radewagen

Rick is a co-founder of Dot, on a mission to make data accessible to everyone. When he's not building AI-powered analytics, you'll find him obsessing over well-arranged pixels and surprising himself by learning new languages.