AI & AI Agents · 8 min read · Blog

The semantic layer is the missing piece for trustworthy AI analysis

If you've watched an AI agent write SQL against a production warehouse, you've seen the failure mode: it's fluent, not correct. It joins orders to users on the wrong key. It sums amount without noticing half the rows are refunds. It picks created_at when the business runs on closed_at. Every answer arrives with the same calm confidence — including the wrong ones.

This isn't a model problem you fix with a bigger model. It's a grounding problem, and the fix is a semantic layer.

Why raw schemas break LLMs

A warehouse schema encodes almost none of what a human analyst knows. The table is called fct_txn; the analyst knows it's deduplicated nightly, that status = 'C' means complete, that revenue excludes tax and refunds, and that "customer" means the billing account, not the user. None of that lives in the DDL. The agent can't infer it — so it guesses, plausibly.
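To make the gap concrete, here is a minimal sketch with a few made-up rows. Only fct_txn, status = 'C', and the "excludes tax and refunds" rule come from the description above; the amount, tax, and is_refund columns are assumptions for illustration, and "excludes refunds" is read here as dropping refund rows.

```python
# Hypothetical rows from fct_txn. The column names (amount, tax, is_refund)
# are assumptions made for illustration, not a real schema.
rows = [
    {"status": "C", "amount": 110.0, "tax": 10.0, "is_refund": False},
    {"status": "C", "amount": 55.0, "tax": 5.0, "is_refund": False},
    {"status": "C", "amount": 55.0, "tax": 5.0, "is_refund": True},     # refund row
    {"status": "P", "amount": 220.0, "tax": 20.0, "is_refund": False},  # not complete
]


def naive_revenue(rows):
    """What a guess from column names alone produces: sum every amount."""
    return sum(r["amount"] for r in rows)


def net_revenue(rows):
    """The analyst's definition: complete rows only, refund rows dropped, tax removed."""
    return sum(
        r["amount"] - r["tax"]
        for r in rows
        if r["status"] == "C" and not r["is_refund"]
    )


print(naive_revenue(rows))  # 440.0
print(net_revenue(rows))    # 150.0
```

Both functions run without error and both return a plausible-looking number. Nothing in the table itself says which one the business means.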

Give a capable LLM a clean, well-described data model and the same agent gets sharp. Give it raw tables and column names from 2019, and you've built a very fast way to be wrong.

What the semantic layer actually provides

A semantic layer sits between the physical tables and anyone — human or agent — asking questions. It defines, in one governed place:

  • Entities and grain: what a "customer," "order," or "session" is, and the level each table lives at.
  • Metrics: net_revenue defined once, with its filters and exclusions, so it can't be re-derived three different ways.
  • Relationships: the join paths that are valid, so the agent can't invent one.
  • Descriptions: the human context ("excludes internal test accounts") that the schema omits.
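One way to picture those four pieces is as plain data structures. The sketch below is hypothetical: the class names, fields, and the dim_billing_account table are invented for illustration and do not correspond to any particular semantic-layer product.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    name: str
    table: str
    grain: str             # what one row represents
    description: str = ""  # human context the schema omits


@dataclass(frozen=True)
class Metric:
    name: str
    entity: str
    expression: str
    filters: tuple = ()
    description: str = ""


@dataclass
class SemanticLayer:
    entities: dict = field(default_factory=dict)
    metrics: dict = field(default_factory=dict)
    joins: set = field(default_factory=set)  # allowed (left, right) entity pairs

    def add_entity(self, entity: Entity) -> None:
        self.entities[entity.name] = entity

    def add_metric(self, metric: Metric) -> None:
        # "Defined once": a second definition of the same name is an error.
        if metric.name in self.metrics:
            raise ValueError(f"metric {metric.name!r} is already defined")
        if metric.entity not in self.entities:
            raise KeyError(f"unknown entity {metric.entity!r}")
        self.metrics[metric.name] = metric

    def allow_join(self, left: str, right: str) -> None:
        self.joins.add((left, right))

    def can_join(self, left: str, right: str) -> bool:
        return (left, right) in self.joins or (right, left) in self.joins


layer = SemanticLayer()
layer.add_entity(Entity("transaction", "fct_txn", "one row per transaction"))
layer.add_entity(Entity("customer", "dim_billing_account",
                        "one row per billing account",
                        "the billing account, not the user"))
layer.add_metric(Metric("net_revenue", "transaction", "SUM(amount - tax)",
                        filters=("status = 'C'", "NOT is_refund"),
                        description="excludes tax and refunds"))
layer.allow_join("transaction", "customer")
```

The useful property is that the second add_metric call for net_revenue fails loudly instead of quietly creating a competing definition.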

For an agent, that's not documentation — it's the action space. It stops the model from writing arbitrary SQL and lets it compose trusted, pre-defined building blocks. The agent reasons about which metric and which dimensions; it no longer gets to reinvent what "revenue" means at 2am.
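A rough sketch of what "action space" can mean in practice: the agent names a metric and some dimensions, and a small compiler turns that request into SQL from governed definitions. Everything here (the METRICS and DIMENSIONS registries, the compile_query function, the tax and is_refund columns, and the DATE_TRUNC dialect) is an assumption for illustration, not a specific tool's API.

```python
# Hypothetical governed definitions. The agent may only pick keys from these
# registries; it never writes SQL expressions itself.
METRICS = {
    "net_revenue": {
        "sql": "SUM(t.amount - t.tax)",
        "where": ["t.status = 'C'", "NOT t.is_refund"],
    },
}
DIMENSIONS = {
    "closed_month": "DATE_TRUNC('month', t.closed_at)",
}
BASE_TABLE = "fct_txn AS t"


def compile_query(metric: str, dimensions: list) -> str:
    """Build SQL from pre-defined building blocks, rejecting anything unknown."""
    if metric not in METRICS:
        raise ValueError(f"unknown metric: {metric!r}")
    unknown = [d for d in dimensions if d not in DIMENSIONS]
    if unknown:
        raise ValueError(f"unknown dimensions: {unknown}")

    definition = METRICS[metric]
    select = [f"{DIMENSIONS[d]} AS {d}" for d in dimensions]
    select.append(f"{definition['sql']} AS {metric}")

    sql = f"SELECT {', '.join(select)} FROM {BASE_TABLE}"
    if definition["where"]:
        sql += " WHERE " + " AND ".join(definition["where"])
    if dimensions:
        sql += " GROUP BY " + ", ".join(DIMENSIONS[d] for d in dimensions)
    return sql


print(compile_query("net_revenue", ["closed_month"]))
```

A request for a metric or dimension that is not in the registries raises an error rather than producing a guess, which is the point: the model chooses among trusted definitions and cannot quietly redefine them.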

Grounding beats cleverness

This is the through-line of everything I build: the model is downstream of the modeling. A mediocre LLM on a great semantic layer will out-analyze a frontier model on a pile of raw tables, every time — because the hard part of analysis was never the SQL syntax. It was knowing which question maps to which trustworthy number.

The semantic layer is how you encode that knowledge once and let everything — dashboards, notebooks, and now agents — inherit it. It's also the highest-ROI "AI project" most teams could run, and it has nothing to do with AI. It's data modeling. The agents just made it urgent.

Next in the series: how to actually expose that model to an agent so it can navigate it.

