AI & AI Agents · 9 min read · Blog

From question to query to action: building an agentic analytics pipeline

The demos make agentic analytics look like one heroic prompt: ask a question, get an answer. In production it's the opposite — the reliability comes from breaking the magic into stages you can inspect, test, and fall back on. It looks, fittingly, like a pipeline: raw question in, trustworthy action out.

The stages

1. Understand. Parse the question into a structured intent: which entities, which metric, which grain, which filters and time range. If it's ambiguous, the agent's first move is to ask, not to guess. ("By 'active,' do you mean logged in, or transacted?")

2. Retrieve. Pull only the relevant slice of the semantic layer — the entities, metrics, and valid join paths for this question. This is the grounding step; everything downstream inherits its quality.

3. Plan. Compose a query spec from sanctioned building blocks — a metric, its dimensions, its filters — rather than free-form SQL. The plan is reviewable and, crucially, explainable back to the user in plain language before anything runs.

4. Execute & validate. Compile the spec, run it against a read replica, and check the result against cheap sanity rules: row counts in range, no impossible nulls, totals that reconcile to a known control. A result that fails validation goes back a stage — it doesn't get returned.

5. Answer. Return the number, the chart, and the receipts: the definitions used and the query run. An answer you can't audit isn't an answer; it's a rumor.

6. Act (optional, gated). The highest-value step and the most dangerous: turning an insight into a write — updating a segment, opening a ticket, kicking off a workflow. This stage is always behind a guardrail, and usually behind a human. (That's its own piece.)
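To make stages 1, 3, and 4 concrete, here is a minimal Python sketch of a structured intent, a plan built only from sanctioned building blocks, and the cheap sanity rules. Everything named here is an assumption for illustration: the `SEMANTIC_LAYER` dictionary, the `orders` table, the `revenue` metric, and the 1% reconciliation tolerance are hypothetical stand-ins, not a real semantic-layer API.

```python
from dataclasses import dataclass, field

# Hypothetical semantic layer: the only metrics and dimensions the planner may use.
SEMANTIC_LAYER = {
    "revenue": {
        "sql": "SUM(amount)",
        "table": "orders",
        "dimensions": {"region", "order_month"},
    },
}


@dataclass(frozen=True)
class Intent:
    """Stage 1 output: the question parsed into structured fields."""
    metric: str
    dimensions: tuple[str, ...] = ()
    filters: dict[str, str] = field(default_factory=dict)


@dataclass(frozen=True)
class QuerySpec:
    """Stage 3 output: a plan made only of sanctioned pieces."""
    metric: str
    dimensions: tuple[str, ...]
    filters: tuple[tuple[str, str], ...]  # (column, value) pairs


def plan(intent: Intent) -> QuerySpec:
    """Reject anything the semantic layer does not define."""
    entry = SEMANTIC_LAYER.get(intent.metric)
    if entry is None:
        raise ValueError(f"unknown metric: {intent.metric}")
    unknown = (set(intent.dimensions) | set(intent.filters)) - entry["dimensions"]
    if unknown:
        raise ValueError(f"unsanctioned dimensions: {sorted(unknown)}")
    return QuerySpec(intent.metric, tuple(intent.dimensions),
                     tuple(sorted(intent.filters.items())))


def explain(spec: QuerySpec) -> str:
    """Plain-language summary shown to the user before anything runs."""
    by = f" by {', '.join(spec.dimensions)}" if spec.dimensions else ""
    where = "".join(f", where {col} is {val}" for col, val in spec.filters)
    return f"{spec.metric}{by}{where}"


def compile_sql(spec: QuerySpec) -> tuple[str, list[str]]:
    """Compile the spec to parameterized SQL.

    Column names are interpolated only because plan() already checked them
    against the semantic layer; filter values always go through parameters.
    """
    entry = SEMANTIC_LAYER[spec.metric]
    select = list(spec.dimensions) + [f"{entry['sql']} AS {spec.metric}"]
    sql = f"SELECT {', '.join(select)} FROM {entry['table']}"
    params: list[str] = []
    if spec.filters:
        sql += " WHERE " + " AND ".join(f"{col} = ?" for col, _ in spec.filters)
        params = [val for _, val in spec.filters]
    if spec.dimensions:
        sql += " GROUP BY " + ", ".join(spec.dimensions)
    return sql, params


def validate(rows, metric, control_total, tolerance=0.01, max_rows=1000):
    """Stage 4 sanity rules. Returns a list of problems; empty means pass."""
    problems = []
    if not 0 < len(rows) <= max_rows:
        problems.append("row count out of range")
    if any(row[metric] is None for row in rows):
        problems.append("null metric value")
    total = sum(row[metric] for row in rows if row[metric] is not None)
    if abs(total - control_total) > tolerance * abs(control_total):
        problems.append("total does not reconcile")
    return problems
```

The design choice worth noticing is that the model never writes SQL. It fills in an `Intent`, and everything after that is deterministic code you can test. A non-empty list from `validate` is the signal to go back a stage instead of returning the result.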

Why staging beats one big prompt

  • Observability. When an answer is wrong, you can see which stage failed — bad retrieval, bad plan, bad data — instead of shrugging at a black box.
  • Testability. Each stage has inputs and outputs you can write tests against. You can regression-test "what was revenue last quarter" the way you'd test any pipeline.
  • Fallbacks. A low-confidence plan can route to a human; a failed validation can retry or escalate. The system degrades gracefully instead of confidently lying.
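  • Cost control. You spend tokens (and warehouse compute) only on the stages a question actually reaches. One that stalls at Understand with a clarifying question never touches the warehouse.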
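The four properties above fall out of a very small runner. This is a sketch under assumptions, not a framework: `run_pipeline`, `StageResult`, and `Run` are hypothetical names, stages are plain functions that raise on failure, and "escalate" here just sets a flag that a real system would route to a human.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class StageResult:
    stage: str
    ok: bool
    detail: str = ""


@dataclass
class Run:
    value: Any = None
    trace: list = field(default_factory=list)  # one StageResult per attempt
    escalated: bool = False


def run_pipeline(question: Any,
                 stages: list[tuple[str, Callable[[Any], Any]]],
                 max_retries: int = 1) -> Run:
    """Run stages in order, feeding each stage the previous stage's output.

    Every attempt is recorded in the trace (observability). A stage that
    still fails after its retries stops the run and flags it for escalation
    (fallback), so later stages never spend tokens or compute (cost control).
    """
    run = Run(value=question)
    for name, stage_fn in stages:
        for _attempt in range(max_retries + 1):
            try:
                run.value = stage_fn(run.value)
                run.trace.append(StageResult(name, True))
                break
            except Exception as exc:  # broad on purpose: any failure is recorded
                run.trace.append(StageResult(name, False, str(exc)))
        else:
            # The inner loop finished without a break: every attempt failed.
            run.escalated = True
            return run
    return run
```

Because each stage is just a function from input to output, it can be tested in isolation with an ordinary unit test, and the trace tells you whether a wrong answer came from bad retrieval, a bad plan, or bad data.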

The mindset shift

If you come from data engineering, none of this is new — it's orchestration, contracts, and validation applied to a probabilistic worker. That's the secret: treat the agent as one more unreliable upstream source, and wrap it in the same discipline you'd wrap any pipeline. The "AI" part is a stage. The engineering is the product.

Last in the series: the guardrails that make stage six — letting an agent act on production data — something you can actually sleep through.

