← All writing
Data Modeling7 min read · Blog

Why the star schema still wins

Every few years someone declares dimensional modeling dead. Storage is cheap, columnar engines are fast, just dump everything into one wide table and let the warehouse sort it out. It's a tempting pitch, and it's wrong in a specific, instructive way.

The star schema isn't a performance hack left over from the row-store era. It's a way of organizing meaning. That's why it keeps winning.

What a star actually is

A star schema splits your world into two kinds of things:

  • Facts — the events or measurements you count: an order, a payment, a session, a support ticket. Each fact table holds the numbers (the measures) at a specific grain.
  • Dimensions — the context you slice by: customer, product, date, region, channel. These are the nouns that answer "by what?"

A fact in the middle, dimensions radiating out around it — that's the star. "Revenue by region by month" is a fact (revenue) joined to two dimensions (region, date). Almost every analytical question has this shape, which is exactly why the model mirrors it.

Why not one big table?

One-big-table feels simpler — no joins, everything in one place. The cost shows up later:

  • Duplication breeds drift. Customer attributes are copied onto every order row. When a customer's segment changes, you're updating millions of rows, and the ones you miss become silent inconsistencies.
  • The grain gets muddy. When facts and context live together, it's easy to build a table where one row sometimes means an order and sometimes means an order line. Now every SUM is suspect.
  • No reuse. The "customer" concept exists only inside that one table. The next fact — payments, say — re-derives it from scratch, slightly differently.

One big table optimizes for the first query and taxes every one after it.

Why not full normalization?

The opposite extreme — third-normal-form, everything decomposed into the smallest non-redundant pieces — is right for systems that write data (your app's database). It's wrong for systems that read it. A normalized model answers "revenue by region by month" with a six-table join that nobody can read and the optimizer struggles with. You've eliminated redundancy at the cost of legibility, and legibility is the entire point of an analytics model.

The star is the legible middle

Dimensional modeling sits deliberately between the two. Dimensions are denormalized enough to read in one join, but factored out enough to define each concept once and reuse it everywhere. A human can look at a star and understand the business. An optimizer can plan it. An AI agent can reason over it, because the grain is declared and the joins are obvious.

Columnar storage made stars faster — it didn't make them unnecessary. The question a model answers was never just "how do I store this." It's "how does anyone understand this in eighteen months." On that question, the star still wins, and the next piece is about the decision that makes or breaks it: the grain.

Also asListensoonSlidessoonPodcastsoonVideosoon

Have data that should be doing more?

Tell me about the pipeline that breaks, the metric nobody trusts, or the analysis stuck in a notebook. Let's operationalize it.