← All writing
Star Schema7 min read · Blog

Five ways a star schema goes wrong

A model can have facts in the middle and dimensions around the edges and still be quietly broken. "It's a star schema" isn't a guarantee of anything — the shape is necessary, not sufficient. Here are the five failure modes I run into most, each one a thing that passes a glance and fails under real use.

1. Mixed grain

The cardinal sin. A fact table where one row usually means one thing but sometimes means another — order-level rows with the occasional line-level refund mixed in. Every SUM over a mixed-grain table is suspect, because you're adding up rows that don't represent the same kind of thing. The fix is discipline: one grain per fact table, stated in one sentence, no exceptions. If you need two grains, build two tables. (I've made the full case for grain elsewhere — it's the single highest-leverage habit in modeling.)

2. Snowflaking too far

Snowflaking is normalizing a dimension — breaking "product" into product → subcategory → category → department, each its own table. It feels tidy. It makes the model worse: now a simple "sales by category" needs a four-table chain, queries get unreadable, and the legibility that justified the star in the first place is gone. Keep dimensions flat and wide. Tolerate the repetition; it's cheaper than the joins.

3. Dimensions that aren't conformed

Each team builds its own "customer" — sales from the CRM, product from the event stream, finance from billing — each subtly different. Now per-customer metrics don't tie across processes, and nobody can say which is right. A star is only as trustworthy as its dimensions are shared. Conform them: one customer dimension, one date dimension, reused everywhere. It's what makes metrics survive a reorg.

4. Natural keys in the fact table

Joining facts to dimensions on the source system's IDs instead of surrogate keys. It works on day one, then blocks Type 2 history, shatters when a source system is replaced, and slows every join. Use a warehouse-assigned surrogate key; keep the natural key as an attribute for traceability, but never join on it.

5. Fact-to-fact joins

The tempting shortcut of joining two fact tables directly — orders to shipments, say — on some shared key. It almost always produces a fan-out: rows multiply, measures double-count, and the totals inflate in ways that are maddening to trace. Facts don't join to facts. They join to shared (conformed) dimensions, and you compare them by aggregating each to a common grain first — "orders by day" next to "shipments by day," joined on the date dimension. Slightly more work; correct answers.

The pattern behind the patterns

Four of these five trace back to two root habits: respect the grain, and conform the dimensions. Hold those two and most of the ways a star goes wrong simply can't happen. The shape is easy. The discipline is the craft — and it's the part that determines whether anyone trusts the numbers a year from now.

Also asListensoonSlidessoonPodcastsoonVideosoon

Have data that should be doing more?

Tell me about the pipeline that breaks, the metric nobody trusts, or the analysis stuck in a notebook. Let's operationalize it.