Why You Need a Model to Make Sense of Your Data

Have a look at this Rubik’s Cube: six colors, the traditional shape. Yet it doesn’t make sense. It cannot be assembled, because three of its faces have orange center pieces. According to the Rubik’s Cube model, this is an impossible configuration: no sequence of moves on an assembled Rubik’s Cube could ever reach this state.

There are roughly 10³⁸ different ways to paint 54 cells with six colors, nine cells per color: 54! / (9!)⁶. However, only about one in 3.25 × 10¹⁵ of those paintings is a valid Rubik’s Cube state, even if any color is allowed on any face.
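The arithmetic can be checked in a few lines of Python. The 43,252,003,274,489,856,000 figure is the well-known number of reachable states of a standard 3×3×3 cube with a fixed color scheme. The rest is our assumption about how the ratio was obtained: the 3.25 × 10¹⁵ figure comes out if every one of the 6! = 720 ways to assign the six colors to the six centers counts as a valid color scheme.

```python
from math import factorial

# Ways to paint 54 cells with six colors, nine cells of each color
# (a multinomial coefficient): 54! / (9!)^6
total_paintings = factorial(54) // factorial(9) ** 6

# Reachable states of a standard 3x3x3 cube with one fixed color scheme
states_per_scheme = 43_252_003_274_489_856_000

# Assumption: any of the 6! assignments of colors to the six centers
# counts as a valid color scheme
valid_paintings = states_per_scheme * factorial(6)

# How many arbitrary paintings there are per valid one
ratio = total_paintings / valid_paintings

print(f"total paintings: {total_paintings:.3e}")  # about 1.011e+38
print(f"one valid in:    {ratio:.3e}")            # about 3.246e+15
```

Under a stricter reading, where only the 24 orientations of a single standard color scheme count as valid, the same arithmetic gives roughly one valid painting in 10¹⁷. Either way, valid states are a vanishing fraction of all paintings.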

The same is true for your data. Valid SQL does not guarantee that the returned data makes sense. A model does more than limit the ways datasets can “spin”: it ensures that every combination it allows makes sense from at least one standpoint. In this way, the model is what separates the plausible from the true.

A grammatically perfect hallucinated sentence may not make sense either, but humans can usually catch that. With spreadsheet data, the nonsense can look perfectly valid.
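To make this concrete, here is a minimal sketch using Python’s built-in sqlite3 module. The schema, rows, and the revenue_by_customer helper are hypothetical: orders carries both customer_crm_id, which we assume is the key the data model prescribes, and customer_id, which we assume holds ids from some other system whose values happen to overlap with customer.id. Both queries are valid SQL and both return tidy, believable totals, but only one of them is true.

```python
import sqlite3

def revenue_by_customer(conn, join_column):
    """Total order amount per customer, joining on the given orders column.

    join_column is interpolated into the SQL text, so only pass trusted,
    known column names (hypothetical helper for this illustration).
    """
    query = f"""
        SELECT c.name, SUM(o.amount)
        FROM customer AS c
        JOIN orders AS o ON c.id = o.{join_column}
        GROUP BY c.name
        ORDER BY c.name
    """
    return conn.execute(query).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_crm_id INTEGER,  -- key the model says to join on
        customer_id INTEGER,      -- hypothetical: an id from another system
        amount INTEGER
    );
    INSERT INTO customer VALUES (101, 'Acme'), (102, 'Globex');
    INSERT INTO orders VALUES
        (1, 101, 102, 500),
        (2, 102, 101, 300),
        (3, 101, 7, 200);
""")

print(revenue_by_customer(conn, "customer_crm_id"))  # [('Acme', 700), ('Globex', 300)]
print(revenue_by_customer(conn, "customer_id"))      # [('Acme', 300), ('Globex', 500)]
```

Joining on customer_id swaps the two customers’ revenue and silently drops order 3, whose customer_id matches no one, yet nothing in the output looks broken.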

What does this look like in practice? The data model says that customers and orders may be joined only on specific fields, such as customer.id = orders.customer_crm_id. The mere existence of an orders.customer_id field doesn’t mean it makes sense to use it for relationships. This points to another fact: there is no single right model for all cases. Some businesses link Campaign directly to Customers; others put Session in the middle to account for multi-touch attribution:

Campaign → Customers vs. Campaign → Session → Customers
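The two shapes can give different answers from the same raw events. Below is a hypothetical sketch with made-up data and function names. In the direct model, each customer is credited to a single campaign, assumed here to be the campaign of their first session. In the session model, revenue is split evenly across all of a customer’s sessions; this is linear attribution, only one of several possible multi-touch rules.

```python
from collections import defaultdict

# Hypothetical sample data: revenue per customer, and the campaign behind
# each session in chronological order. Every customer has at least one session.
revenue = {"alice": 120.0, "bob": 80.0}
sessions = [
    ("alice", "email"),
    ("alice", "search"),
    ("alice", "search"),
    ("bob", "search"),
]

def direct_model(revenue, sessions):
    """Campaign -> Customers: each customer belongs to exactly one campaign.
    Assumption: that campaign is the one from the customer's first session."""
    first_campaign = {}
    for customer, campaign in sessions:
        first_campaign.setdefault(customer, campaign)
    credit = defaultdict(float)
    for customer, amount in revenue.items():
        credit[first_campaign[customer]] += amount
    return dict(credit)

def session_model(revenue, sessions):
    """Campaign -> Session -> Customers: a customer's revenue is split evenly
    across their sessions (linear multi-touch attribution, one possible rule)."""
    touches = defaultdict(list)
    for customer, campaign in sessions:
        touches[customer].append(campaign)
    credit = defaultdict(float)
    for customer, amount in revenue.items():
        for campaign in touches[customer]:
            credit[campaign] += amount / len(touches[customer])
    return dict(credit)

print(direct_model(revenue, sessions))   # {'email': 120.0, 'search': 80.0}
print(session_model(revenue, sessions))  # {'email': 40.0, 'search': 160.0}
```

Neither result is a bug: each is the correct answer within its own model, which is exactly why the choice of model has to be made deliberately.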


We take the same approach in science: some problems call for classical physics, others for quantum physics. It is up to business leaders to frame the model that governs the business, and that model performs only as well as it is coherent.

The good news is that a model always exists, even if the business doesn’t realize it. A weak model feels like meetings regularly spent arguing over numbers that don’t match, and wrong decisions made because nobody trusts the data.

Your self-service analytics performs only as well as your data model reflects your business model.