Before we delve into the realm of Power BI data modeling (or Analysis Services Tabular), it's necessary to understand what this data modeling is all about.
A thorough understanding of data modeling is vital because the Power BI data model determines the insights that can be derived and visualized from our data, leading us to better and more actionable decisions. Because we are striving for the ideal data model all the time, others call us obsessed with data modeling 😊
To get this understanding, we start with simple definitions of the terms: model, data, and of course, data model. We will publish articles covering different aspects of data modeling with Power BI / Analysis Services Tabular throughout the following months. There will not be any specific sequence or timeline; these articles will help to create better data models. A better data model will help get your answers faster!
Now some definitions!
what is a model?
According to Wikipedia, Model (Model - Wikipedia): "A model is an informative representation of an object, person, or system."
Re-phrasing this definition,
a model helps to understand the natural world and also helps to shape our future.
Of course, an insufficient model can not be a good representation of the real world.
What is data?
Data (Data - Wikipedia): "Data are individual facts, statistics, or items of information, ..., that are collected through observation".
what is a data model?
Combining the above definitions, we consider data modeling a method for creating a simplified representation (the model) of an object, a process, or any other real-life object using data. As there are different requirements, there are various methods. In the series of coming articles, we will focus on a data modeling method called dimensional modeling. The concept of dimensional modeling is used to answer analytical questions like: How has sales developed for brand x in all sales regions.
The result (the model will likely evolve) of the dimensional modeling process is called a star schema, but be aware that it's very probably the model will change and evolve because requirements are changing. The term star schema is used because simple (but powerful) dimensional data models resemble a star (some imagination needed). A dimensional data model is better suited than other data models for answering analytical questions. Still, it's not a good idea to use a dimensional data model as a foundation for your new LOB application or ERP system.
A dimensional model separates two types of tables, dimension tables, and fact tables. To make life and data modeling more fun. It's not that simple because there are factless fact tables or junk dimension tables, not to forget degenerated dimension tables J
What others say about data modeling:
A data model aligns the business needs with the data infrastructure.
Steve Hoberman - Data
power bi and the dimensional model
Talking about data modeling in combination with Power BI or Analysis Services Tabular, it's also important to understand that there are two layers involved: a logical and a physical layer. Most of the time, we, the data modelers, are just thinking of the logical layer. We can focus on the logical layer because the vertipaq engine (the physical layer) is highly efficient, and our questions are answered with blazing fast performance most of the time.
The physical layer stores the data in a specific structure and processes/queries whenever we interact with data inside the model using, e.g., a slicer to filter data or selecting a bar inside a bar chart. This performance is brought to us by an in-memory columnar database and the two engines that help process/query the data in this database. When the amount of data grows, it can become necessary to optimize the logical layer so that the engines will be able to process/query the data inside the physical layer in the most efficient way.
To be honest, sometimes, the distinction between these two types of tables is not that simple. The upcoming articles will provide a much deeper explanation of these table types – promised!
- Star Schema - The Complete Reference by Chris Adamson
- The Data Warehousing Toolkit by Ralph Kimball, Margy Ross
- Analyzing Data with Microsoft Power BI and Power Pivot for Excel by Alberto Ferrari and Marco Russo
- Modeling the Agile Data Warehouse with Data Vault by Hans Hultgren
- Data Modeler's Workbench by Steve Hoberman
- Data - Modeling for the business by Steve Hoberman, Donna Burbank, Chris Bradley