I'm considering a relationship being the sword of the data model. A sword because relationships are the key to slice and dice through vast amounts of data.
For this reason I'm honored, I can use the picture below by Matthew Roche (I assume you are aware of his blog, if not, something is wrong)
To fully understand the importance and possibilities of a data model with proper relationships, the concept of extended tables (aka expanded view) has to be considered whenever we are tasked with optimizing the performance of a data model or a single measure.
In this epic article MDX and DAX topics: The Logic behind the Magic of DAX Cross Table Filtering (mdxdax.blogspot.com), Jeffrey Wang, one of the inventors of DAX, explains the concept of extended tables as
"… is to think of each base table as extended by its related tables."
A base table is a table on the many-side of a relationship. The base table is extended by the related table on the one side of a relationship. After a base table is extended by its related tables, the new object is called the "expanded view of a base table."
Using the pbix file that is based on the ContosoRetailDW sample data, the following screenshot shows the expanded view of the base table FactOnlineSales:
It's essential to notice that the expanded view of the base table FactOnlineSales contains all the columns of the tables directly related to the base table and the columns from the indirectly related tables. The columns are included because they extend the table DimProduct. In the example above columns from the tables DimProductSubCategory and DimCategory are part of the extended table as well.
The concept of the Extended Table is the reason why this measure works
Measure sample expanded view =
, RELATED( 'DimProductCategory'[ProductCategoryName] ) = "Audio"
You may be tempted to say the underlying concept is Row Context as the function FILTER creates a row context necessary for using RELATED to "pull" data from one of the related tables. This is because you are more likely familiar with terms like Formula Engine and Storage Engine or Filter Context and Row Context and that you are even aware of the phrase filter propagation. Still, the term extended table maybe sound new.
I tend to overlook concepts that are at the very foundation of something. I consider this typical. First, I think it is normal. Then I start to forget about it until I face a situation where there is no normal.
To provide a more vivid example, let's assume air is a concept, then it will be likely that we are not thinking about air all the time and how vital air is for breathing. Maybe we even tend to become a little reckless about air. This behavior comes to a grinding halt when we decide to go diving, and we suddenly realize that breathing without special precautions will become problematic.
Back to the data modeling thing.
Before I provide more details about extended tables and which types of relationships are considered in forming the expanded view, here is an example of how not considering the concept might lead to a data model that comes with its own challenges.
Assuming you have a fact table containing ~200 million rows and a dimension table with ~100k rows and you have to multiply a value from the fact table with a value from the dimension table like so
Suppose we face a task like the one above. In that case, it seems self-evident to precompute the column "Amount_with_rule" as we know there will be less formula engine activity involved in creating this information. This reasoning is well-founded as a scan combined with a simple aggregation like SUM performed by the blazing-fast storage engine (the sporty engine) will be much faster than the "complex" multiplication executed by the formula engine (the brainy engine).
But whenever we are going to precompute a value either using Power Query or even more upstream meaning inside the source system, we have to be aware of the number of distinct values this precomputing may create. This number determines the cardinality of the column. In regards to the example mentioned above, the following picture will show what this can mean:
The pre-computation of the column "fact1_preCompute" created ~181 million distinct values. This column is consuming ~11.8GB of RAM, as columns are allocating RAM no matter if they are used in any visual or not.
Even if you are using a Premium capacity that will host this kind of dataset, the chances are high that there will be more than one dataset competing for the available resources.
This article will describe how you can reduce the cardinality of columns: Cardinality - Your Majesty!
Nevertheless, there is a simple rule for the example above: do not precompute without any need.
One of the reasons for the vertipaq engine's incredible performance is the concept of the expanded view, the view on the data after a table has been extended based on its relationships. For this reason, it's not necessary to precompute values as measures will be fast, as long as these measures can rely on relationships.
A table will be extended with all the columns from related tables (the tables on the one-side of the relationship). It's also important to be aware that there is not just one extended table; basically, all tables will be extended either if they are on the many-side of a many-to-one relationship or they are on the one-side of a one-to-one relationship. Both tables that are part of a one-to-one relationship will be extended.
Since the introduction of composite models where we can connect to an already Power BI dataset using direct query connection mode, we will find the terms regular relationship and limited relationship in the official documentation. Before that wording started to mess with our minds, Marco Russo and Alberto Ferrari coined the terms strong relationship and weak relationship where strong relationships refer to the relationships that are now called regular relationships, and weak relationships refer to limited relationships. I have to admit that I like the differentiation provided by strong and weak more than regular and limited.
The following picture shows the types of relationships that are considered regular (strong) and limited (weak) and the extended tables that will be created:
From a personal point of view, I'm not 100% satisfied with the terms strong/weak or regular/limited because a many-to-many relationship can answer specific analytical questions that a regular relationship can not. For this reason one might consider a many-to-many relationship strong or that a many-to-many relationship posesses a super power.
Because I consider a regular relationship the sword that allows slicing through vast amounts of data, I like the word strong more than regular.
The world of data modeling is not simple, it's complex
When creating a data model, we import the data most of the time. During the data modeling process, we are following dimensional modeling methods creating a model that can be considered a data model adhering to the rules of a star schema.
Fortunately, the possibilities of a Power BI data model are constantly evolving. For quite some time now, it's possible to create a data model formed by tables residing in different datasets. This fantastic feature is called a composite model.
For this reason, all the things said above are valid for a data model that
- Contains a single dataset
- All tables are imported
Considering table extension, this means
- a table will not be extended by columns from tables that reside in a different dataset
- whenever a table is not imported, this table is not part of table extension, meaning
- Tables coming from a direct query connection (relational tables as well as tables residing in a tabular model)
- Hybrid tables will not be extended or be part of table extensions
There is an excellent article on how to model relationships in Power BI Desktop
Some final words
If possible, import a table. Imported tables will be considered by the vertipaq engine for a table extension, creating the expanded view whenever a regular relationship connects these tables.