21. Februar 2018

The weird Year-To-Date calculation

Some time ago I stumbled about the following question on powerbi.community.com

"Explanation of Context in Calculating Cumulative Values"

And the following DAX statement was given

Cumulative 2 = CALCULATE (

SUM ( 'Invoice lines'[Sales] ),

FILTER (

ALLSELECTED('BI-Dates'[Date]),

'BI-Dates'[Date] <= MAX ( 'BI-Dates'[Date] )

)

The questions that have been raised, have been more or less the same I was asking myself, as I started to tackle DAX

Why does the 2nd parameter of FILTER work,
and why it's not possible to omit the aggregation function
Why is the value not always the same if we remove an existing filter on the Date column by using ALLSELECTED,
assuming that MAX will always return the same value, the max value of ALLSELECTED()
If the parameter of MAX is the current context, then why do we have to use MAX at all

If you follow on reading, you will find my answer to this question …

First, some principles

A measure is always evaluated in a certain context, to visualize this context (talking about mental visualization) I always use the matrix visual, meaning a matrix that has row headers and column headers. Each cell in this matrix determines the context the measure gets evaluated in, it's the evaluation context of the measure. But knowing that this context is called Evaluation Context can also be a burden at least at the start, so I always think of a cell as a window that shows all the rows from the table filtered by the row and column header, but basically most of the time it's more than just 1 row.

The rows that are used (aggregated) by the measure are filtered down by slicers, row headers and column headers, slicers, row headers and column headers define the Filter Context that determines the rows (the Filter Context filters down the rows of the table) used by the expression, e.g. SUM('table'[numeric value]). This explains why have to use aggregation functions like SUM(…), rows have to be aggregated, to return a single value.

The DAX function CALCULATE(<expression> , <filter 1> , … , <filter n>) allows us to alter the Filter Context, meaning to broaden our view on the table, but also to narrow down this view. This means we can get a scenic view to all the rows of the table or just a tiny piece. CALCULATE allows us to change the size of the window we are using to look at our data table.

Some kind of rule that has to be considered, before the expression is evaluated, the filtering is applied. If you are reading a more complex DAX statement and you find nested CALCULATEs be aware, that the filter statement(s) from the outermost CALCULATE are applied, the new window is passed to the next CALCULATE and so forth and so forth. This "behavior" is totally different from all your formulas you have ever written in Excel, but be assured, you get used to it.

With this said, now it's time to think about the function FILTER(...)

The function FILTER(...) returns a table, this table expands or reduces the rows that are aggregated by the expression, this is a good, but also a little mind-boggling read about expanded tables:

https://www.sqlbi.com/articles/context-transition-and-expanded-tables/

The first parameter of FILTER(...) is by itself a table, for each row of this table the condition will be evaluated, if the evaluation results to TRUE, the row of the table remains otherwise the row is purged from the initial set of rows. This means that FILTER(...) returns a table of rows, that successfully passed the condition. This reminds us, that FILTER(...) is an iterator function, this means that it iterates through all the rows of the table or a DAX statement that returns a table (the 1st paramter of FILTER is a table),

ALLSELECTED(...) returns a table, in this case, a one-columned table. The magic of ALLSELECTED(...) is that it returns not just the Date value of the current row or column, but all the Date values that are used in the current query context. Of course this also works with other columns not just with Date columns (here you will find ALLSELECTED() and its intricate workings finally explained.

One of the most mind-boggling quests in DAX I had to tackle is the same question that was asked in the community post: what is the left part of the comparison and what is the right part, the MAX('...'[Date]).

Knowing that FILTER() is an iterator function, meaning it iterates through each row of that table, it's possible to refer to a column of that table, precisely the value of the specified column of the currently iterated row. For this reason it's valid to state that the part of the comparison w/o an aggregation function represents the current row value of the iterated table.

Now the MAX(...) part, the article I mentioned above explains the concept of expanded tables, a much deeper explanation can be found in the book "The Definitive Guide DAX" by @AlbertoFerrari and @marcorusso. I recommend to read this book very carefully, if you want to truly understand the mechanics of DAX.

This book also explains that CALCULATE does not replace columns contributing to the current Filter Context, instead are columns are added. This means suddenly there are two date columns. The right part of the comparison refers to column of the current context. MAX(...) represents the current value, of the current row. Because we have two columns we have to use an Aggregation Function to refer to its value.

Basically much is been said, to explain a little DAX statement, but it helps to explain some of the mechanics of DAX

Tags: DAX

Kommentar schreiben

Kommentare: 0