· 

My favorite windowing function - WINDOW

The DAX windowing functions have helped me a lot in tackling complex analytical challenges. Nevertheless, I do not use them daily. This means sometimes I need time to regain the “windowing feel” again. This article will shorten the time until I’m thinking “windowing” again. Maybe it might help you as well.

 

I assume that you are aware of the mini-series of articles by Jeffrey Wang:

 

I consider RANK and ROWNUMBER somewhat special, so I will spare these two functions for a later article. Then, in my imagination, I consider OFFSET and INDEX as a shorter form of WINDOW. Because I like to minimize the number of tools I use, I will focus on WINDOW, but will mention OFFSET and INDEX when a somewhat shorter DAX statement can be used. What I love about these functions is the parameter called relation, which also gets me confused now and then. Even more if this parameter can be omitted. The official documentation (https://learn.microsoft.com/en-us/dax/window-function-dax) says it’s

 

A table expression from which the output rows are returned.

 

DAX guide (https://dax.guide/window/) states:

 

A table expression from which the output row is returned.

 

 

 

Here you can download the pbix file that contains the DAX measures: https://tommartens-my.sharepoint.com/:u:/g/personal/tom_minceddata_com/EaYvRtR7pL5JkB44j-VH1e4BgaO-cB8EHOgacJfr6tnkVg?e=2RR6sL

The relation parameter

f you are wondering why I love the parameter relation the most when it’s an optional parameter is simple: when this parameter is omitted, this parameter is implicitly replaced by the columns specified in the ORDERBY and PARTITIONBY sub-functions.

 

I think this makes understanding the DAX statement more complex by only saving some characters. “Explicit is better than implicit.” can be applied here. This is (should be) one of the guiding principles when creating Python code. You will find all these principles here: https://peps.python.org/pep-0020/. I imagine the relation as an ordered list of tuples, where a tuple can comprise one or more columns. The ordering of the tuples is specified by the ORDERBY sub-function. The next image shows a table from my very simple sample with some measures using the WINDOW function:

 

 

Row selection and the current row (the current tuple)

Sometimes, I need some extra thinking about how the “row selector” works because I’m confused about the current row. In the first part of a mini-series about windowing functions, Jeffrey Wang makes it clear:

 

… the window function will derive the current partition and the current row from its evaluation context.

 

Even if this is simple and clear, now and then, I need some extra thinking. For this reason, I always use the following visualization that reminds me of how to properly use the row-selection parameter(s):

 

 

Most of the time, I use WINDOW over the more specialized functions OFFSET and INDEX. I always imagine the row selection parameters of the WINDOW function as defining a lower bound and an upper bound. The result of WINDOW returns the rows of the relation between these bounds, including the lower and the upper bound. The next table shows some examples, all the examples based on the 2024-01-07 as the current row (the current tuple):

Some advanced examples

The next sections showcase some advanced examples. The many use cases where windowing functions improve the performance of the DAX measure and make the code more readable are sometimes not obvious. The next chapters are here to help me remember how I solved some challenges in the past.

Compare the current value with the average of the top 2 values

I consider this a common requirement. Before introducing the WINDOW functions, I used a combination of TOPN and RANKX, but now this has become way simpler and faster.

B - the first two =
IF( NOT( ISBLANK( [Amount (ms)] ) )
    ,var theAverageValue =
        AVERAGEX(
            WINDOW(
                    1 , abs
                    , 2 , abs
                    , ADDCOLUMNS(
                        SUMMARIZE(
                            ALLSELECTED( 'Sales' )
                            , 'Calendar'[Date]
                        )
                        ,"v" , [Amount (ms)]
                    )
                    -- beware, the relation is ordered by the measure using the descendence sort operator
                    , ORDERBY( [v] , DESC )
                    , DEFAULT
                )
            , [Amount (ms)]
        )
    return
    IF( [Amount (ms)] > theAverageValue
        , "more"
        , "less"
    )
    ,BLANK()
)

Compare the current value with the previous value

Finding the previous value has always been a nuisance, because our beloved vertipaq engine does not know the concept of a sequence. Instead of using statements like the one below:

MAXX( FILTER( … , [a column] < theCurrentValue, [a column] )

This can be easily done using WINDOW or OFFSET:

B - the previous value =
IF( NOT( ISBLANK( [Amount (ms)] ) )
    ,var thePreviousValue =
        CALCULATE(
            [Amount (ms)]
            , WINDOW(
                -1 , rel
                , -1 , rel
                , SUMMARIZE(
                    ALLSELECTED( 'Sales' )
                    , 'Calendar'[Date]
                )
                , ORDERBY( 'Calendar'[Date] , asc )
                , DEFAULT
            )
        )
    return
    [Amount (ms)] - thePreviousValue
)

And now OFFSET

B - the previous value (OFFSET) =
IF( NOT( ISBLANK( [Amount (ms)] ) )
    ,var thePreviousValue =
        CALCULATE(
            [Amount (ms)]
            , OFFSET(
                -1
                , SUMMARIZE(
                    ALLSELECTED( 'Sales' )
                    , 'Calendar'[Date]
                )
                , ORDERBY( 'Calendar'[Date] , asc )
                , DEFAULT
            )
        )
    return
    [Amount (ms)] - thePreviousValue
)

Depending on the size of the relation you might realize that OFFSET is faster than WINDOW. Nevertheless, I tend to use WINDOW instead of OFFSET when OFFSET is only a couple of milli seconds faster than WINDOW. I consider a DAX function being a tool, the more tools i use, the more complex a solution becomes. Because I always strive to create solutions with minimized complexity I tend to use a reduced set of DAX functions. This is also the reason why DATESBETWEEN is one of of my favorite functions and not one of the many time intelligence functions.

New and churned customer

I use WINDOW also to identify churned customer, new customer, and returning customer. Of course, this kind of analytical challenge is not restricted to customer, basically this can be used to answer questions regarding the appearance (new and returning) or the missing of items by comparing two lists. I also experience that the definition of the terms new, churning, and returning is not that simple as it may seem. In next two examples are comparing the list of customers of the current period with the list of customers of the previous period.

New Customer

B - new customer =
var currentDate = SELECTEDVALUE( 'Calendar'[Date] )
var currentCustomer =
    CALCULATETABLE(
        SUMMARIZE(
            ALLSELECTED( 'Sales' )
            , 'Customer'[Customer]
        )
        , 'Calendar'[Date] = currentDate
    )
var previousCustomer =
    CALCULATETABLE(
        SUMMARIZE(
            ALLSELECTED( 'Sales')
            , 'Customer'[Customer]
        )
        , WINDOW(
            -1, rel
            , -1 , rel
            , SUMMARIZE(
                ALLSELECTED( 'Sales' )
                , 'Calendar'[Date]
            )
            , ORDERBY( 'Calendar'[Date] , ASC )
            , DEFAULT
        )
    )
return
IF( HASONEVALUE( 'Calendar'[Date] ) && NOT( ISEMPTY( previousCustomer ) )
    , CONCATENATEX(
        except( currentCustomer, previousCustomer )
        , 'Customer'[Customer]
        , unichar(10)
    )
    ,BLANK()
)

Churning Customer

B - churned customer =
IF( HASONEVALUE( 'Calendar'[Date] )
    ,var currenDate = SELECTEDVALUE( 'Calendar'[Date])
    var currentCustomer =
        CALCULATETABLE(
            SUMMARIZE(
                ALLSELECTED( 'Sales' )
                , 'Customer'[Customer]
            )
            , 'Calendar'[Date]= currenDate
        )
    var previousCustomer =
        CALCULATETABLE(
            SUMMARIZE(
                ALLSELECTED( 'Sales' )
                , 'Customer'[Customer]
            )
            , WINDOW(
                -1, rel
                , -1 , rel
                , SUMMARIZE(
                    ALLSELECTED( 'Sales')
                    , 'Calendar'[Date]
                )
                , ORDERBY( 'Calendar'[Date] , ASC )
                , DEFAULT
            )
        )
    var churnedCustomer =
        except( previousCustomer , currentCustomer )
    return
    IF( COUNTROWS( churnedCustomer ) >= 1
        , CONCATENATEX(
            churnedCustomer
            , 'Customer'[Customer]
            , ", "
            , 'Customer'[Customer]
            , ASC
        )
        , BLANK()
    )
)


Thank you for reading

Kommentar schreiben

Kommentare: 0