This site will be a container for all my musings about the analytical pipeline.

For this reason I need to define the analytical pipeline, at least from my point of view. From a general perspective (be aware that this is my perspective), five activities are necessary to build an analytical pipeline: targeting the data sources, ingesting the data, processing the data, storing the data, and delivering the results.






The overall goal of an analytical pipeline is to answer an analytical question. To achieve this goal, different data sources have to be targeted, and their data has to be ingested into the pipeline and properly processed. During these first steps the ingested data often has to be stored in one or more data stores, each used for its particular purpose. Finally, the result of the data processing has to be delivered to its users, its audience. Depending on the nature of the question, different processing methods and different data stores may be used along the flow of the data through the pipeline.


I gave these activities a direction, but I also added this direction to spur your critical mind, because

  • source data is of course also stored somewhere
  • targeting the source can itself become a challenge in the pipeline
  • delivering processed data to its audience can become another challenge, for example if you try to deliver to a mobile workforce

Successfully ingested data often gets processed more than once to answer different questions, and during its existence (inside an analytical data platform) this data will be transformed into various shapes, wrangled by different algorithms, and ingested into different "data stores".

For this reason I believe that these activities are tightly related, and the sequence mentioned above serves only as guidance.


I will use blog posts to describe how different activities are combined to answer analytical questions. In most of my upcoming blog posts I will link to different topics from the activities used in the pipeline. Each activity has its own menu and represents an essential part of the analytical pipeline.


Hopefully this site will help its readers as much as it helps me to focus on each activity, always knowing that most of the time more than one activity has to be mastered to find an answer to an analytical question.

Retrieve values from remote tables inside a query step (anonymous function / inline function)

This is the 2nd article in a little series about advanced Power Query functions and the writing of custom functions itself, see here. This article is inspired by the following question on community.powerbi.com. I will not explain what the question on community.powerbi.com is about. If you are interested in the use case, follow the previous link. Here I will just cover the syntax that allows you to create an anonymous function that accesses an external table (a remote table) for each row in the current query step.


The comments are more relevant to my future self than the code itself.


Table.AddColumn(#"Changed Type", "TheMostCurrentStartInterval"
            // row is just a name, the current row will be passed to an anonymous function
            , ( row ) =>
                let
                // storing the value of a column from the outer row inside a variable,
                // basically this is not necessary as a column value from the outer row can be accessed directly (see a little below)
                //theStartTime = row[StartTime],
                mostCurrentIntervalStart =
                    Table.Sort(
                        // RemoteTable is a placeholder, the name of the remote table (query) is not shown here
                        Table.SelectRows(RemoteTable
                            // each is important as it allows referencing the inner row (the remote table), in this example
                            // the filter is applied to each row of the remote table
                            // referencing the variable from above
                            //, each  [IntervalStart] <= theStartTime
                            // referencing the value of the "outer" row directly
                            , each  [IntervalStart] <= row[StartTime]
                        )
                        , {{"IntervalStart", Order.Descending}}
                    // the below syntax is referencing the column IntervalStart [] from the first row {},
                    // zero-based
                    ){0}[IntervalStart]
            // the value of the step mostCurrentIntervalStart will be returned from the anonymous function
            in mostCurrentIntervalStart
)
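
To try the pattern without any external data, the following is a minimal, self-contained sketch that can be pasted into a blank query. The table names Intervals and Events and their sample values are made up for illustration and are not part of the original use case; numbers stand in for the time values. As an additional assumption on my side, the sketch returns null when no interval matches, instead of indexing the first row of an empty table.

```powerquery
let
    // hypothetical remote table, sample values are assumptions
    Intervals = #table(
        type table [IntervalStart = number],
        {{0}, {10}, {20}, {30}}
    ),
    // hypothetical outer table, sample values are assumptions
    Events = #table(
        type table [Event = text, StartTime = number],
        {{"A", 5}, {"B", 10}, {"C", 27}}
    ),
    Result = Table.AddColumn(
        Events,
        "TheMostCurrentStartInterval",
        // the current (outer) row of Events is passed to the anonymous function as row
        (row) =>
            let
                // each refers to the inner row of Intervals, row refers to the outer row of Events
                matching = Table.SelectRows(Intervals, each [IntervalStart] <= row[StartTime]),
                sorted = Table.Sort(matching, {{"IntervalStart", Order.Descending}}),
                // guard against an empty result before accessing the first row (zero-based)
                mostCurrentIntervalStart =
                    if Table.IsEmpty(sorted) then null else sorted{0}[IntervalStart]
            in
                mostCurrentIntervalStart
    )
in
    Result
```

With this sample data the new column should contain 0 for event A, 10 for event B, and 20 for event C. Naming the outer row explicitly is what makes the lookup possible: inside the each expression a bare column reference always points to the inner row, so the outer row needs a name of its own.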


