This site will be a container for all my musings about the analytical pipeline.
For this reason it is necessary to define the analytical pipeline, at least from my point of view. From a general perspective (be aware that this is my perspective), five activities are necessary to build an analytical pipeline. These steps are
The overall goal of an analytical pipeline is to answer an analytical question. To achieve this goal, different data sources have to be targeted, and their data has to be ingested into the pipeline and properly processed. During these first steps, the ingested data often has to be stored in one or more data stores, each used for its own special purpose. Finally, the result of the data processing has to be delivered to its users, its audience. Depending on the nature of the question, different processing methods and different data stores may be used as the data flows through the pipeline.
I gave these activities a direction, but I also added this direction to spur your critical mind, because
For this reason I believe that these activities are tightly related, and the sequence mentioned above serves only as guidance.
I will use blog posts to describe how different activities are combined to answer analytical questions. In most of my upcoming blog posts I will link to different topics from the activities used in the pipeline. Each activity has its own menu and represents an essential part of the analytical pipeline.
Hopefully this site will help its readers as much as it helps me to focus on each activity, always knowing that most of the time more than one activity has to be mastered to find an answer to an analytical question.
I’m not sure if this will become the start of a new series of articles or if this will only be a single lengthy article.
Either way, the motivation behind this article is that I often read questions about data transformation tasks that have one thing in common: a lack of understanding of Power Query’s data structures and how these data structures can be navigated.
Sure, most of the time, a Power Query query represents and, importantly, returns a table that will be used inside Power BI’s dataset. Unfortunately, it’s also very common that data transformations are required before the table can be used with the dataset.
No matter which of the many data sources we connect to, be it a SharePoint list, an Excel file inside a SharePoint library, a SQL Server database, or some big data store, the connected artifact first appears as a table: there are rows and columns.
Explaining the data structures of Power Query requires sample data. For the sake of simplicity, I enter sample data directly into Power BI. This makes it simpler to follow all the aspects I describe in this article. You will find the pbix file here.
The following screenshot shows the sample data I will use throughout this article.
The above table represents the status (column: Status) of an order (column: Order) for a given customer (column: Customer).
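For readers who prefer code to a screenshot, a table with the same three columns can be sketched directly in M. This is only a minimal illustration: the rows below are hypothetical placeholders (the real sample data lives in the pbix file), and the step names are my own. It also hints at how rows, columns, and single values of such a table can be addressed.

```powerquery
let
    // Hypothetical rows; the article's actual sample data may differ.
    Orders = #table(
        type table [Customer = text, Order = text, Status = text],
        {
            {"Customer A", "Order 1", "open"},
            {"Customer A", "Order 2", "closed"},
            {"Customer B", "Order 3", "open"}
        }
    ),
    // The column names of the table, returned as a list.
    ColumnNames = Table.ColumnNames(Orders),
    // A single row, selected by its zero-based position, is a record.
    FirstRow = Orders{0},
    // A single column, selected by its name, is a list.
    StatusColumn = Orders[Status],
    // Combining both lookups returns one value.
    FirstStatus = Orders{0}[Status]
in
    // Return all intermediate results as one record for inspection.
    [Columns = ColumnNames, Row = FirstRow, Column = StatusColumn, Value = FirstStatus]
```

Pasting this into a blank query in the Advanced Editor should return a record whose fields can be clicked through one by one, which makes the difference between a table, a record, and a list visible.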
In the next two chapters, I share my mental picture and my understanding of the thing we call a table in Power Query, and give an example of how this understanding helps solve a specific task.