Mincing Data - Gain insight from data

This site will be a container for all my musings about the analytical pipeline.

For this reason it is necessary to define the analytical pipeline, at least from my point of view. From this (admittedly personal) perspective, five activities are necessary to build an analytical pipeline. These steps are:

  • source
  • ingest
  • process
  • store
  • deliver


The overall goal of an analytical pipeline is to answer an analytical question. To achieve this goal, different data sources have to be targeted, and their data has to be ingested into the pipeline and properly processed. During these first steps, the ingested data often has to be stored in one or more data stores, each used for its special type of usage. Finally, the result of the data processing has to be delivered to its users, its audience. Depending on the nature of the question, different processing methods and different data stores may be used along the flow of the data through the pipeline.
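The flow described above can be sketched as a simple chain of functions. This is only an illustrative sketch; all function names and the sample data are hypothetical, not part of any real pipeline tooling:

```python
# Illustrative sketch of the five activities (all names are hypothetical).

def source():
    # Target a data source; here simply a hard-coded list of raw records.
    return [{"product": "A", "sales": 100}, {"product": "B", "sales": 250}]

def ingest(records):
    # Bring the raw data into the pipeline (here: a plain copy).
    return list(records)

def process(records):
    # Answer the analytical question, e.g. total sales across products.
    return {"total_sales": sum(r["sales"] for r in records)}

def store(result, data_store):
    # Persist the result in a data store (here: an in-memory dict).
    data_store["latest"] = result
    return data_store

def deliver(data_store):
    # Hand the processed result over to its audience.
    return data_store["latest"]

answer = deliver(store(process(ingest(source())), {}))
print(answer)  # {'total_sales': 350}
```

In a real pipeline each step would of course be its own system (a database, an ingestion service, a processing engine), but the composition of the five activities stays the same.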

 

I gave these activities a direction, but I did so partly to spur your critical mind, because

  • source data is, of course, also stored somewhere
  • targeting the source can become a challenge in the pipeline
  • delivering processed data to its audience can become another challenge, for example if you deliver to a mobile workforce
Successfully ingested data often gets processed more than once to answer different questions, and during its existence (inside an analytical data platform) this data will be transformed into various shapes, wrangled by different algorithms, and ingested into different "data stores".

For this reason I believe that these activities are tightly related, and the sequence mentioned above serves merely as guidance.

 

I will use blog posts to describe how different activities are combined to answer analytical questions. In most of my upcoming posts I will link to topics from the activities used in the pipeline. Each activity has its own menu and by itself represents an essential part of the analytical pipeline.

 

Hopefully this site will help its readers as much as it helps me to focus on each activity, always knowing that, most of the time, more than one activity has to be mastered to answer an analytical question.


The separation of data and content - why this is the way to go (part 1)

This article is inspired by a question I encountered a couple of days ago on the Fabric community:

 

How can users create their own reports but with RLS applied?

 

Please be aware that none of the aspects I cover in this article are affected by the file format (pbix vs. pbip) or by whether more workspaces are involved because Power BI deployment pipelines are in place. Likewise, as more and more Fabric workspaces come to life, the licensing model backing the workspace is not important for this article.

This article is solely about one question: what has to be done if a content creator needs to create and publish reports but the content creator is not allowed to see all the data?

This seems to be a simple requirement: develop content (finally publish the report), but with Row Level Security (RLS) applied.

To answer the question, I think it’s necessary to understand the following core principle, at least to some extent:

  • Workspace roles

There are more reasons to separate data from content that are not due to RLS; I will cover these in a second article in the next few days.

Workspace roles

The different workspace roles come with different permissions: the Admin role grants the most permissions and the Viewer role the fewest. It's very important to remember that three roles, Admin, Member, and Contributor, can publish a Power BI report to the workspace. Maybe even more important is the fact that users with one of these roles assigned can edit all items inside the workspace, and these items of course include semantic models. Such users must be considered admins of the semantic models; for this reason, RLS will not be applied when they access the semantic model. The next image shows workspace roles and a semantic model that is shared with a user from a different workspace:
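The rule above, that roles which can edit the semantic model bypass RLS while Viewers get filtered data, can be summarized in a short sketch. This is illustrative only; the function name and role strings are assumptions for the example, not a Power BI API:

```python
# Sketch of which workspace roles bypass RLS (illustrative, not a Power BI API).

EDIT_ROLES = {"Admin", "Member", "Contributor"}  # can edit all workspace items

def rls_is_applied(workspace_role: str) -> bool:
    # Users who can edit the semantic model are effectively its admins,
    # so RLS is NOT applied for them; only Viewers see filtered data.
    return workspace_role not in EDIT_ROLES

for role in ("Admin", "Member", "Contributor", "Viewer"):
    print(role, "-> RLS applied:", rls_is_applied(role))
```

This is why the content creator from the opening question must not get the Admin, Member, or Contributor role on the workspace that holds the semantic model.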
