I love building Power BI solutions, solutions that help my colleagues make better decisions. I’m very much interested in the overall data architecture and love creating data models that sometimes look like a star, but often don’t because there are more than six tables 😱. To feed these models with data, sometimes we have to do heavy data massaging, and sometimes we don’t. Content creation starts when the model is done (at least for a few months). This last step is often called data visualization.

When doing all the fun things I mentioned above, it’s easy to overlook a very important fact: when we hit “Get data,” a data source is created inside the Power BI file. This works seamlessly 99.999% of the time, except when someone migrated to a new SAP BW version but forgot to tell the client team that a new connector must be rolled out to client machines, or the data source is a REST API endpoint with poor documentation that does not reveal how to authenticate. “Get data” works so seamlessly in Power BI Desktop that people wonder why they cannot configure an automatic data refresh after publishing the Power BI file to their workspace and suddenly need a data gateway connection.

I started working with my current employer in 2017, and since then tens of thousands of Power BI apps, many more workspaces, and a bunch of datasets have come into existence. On average, there are 1.6 datasets per app, only counting apps used in production. Since that start, colleagues of mine have moved many data sources from our basement to another basement owned by someone else. Some of these databases have been migrated to Azure SQL DBs; some are still running on virtual machines.
The on-premises data gateway is the untiring component that uncomplainingly “manages” a great many of our data sources. To be precise, and precision is essential when discussing data source and gateway management: the gateway does not actually manage the connections; it’s more like a humble and silent servant, never seen, never praised.
This is the start of a series of articles inspired by the feedback I got on one of my latest blog articles, “Have data will travel, or should I stay or should I go.”
The series will encompass the following aspects:
- Gateways and cloud connections: how do they work. I use this article to explain how gateways and cloud connections work, using my own words coupled with some experience. Please be aware that my article is no substitute for the fantastic official documentation. In this article, I will focus on the on-premises data gateway because the VNet gateway (unfortunately) and the new cloud connection are still in preview. Nevertheless, the latter two will not be forgotten.
- Power BI on-premises data gateway: what you need to know
- What you must tell your colleagues (excerpt from: Power BI 101 - the survival guide). Depending on the type of organization you are part of, colleagues are likely asking you to create a new gateway connection, or they can create new connections on their own. I consider the two things below the most important things they have to know. I will elaborate more on these two topics in the article and add other aspects:
- all data sources forming a dataset must leverage the same gateway cluster
- if people are about to leave the team, make sure the remaining team members know the names and the passwords of all data sources involved
- Ideas on how to manage your on-premises data gateways from a technical point of view, with a touch of an organizational perspective. This article will be about building clusters, aligning your gateway clusters with different workloads (like import vs. DirectQuery), or aligning gateway clusters with different environments (like dev, test, and prod).
- Some gory technical and security-related details
- Differences between the various types of gateways/connections, with their advantages and disadvantages
- Avoid unnecessary data travel whenever possible (done; see the above-mentioned article). This article is about avoiding unnecessary data travel, assuming the data and the Premium capacity are located in the same Azure region, which is not the home region of your Power BI tenant.
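To make the first of the two rules above concrete: a scheduled refresh fails when a dataset's data sources are spread across more than one gateway cluster. Here is a small, purely illustrative Python sketch of that check. The record layout and all names are my own assumptions for the example, not a Power BI API shape.

```python
from collections import defaultdict

def datasets_spanning_clusters(records):
    """records: iterable of (dataset, datasource, gateway_cluster) tuples.

    Returns the datasets whose data sources are bound to more than one
    gateway cluster, which would break a scheduled refresh.
    """
    clusters_by_dataset = defaultdict(set)
    for dataset, _datasource, cluster in records:
        clusters_by_dataset[dataset].add(cluster)
    return sorted(d for d, clusters in clusters_by_dataset.items()
                  if len(clusters) > 1)

# Hypothetical inventory: "Sales" mixes two clusters and gets flagged.
records = [
    ("Sales", "SQL-ERP", "cluster-prod"),
    ("Sales", "SAP-BW", "cluster-legacy"),
    ("Finance", "SQL-DWH", "cluster-prod"),
]
print(datasets_spanning_clusters(records))  # ['Sales']
```

In practice, you would feed such a check from your own gateway and dataset inventory; the point is only that "same dataset, different clusters" is the condition to hunt for.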
With the advent of the Power BI project file format (pbip), I will no longer use the term pbix to describe the artifact I create using Power BI Desktop. Instead, I will call this artifact “PBI file,” no matter whether it’s in the pbix or the pbip file format.
Do you wonder why people do not know they need a data gateway connection when they want to create an automatic refresh sourcing data from an on-premises data source? I assume they did not read the “Power BI 101 - the survival guide” document, which was sent to them automatically after they installed Power BI Desktop 😉
I do not count “My workspace”; I never do.