Data Ingestion

The need for data ingestion in a data pipeline is obvious: without data, the analytical question cannot be answered. But expecting that there is one solution, one specific toolset that can be applied to each specific problem, will lead to some frustration.

Data ingestion is defined (in the context of this website) as a process that combines methods and tools to move data between systems (the data pipeline being a system of components). This definition may seem a little vague, but from my point of view it is not possible to be more specific. Each data category, "data at rest" and "data in motion", brings its own challenges to the process of data ingestion: either the huge amount of data that has to be loaded into the pipeline initially, or the myriad of data points emitted by devices that have to be captured. And of course there are questions that need data from more than one data source. Answering these questions requires some kind of orchestration that ensures all the pieces play in harmony, that each step is executed in the proper sequence, and that mishaps are planned for (maybe the most important aspect: things go wrong, no matter how carefully you design your pipeline). Data ingestion happens at scale or in bits and pieces; no matter what question has to be answered, the toolset has to be selected carefully.

A toolset that handles myriads of data points quite well may not be the proper tool to answer a question of lesser importance (always consider the expected value of information).

The toolset that will be described in this part of the site will encompass tools like Microsoft Flow, Microsoft SQL Server Integration Services, Azure Data Factory, Kafka, and most probably some other tools. 

Here you will find some ideas why one of my favorite tools for data ingestion is PowerApps.