Analyzing R

Updated this page on 2018-03-16 (quite early in the morning)
This is my first (private) project. It started a while ago and may well be one of those projects that will not be finished anytime soon.

Basically, the analytical pipeline works like this:

  1. harvest data from different websites using the R packages rvest and data.table
  2. ingest the data; at least for the first intake this is quite simple: create CSV files
  3. shape the data and load the files into a local pbix file
  4. add a little DAX magic, and finally
  5. upload the pbix file to the Power BI service and share the report using the Publish to web feature
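
Steps 1 and 2 might look roughly like the sketch below. It is only an illustration, not the actual harvesting script: the HTML snippet, the table id `downloads`, the column names, and the helper `harvest_downloads()` are all made up for the example, and a real run would pass a URL to `read_html()` instead of an inline string.

```r
library(rvest)
library(data.table)

# Hypothetical stand-in for a harvested page (assumption: the real sites
# expose the data as an HTML table; the structure here is invented).
sample_html <- '
<html><body>
<table id="downloads">
  <tr><th>package</th><th>date</th><th>downloads</th></tr>
  <tr><td>data.table</td><td>2018-03-15</td><td>1200</td></tr>
  <tr><td>rvest</td><td>2018-03-15</td><td>800</td></tr>
</table>
</body></html>'

# Step 1: parse the page and pull the table into a data.table.
harvest_downloads <- function(html_source) {
  page <- read_html(html_source)
  tbl <- html_table(html_element(page, "table#downloads"))
  as.data.table(tbl)
}

downloads <- harvest_downloads(sample_html)
downloads[, date := as.IDate(date)]

# Step 2: the "first intake" is simply a CSV file on disk.
out_file <- file.path(tempdir(), "downloads.csv")
fwrite(downloads, out_file)
```

The CSV written here is the kind of file that would then be shaped and loaded into the local pbix file in step 3.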

What has to change in the not-so-distant future:

Currently the complete pipeline is based on hard manual work: starting R scripts on my laptop, waiting for the result of the harvest, doing some postprocessing, loading the result into a local pbix file (currently 180 million rows of package download information), and publishing the file to the Power BI service.

This takes a while: almost 4.5 hours.

So this has to be automated. I'm waiting for the general availability (GA) of R in Azure SQL, and hoping this feature will not eat too much of my Azure credits.

I have almost finalized the list of my "analytical questions". So there will also be new analytical capabilities, e.g. incorporating the information about which packages are imported by which other packages.
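
As a rough idea of how the "who imports whom" information could be shaped, here is a minimal sketch using data.table. The sample metadata is invented for the example (including the package name `demoPkg`); I am assuming the real input would carry an `Imports` field in the usual comma-separated DESCRIPTION style, for example as returned by `available.packages()`.

```r
library(data.table)

# Hypothetical sample of package metadata (assumption: one row per package,
# with the Imports field as a comma-separated string or NA).
pkg_meta <- data.table(
  Package = c("rvest", "data.table", "demoPkg"),
  Imports = c("httr (>= 0.5), xml2, magrittr",
              NA,
              "data.table,\n rvest (>= 0.3.0)")
)

# One row per (package, imported package) pair:
# split on commas, drop version constraints such as "(>= 0.5)", trim whitespace.
imports_edges <- pkg_meta[
  !is.na(Imports),
  .(imported = trimws(gsub("\\([^)]*\\)", "", unlist(strsplit(Imports, ","))))),
  by = .(package = Package)
]

# Reverse lookup: which packages import rvest?
imported_by <- imports_edges[imported == "rvest", package]
```

A long edge list like `imports_edges` should also be easy to write to CSV and relate to the download table in the report.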

 

Stay tuned!