Use Case: Blob Storage to PowerBI with the help of Azure Data Factory and Azure DB

One of the things that I came across while working in Azure cloud space is that projects do not need to be overly complex to be useful to a business organization. In fact, by completing small projects a Data Engineer can build confidence with different business units, which in turn will lead to more projects in the future as they get more and more comfortable with using different Azure services.

This is why I wanted to document a small Use Case that I had worked on. While this project is simple (and involves minimal transformations) it does illustrate that even simple projects will contain a number of tools that need to work in sync.

This Use Case came from a user that wanted to see if they could create a pipeline that would take different csv files, and move them into a single database table, which will then be fed into a PowerBI report. At the time each department had a csv file which they managed and edited, then emailed. This email was then opened by the PowerBI developer and the files downloaded to their local machine to then be fed into a PowerBI report. Below is a high level sketch of how this processes worked.

As you can see that process commits the following sins:

  1. Process has a lot of manual steps and is not automated
  2. The files are not stored in a central location
  3. Trying to revert back to a previous state would be difficult as there are no back ups being made

I then looked at the different Azure services and tried to fit them into the ETL mold that I had worked with before. The reason I did this is to make sure that I don't get lost in all of the Azure tools that are available. 

To do this I tried to answer the following questions:

  1. Where will we make the source?
  2. Will there be any data transformations/data movement?
  3. Where will this data be loaded to?

From those questions I came up with the following process:

From left to right these are the steps that are being done:

  1. Business Users would place the file into a Blob storage account on every Monday and overwrite the old files
  2. Azure Data Factory picks up the file and inserts it into two Azure databases: Staging and Archive table
  3. The Staging database table will then become the source and the Archive table will have a "last_uploaded_date" column just in case further analysis needed to be done on previous data.
  4. A View is then placed on top of the Staging table doing a simple sum statement to reduce workload on PowerBI side
  5. Finally PowerBI will connect to the view and is refreshed once a week

The above process would then be scheduled to run once a week. While it looks like there are a lot of pieces, once everything is established and connected we then have an automated process that kicks off automatically once the files are uploaded on Monday.  In addition, this process will reduce the need of handling emails and keeping track of those source files.

The other reason this Use Case caught my eye is that overall it is not a very complex design and shows that cloud services can help even in simple use cases to reduce manual process and help users bring some consistency to their process. The success of this also led others to try to play around with Azure tools and see what they could come up with.

In the following posts I will go through step by step the different services that I created and how to set them up so that each one connects with each other.