Azure Data Factory Welcome Screen Overview



Now that we have created our Azure Data Factory, let's take a look inside and go over the different components that make it up.

One thing to keep in mind is that when you create an Azure Data Factory, you are only creating the container that will house the different pipelines that move data. The reason I want to make this distinction is that one Azure Data Factory can contain many pipelines, or just a single one.


Note: For the rest of the blog I will refer to Azure Data Factory as ADF.


Beyond the sheer number of pipelines an ADF can have, it can also contain many simple pipelines, a single complex one, or anything in between. Long story short, if you go to the ADF screen in the Azure Portal you won't be able to tell what that ADF contains unless you log in. That is why the naming convention you use in your projects is very important.

Even before we start talking about what type of pipelines we will be creating inside an ADF, you have to ask yourself whether you want this ADF to focus on a specific project or on a functional area. For example, you could have an ADF named "Project1-adf", but that means that for every single project you might have to recreate similar pipelines. The other problem is that when someone looks at those pipelines and only sees the project name, it will be difficult to understand just what those pipelines are doing.

Conversely, let's say you name your ADF after a business function, for example Supply Chain. While the big-umbrella approach ensures that any pipeline dealing with Supply Chain goes in here, that still leaves the question of who is allowed to go in there to develop, test, and finally schedule those pipelines.

Finally, you must also take into account that you will probably have three pipelines per project. Each one will have the same name, but you will separate them by environment: one each for dev, test, and prod. While in the beginning you do not have to worry about this, since you are only testing the pipeline, after a little while you will need those two additional pipelines to build a CI/CD (continuous integration and continuous deployment) process.
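
To make the environment split concrete, here is a minimal sketch, assuming the azure-mgmt-datafactory Python SDK, of publishing the same pipeline definition to three environment-specific factories. The subscription, resource group, factory, and pipeline names are all hypothetical:

```python
# Sketch (not this post's workflow): publishing one pipeline definition
# to dev/test/prod factories with the azure-mgmt-datafactory SDK.
# All resource names below are hypothetical.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import PipelineResource

SUBSCRIPTION_ID = "<your-subscription-id>"
RESOURCE_GROUP = "rg-supplychain"  # hypothetical
FACTORIES = ["supplychain-dev-adf", "supplychain-test-adf", "supplychain-prod-adf"]

client = DataFactoryManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

# One shared definition; in a real CI/CD flow this would come from the
# Git-backed ARM template rather than being built inline.
pipeline = PipelineResource(
    activities=[],  # the actual activities would go here
    description="Blob Storage to PowerBI load",
)

for factory in FACTORIES:
    client.pipelines.create_or_update(RESOURCE_GROUP, factory, "CopyBlobToSql", pipeline)
    print(f"Published CopyBlobToSql to {factory}")
```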

I will be doing a future blog post that shows how to set up the CI/CD process using Azure DevOps; however, that is outside the scope of this post. For the purposes of the Blob Storage to PowerBI series we will only look at a single pipeline.

Now that we have that out of the way, let's get to the actual content of this blog post!


Overview of Azure Data Factory


In our previous blog post we created a brand new ADF. In that post we also discussed how to log into it: one way is through the Azure portal, and the other is to go directly to adf.azure.com.

For the purposes of this post I am going to use adf.azure.com. If you want an overview of how to log in through the Azure portal, please read my previous post.

Once you are in the adf.azure.com portal, select the subscription that contains your ADF, pick the data factory, and then click "Continue".

Selecting ADF from adf.azure.com

Once you are logged in you will be greeted with the ADF Welcome page. 


ADF Welcome Page


On the welcome page there are several sections, so let's go through all of them.

The first section you will see is what I call the quick-link buttons.


Azure Data Factory Welcome page


The quick links above are front and center and are meant to help you start creating your pipeline.

Create pipeline quick button

The "Create Pipeline" button will take you directly to the workspace screen where you will be able to start creating pipelines. 

Azure Data Factory workspace screen
This is where the magic happens!


The first thing to notice is the blue button at the top called "Publish All". You can think of this as the "Save" button when you are developing a pipeline that is not attached to a Git repository. Keep in mind that since the factory is not attached to a Git repo, every time you click Publish, the old version of the pipeline is erased and replaced with the new version you just developed.
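
In other words, without Git there is no version history to fall back on; the service keeps only the last published definition. As a rough illustration, here is a sketch assuming the azure-mgmt-datafactory Python SDK, with hypothetical resource names, that fetches whatever is currently live:

```python
# Sketch: in live mode (no Git), the service stores only the last
# published definition, so "Publish All" is effectively an overwrite.
# Resource names below are hypothetical.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Fetch the currently published (live) version of a pipeline.
live = client.pipelines.get("rg-supplychain", "supplychain-dev-adf", "CopyBlobToSql")
print(live.description)  # whatever was last published -- older versions are gone
```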


The next pane to notice is all the way on the right. This is the "Properties" pane, which you will mostly use to name your pipeline and provide a description of what it does. A pro tip: add some kind of description right away. It is very easy to end up with five pipelines that you were testing and then not remember which one is the actual "production ready" pipeline.
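
Both fields live on the pipeline resource itself. Continuing the hypothetical SDK sketch from above, the name is the key you publish under, and the description is a property of the definition:

```python
# Sketch: the Properties pane's name and description map directly onto
# the pipeline resource. Names are hypothetical.
from azure.mgmt.datafactory.models import PipelineResource

pipeline = PipelineResource(
    activities=[],  # activities would go here
    description="Loads daily sales CSVs from Blob Storage into Azure SQL",
)
# The name is supplied when publishing:
# client.pipelines.create_or_update(RG, FACTORY, "CopyBlobToSql", pipeline)
```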

The last thing I want to point out is the pane on the left side, which contains all of your Pipelines, Datasets, and Data Flows. I will go over each one in detail in later posts, but for now you can think of them this way:

  1. Pipelines - This acts as a drop-down list of all the pipelines you have created. You can also create folders to further subdivide what these pipelines are doing.
  2. Datasets - You can think of these as a collection of data sources and data targets. In other words, if you need to get data from a blob storage account, you would point to a "Dataset" in that blob storage account. Conversely, if you are looking to push data into a SQL database, you would create an Azure SQL DB "Dataset" to be used in the pipeline (see the sketch after this list).
  3. Data flow - Data flow is a newer feature that Azure added to ADF, and its point is to create a graphical ETL workflow. These workflows work almost the same way as the regular workspace screen, but the main difference is that each activity (e.g. join, copy, mapping) is a separate activity in a Data flow, whereas in a regular pipeline several of those steps could be handled by a single activity (e.g. the Copy activity). A more thorough explanation will come in future posts.
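
To ground the Datasets idea, here is a minimal sketch, again assuming the azure-mgmt-datafactory Python SDK, that defines a blob "source" dataset and an Azure SQL "sink" dataset. The linked services, paths, and table names are hypothetical, and model constructors can vary slightly between SDK versions:

```python
# Sketch: a Blob dataset as the source and an Azure SQL dataset as the
# sink. Assumes linked services "BlobStorageLS" and "AzureSqlDbLS"
# already exist in the factory; all names are hypothetical.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    AzureBlobDataset,
    AzureSqlTableDataset,
    DatasetResource,
    LinkedServiceReference,
)

client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")
RG, FACTORY = "rg-supplychain", "supplychain-dev-adf"

source = DatasetResource(properties=AzureBlobDataset(
    linked_service_name=LinkedServiceReference(reference_name="BlobStorageLS"),
    folder_path="sales/incoming",
    file_name="daily.csv",
))
sink = DatasetResource(properties=AzureSqlTableDataset(
    linked_service_name=LinkedServiceReference(reference_name="AzureSqlDbLS"),
    table_name="dbo.DailySales",
))

client.datasets.create_or_update(RG, FACTORY, "BlobSalesCsv", source)
client.datasets.create_or_update(RG, FACTORY, "SqlDailySales", sink)
```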

Data flow quick button


An example of what a Data Flow looks like can be seen in the screenshot below:

Azure Data flow

Just like with a regular pipeline, if you want to save this Data flow you have to click the "Publish all" button (unless you are connected to a Git repo). You can get to this screen by clicking the "Create Data flow" quick-link button.

Create Pipeline From Templates Quick Button

Another option you have when developing a pipeline is to use a template. This allows you to skip some of the manual setup steps. It is especially useful if you have a specific use case to fulfill and would only need to update a couple of "Dataset" properties.

When you first click on the templates button you are taken to the following screen:

Azure Data Factory templates

From the templates screen you will be treated to all of the different use cases that Microsoft has created so far. You can also search by using either the search box (e.g. SAP BW) or the different categories.


Once you have found the template you want to use, simply click on it. This will take you to the next step of template selection, where you configure the different components of the template to fit your needs.