Creating an Azure Data Factory


Welcome back to the next step in our journey from Blob storage to Power BI.

Now that we have created our Blob storage account and our SQL databases, it is time to create the pipelines that will move the data from point A to point B.


The service responsible for this is called Azure Data Factory (or ADF for short). Before I continue, I have to mention that Microsoft markets this product as an ETL (Extract, Transform, Load) tool, while I see it more as an orchestration tool.


In practice, this means that when you ask "What should I use ADF for?", the answer is mainly copying/moving data and "orchestrating" different activities within ADF pipelines. This will become clearer as we go through our examples. This is not to say that ADF cannot perform ETL functions; however, when compared to other tools such as Informatica, SAP Data Services, or Databricks, ADF still has a way to go to become a contender in that space.


To learn more about ADF, see Microsoft's Azure Data Factory documentation.


With that out of the way, the first step is to create an Azure Data Factory.


Note: Throughout this post, I use ADF to refer to Azure Data Factory.


Setting up to create an Azure Data Factory

First, log in to the Azure portal, type "Data Factory" in the search box, and then click the "Data factories" icon:

Creating Azure Data Factory
First step to data domination!!


After clicking the icon, you will be taken to a page that lists all of the data factories created so far under your subscription. If you work at a company that actively uses ADF, you can narrow down the list by clicking the "Subscription" filter highlighted in the red circle. By default, the filter is set to show all.

List of Azure Data Factories
This screen lists all of the ADFs that have been created so far and helps you find other ADFs.


To start the process, click the "Add" button. This takes you to the ADF creation screen.


Creating an Azure Data Factory

The process of creating an ADF is relatively simple. On the creation page, the main selections you need to make are the subscription and resource group this ADF will be created in, and the region it will be created in.

Creating Azure Data Factory
Probably the fewest fields to fill out in the Azure world


In the above screenshot we filled out the following:

  1. Subscription - this value is usually auto-filled, so if you need to create the ADF under a different subscription, make sure to check this first.
  2. Resource group - this further narrows down where the ADF will live. You can either select an existing resource group or create a new one.
  3. Region - this depends on where you are located. As a rule, create the ADF as close as possible to where your data resides, to reduce the time it takes to send data from one region to another.
  4. Name - the name of the ADF. It must be globally unique across Azure.
  5. Version - by default, V2 is selected. This is the newest version of ADF that Azure provides. For context, V1 is still offered as a legacy option, but far fewer tools are available in it. If possible, always use V2 (or later versions when they roll out).
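If you would rather script this step than click through the portal, the fields above map fairly directly onto an Azure Resource Manager REST call. Below is a minimal sketch, assuming the Data Factory V2 REST API (api-version 2018-06-01) and my reading of the factory naming rules (3-63 characters; letters, numbers and single hyphens; starting and ending with a letter or number). Double-check both against Microsoft's current docs. The sketch only builds the request: actually sending it also requires an Azure AD bearer token (for example from the azure-identity package), which is not shown. The subscription ID, resource group, region, and factory name in the usage section are made-up placeholders.

```python
import json
import re

# Azure Resource Manager endpoint and the REST API version used by
# Data Factory V2 (the "Version" field in the portal form).
ARM_ENDPOINT = "https://management.azure.com"
ADF_V2_API_VERSION = "2018-06-01"

# My reading of the naming rules: letters, numbers and single hyphens,
# starting and ending with a letter or number. Verify against current docs.
_NAME_RE = re.compile(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*")


def is_valid_factory_name(name: str) -> bool:
    """Check the length (3-63) and character rules for a factory name."""
    return 3 <= len(name) <= 63 and _NAME_RE.fullmatch(name) is not None


def build_create_factory_request(
    subscription_id: str, resource_group: str, region: str, name: str
) -> tuple[str, str, str]:
    """Map the portal form fields onto an ARM 'create or update factory' call.

    Returns (HTTP method, URL, JSON body). Nothing is sent over the network.
    """
    if not is_valid_factory_name(name):
        raise ValueError(f"invalid data factory name: {name!r}")
    url = (
        f"{ARM_ENDPOINT}/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.DataFactory/factories/{name}"
        f"?api-version={ADF_V2_API_VERSION}"
    )
    # Minimal body: only the region. No Git settings are included,
    # which corresponds to choosing "Configure Git later".
    body = {"location": region}
    return "PUT", url, json.dumps(body)


if __name__ == "__main__":
    # Hypothetical placeholder values; substitute your own.
    method, url, body = build_create_factory_request(
        "00000000-0000-0000-0000-000000000000",
        "rg-blob-to-powerbi",
        "eastus",
        "adf-blob-to-powerbi",
    )
    print(method, url)
    print(body)
```

The name check runs before anything else so that a typo fails locally instead of as an error from Azure.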

Click the "Review + create" button to start the creation process. There is also an option to connect this data factory to a Git repository, but that can be done after the ADF is created and is outside the scope of this blog post. I will go over it in a later post.

Note: You might get an error related to the Git configuration when creating the ADF. If you do, the simple fix is to select the "Configure Git later" check box, which you can find on the "Git configuration" tab.

ADF "Configure Git later" option
Always working on a prod version of ADF? That will never lead to trouble.


On the "Review + create" screen, click the "Create" button and you will be taken to the deployment page. Give it a couple of minutes and you should get a success message.

ADF resource creation
If you see this, you are in good shape


Verifying that the Azure Data Factory was created

After the ADF is deployed, you have a couple of ways to get to it. The first way is through the Azure portal. Follow the instructions from the start of this post, and you should now see the newly created ADF in the list.

Azure Data Factory


The second way to get to your ADF is to go directly to the URL "adf.azure.com". This takes you to the ADF selection screen, where you can select the subscription and then the ADF you just created. The only difference between this approach and the portal is that the "adf.azure.com" screen shows all the ADFs in a subscription, whereas in the portal you can also refine the list through filters or resource groups.

adf.azure.com
When someone pulls this up, you know you are dealing with a pro.
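To make the difference between the two views concrete, here is a small sketch that models the filtering logic. It assumes you have a list of ADF resource IDs in the standard Azure format (/subscriptions/{sub}/resourceGroups/{rg}/providers/Microsoft.DataFactory/factories/{name}). The IDs, subscription names, and resource group names below are hypothetical. This is not code that either the portal or adf.azure.com exposes; it just illustrates subscription-wide listing versus the extra resource-group narrowing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FactoryRef:
    subscription_id: str
    resource_group: str
    name: str


def parse_factory_id(resource_id: str) -> FactoryRef:
    """Split an ADF resource ID of the form
    /subscriptions/{sub}/resourceGroups/{rg}/providers/Microsoft.DataFactory/factories/{name}
    """
    parts = resource_id.strip("/").split("/")
    # Fixed path segments, compared case-insensitively.
    expected = {0: "subscriptions", 2: "resourcegroups", 4: "providers",
                5: "microsoft.datafactory", 6: "factories"}
    if len(parts) != 8 or any(parts[i].lower() != v for i, v in expected.items()):
        raise ValueError(f"not a data factory resource ID: {resource_id!r}")
    return FactoryRef(parts[1], parts[3], parts[7])


def factories_in_subscription(ids: list[str], subscription_id: str) -> list[str]:
    """Roughly the adf.azure.com view: every factory in one subscription."""
    refs = [parse_factory_id(i) for i in ids]
    return sorted(r.name for r in refs
                  if r.subscription_id.lower() == subscription_id.lower())


def factories_in_resource_group(ids: list[str], subscription_id: str,
                                resource_group: str) -> list[str]:
    """The extra narrowing the portal allows: one resource group only."""
    refs = [parse_factory_id(i) for i in ids]
    return sorted(r.name for r in refs
                  if r.subscription_id.lower() == subscription_id.lower()
                  and r.resource_group.lower() == resource_group.lower())


if __name__ == "__main__":
    # Hypothetical resource IDs.
    ids = [
        "/subscriptions/sub-a/resourceGroups/rg-sales/providers/Microsoft.DataFactory/factories/adf-sales",
        "/subscriptions/sub-a/resourceGroups/rg-blog/providers/Microsoft.DataFactory/factories/adf-blob-to-powerbi",
        "/subscriptions/sub-b/resourceGroups/rg-blog/providers/Microsoft.DataFactory/factories/adf-other",
    ]
    print(factories_in_subscription(ids, "sub-a"))
    # ['adf-blob-to-powerbi', 'adf-sales']
    print(factories_in_resource_group(ids, "sub-a", "rg-blog"))
    # ['adf-blob-to-powerbi']
```

Resource group names are compared case-insensitively because Azure treats them that way.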

If you are using the "adf.azure.com" approach, click "Continue" to launch the ADF screen. If you are in the portal, click the ADF you just spun up.

That will take you to the main page for this ADF, which contains all of its admin information: for example, how many times this ADF has run, successes vs. errors, and IAM options that limit access to only those who are authorized to use it.
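To make "successes vs. errors" a bit more concrete, here is a minimal sketch of the kind of tally that page shows. It assumes a list of run records with a "status" field using values such as "Succeeded", "Failed", and "InProgress", loosely modeled on what ADF reports for pipeline runs. The records and the pipeline name CopyBlobToSql are hypothetical, and this is not how the portal itself computes its numbers.

```python
from collections import Counter


def summarize_runs(runs: list[dict]) -> dict[str, int]:
    """Count pipeline runs by outcome: succeeded, failed, and everything else."""
    counts = Counter(run["status"] for run in runs)
    total = sum(counts.values())
    succeeded = counts["Succeeded"]  # Counter returns 0 for missing keys
    failed = counts["Failed"]
    return {"total": total, "succeeded": succeeded, "failed": failed,
            "other": total - succeeded - failed}


if __name__ == "__main__":
    # Hypothetical run records; only the "status" field is used.
    runs = [
        {"pipelineName": "CopyBlobToSql", "status": "Succeeded"},
        {"pipelineName": "CopyBlobToSql", "status": "Failed"},
        {"pipelineName": "CopyBlobToSql", "status": "Succeeded"},
        {"pipelineName": "CopyBlobToSql", "status": "InProgress"},
    ]
    print(summarize_runs(runs))
    # {'total': 4, 'succeeded': 2, 'failed': 1, 'other': 1}
```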

For the purposes of this blog, the only thing you need to do here is click "Author & Monitor".

Author & Monitor ADF button
That has got to be the worst name for "open this ADF".


You will then be taken to the main ADF page. This page includes links to tutorial videos and serves as the starting screen that shows the basic activities you can perform.