Creating an Azure Data Factory
Welcome back to the next step in our journey from Blob storage to PowerBI.
Now that we have created our Blob storage account and our SQL databases, it is time to create the pipelines that will move the data from location A to location B.
The service responsible for this is called Azure Data Factory (ADF for short). Before I continue, I should mention that Azure markets this product as an ETL (Extract, Transform, Load) tool, while I see it more as an orchestration tool.
In practice, this means my answer to "What should I use ADF for?" is mainly copying or moving data and "orchestrating" the different activities within ADF pipelines. This will become clearer as we go through our examples. This is not to say that ADF cannot do ETL functions. However, when compared to other tools such as Informatica, SAP Data Services, or Databricks, ADF still has a ways to go to become a contender in that space.
To learn more about ADF please follow the helpful links provided by Microsoft: Azure Data Factory.
With that out of the way, the first step is to create an Azure Data Factory.
Note: I will be using ADF to refer to Azure Data Factory.
Set up to create an Azure Data Factory
First, log into the Azure portal, type "Data Factory" into the search box, and then click on the "Data factories" icon:
First step to data domination!!
After clicking on the ADF icon, you will be taken to the home page that lists all of the Data Factories that have been created so far under your subscription. If you are working at a company that actively uses ADFs, you can filter down the list by clicking on the "Subscription" tab highlighted in the red circle. By default, the list is set to show all subscriptions.
This screen lists all of the ADFs that have been created so far and helps you find other ADFs.
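If you ever want the same per-subscription list outside the portal, here is a minimal sketch of how the request could be built. It assumes the Azure Resource Manager REST endpoint and the `2018-06-01` API version that I understand ADF V2 to use; the function name and the placeholder subscription ID are my own, and the sketch only builds the URL (it does not authenticate or send anything).

```python
from urllib.parse import quote

# Assumptions: public Azure Resource Manager endpoint and the ADF V2 API version.
ARM_ENDPOINT = "https://management.azure.com"
API_VERSION = "2018-06-01"


def list_factories_url(subscription_id: str) -> str:
    """Build the GET URL that lists the Data Factories in one subscription."""
    return (
        f"{ARM_ENDPOINT}/subscriptions/{quote(subscription_id, safe='')}"
        f"/providers/Microsoft.DataFactory/factories"
        f"?api-version={API_VERSION}"
    )


# Placeholder subscription ID, for illustration only.
print(list_factories_url("00000000-0000-0000-0000-000000000000"))
```

An actual call would also need a bearer token in the `Authorization` header, which I have left out here.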
To start the process click on the "Add" button. This will take you to the ADF creation screen.
Creating an Azure Data Factory
The process of creating an ADF is relatively simple. Once you click on the "Add" button, you are taken to the creation page, where the main selections you need to make are the subscription and resource group that the ADF will be created in and the region that it will be created in.
Probably the fewest fields you will ever fill out in the Azure world
In the above screenshot we filled out the following:
- Subscription - This value is usually auto-filled, so if you need to create the ADF under a different subscription, make sure to check this first.
- Resource group - As with the subscription, this further narrows down where the ADF will live. You can either select an existing resource group or create a new one.
- Region - As a rule, try to create the ADF as close as possible to where your data resides, to reduce the time it takes for data to be sent from one region to another.
- Name - The name of the ADF.
- Version - By default, V2 will be selected. This is the newest version of ADF that Azure provides. For context, V1 is still offered as a legacy option, but the number of tools available in that version is limited. If possible, always use V2 (or later versions when they roll out).
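For anyone who would rather script this than click through the portal, the same fields map onto a single REST request. The sketch below is only an illustration under a few assumptions: it uses the Azure Resource Manager endpoint with the `2018-06-01` API version (which, as far as I know, is what selects V2), and the name check reflects my reading of Microsoft's naming rules (3 to 63 characters, letters, digits, and single hyphens between them). The function names and sample values are hypothetical, and nothing is actually sent.

```python
import json
import re
from urllib.parse import quote

# Assumptions: public Azure Resource Manager endpoint and the ADF V2 API version.
ARM_ENDPOINT = "https://management.azure.com"
API_VERSION = "2018-06-01"

# Naming rule as I understand it: letters and digits, with single hyphens
# allowed only between them. Check the official docs before relying on this.
_NAME_RE = re.compile(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*")


def is_valid_factory_name(name: str) -> bool:
    """Return True if the name fits the assumed length and character rules."""
    return 3 <= len(name) <= 63 and _NAME_RE.fullmatch(name) is not None


def build_create_request(subscription_id, resource_group, factory_name, region):
    """Build (but do not send) the PUT request that creates a Data Factory."""
    if not is_valid_factory_name(factory_name):
        raise ValueError(f"Invalid Data Factory name: {factory_name!r}")
    url = (
        f"{ARM_ENDPOINT}/subscriptions/{quote(subscription_id, safe='')}"
        f"/resourceGroups/{quote(resource_group, safe='')}"
        f"/providers/Microsoft.DataFactory/factories/{factory_name}"
        f"?api-version={API_VERSION}"
    )
    # The region from the form becomes the "location" field of the body.
    return {"method": "PUT", "url": url, "body": json.dumps({"location": region})}


# Hypothetical values standing in for the fields on the creation form.
request = build_create_request("my-subscription-id", "my-rg", "my-adf-demo", "eastus")
print(request["method"], request["url"])
print(request["body"])
```

Sending the request for real would also require an authenticated bearer token, and the name must not already be taken by another factory.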
Always working on a prod version of ADF? That will never lead to trouble.
If you see this you are in good shape
When someone pulls this up you know you are dealing with a pro.
Got to be the worst name to say "open this ADF".
You will then be taken to the main ADF page. This page includes links to tutorial videos and is also the starting screen that shows you the basic activities you can do.