Hello and welcome back. Today I want to go through the steps of creating a pipeline in Azure Data Factory. Now you might be asking, "Why is there a post about a pipeline when I have already created this thing called Azure Data Factory?"
Well, the reason is that just because you created an Azure Data Factory does not mean it will do anything for you... yet. You have to create a pipeline inside the Azure Data Factory, and then you will be in business.
If you found this article and have no idea what an Azure Data Factory is, then please read my previous blog post, which explains what it is and how to set it up.
Now that that is out of the way, let's go through and actually create our first pipeline.
Prerequisites
Before creating this pipeline, make sure that you have the following:
- An Azure account (either through your company or your own).
- An Azure Data Factory that you have already created.
Keep in mind that creating an Azure Data Factory pipeline is not the same as creating the Azure Data Factory itself in the Azure portal. When I say "creating a pipeline", I mean creating it inside the Azure Data Factory that already exists.
That being said, let's get started.
Logging in
All great journeys begin with a first step, and our first step is to open up adf.azure.com and find the Azure Data Factory that you have created. It will usually be listed under a specific subscription.
Once you find your Azure Data Factory, click on "Continue" to log into it.
(Screenshot: First step to greatness!!!)
Navigating the home page
Once you have logged into Azure Data Factory, either click on the pencil icon on the left-hand side or click on the "Create pipeline" icon on the welcome page.
(Screenshot: If only everything was as simple as this.)
What we are doing in this step is called "authoring" a pipeline. While there are a lot of different actions you can take from the home page, I usually prefer to go straight to the Author section and start developing a pipeline.
If you are interested in learning more about the different buttons and things that you see on the welcome screen, feel free to check out my follow-up post about everything you can click on the first time you log in.
"Authoring" your first pipeline
Now that we are in "developer" mode, creating a new pipeline is actually very easy.
If you look closely, you will notice that we currently have a very big 0 next to the "Pipeline" section.
(Screenshot: It's okay, no one is judging.)
So to rectify that, click on that 0 and then select "New Pipeline".
And that is it. You should now see that a new pipeline called "pipeline 1" has been created. You will also see the "Activities" pane appear, and the "Properties" tab open up on the right-hand side.
(Screenshot: Taking that one small step!!!)
As you can see, a lot of stuff appears once you have created that new pipeline.
Properties tab
This tab can be thought of as holding anything that you need to describe your pipeline. The name should be short, but descriptive enough to tell you what the pipeline does.
In the description section, feel free to add as many details as you want, but again, make sure you don't write a novel.
The concurrency setting manages parallelism for this pipeline (how many runs of it can execute at the same time), but unless you have a specific reason to limit this, I would just leave it alone. In my opinion, any limiting should instead be done within the pipeline's activities themselves or through proper chaining of activities.
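If you are curious how these properties look outside the UI, here is a minimal Python sketch of the kind of JSON you would see in the pipeline's code view. The helper name `build_pipeline_definition` and the example name and description are my own made-up placeholders, not anything Azure Data Factory ships with. The sketch only builds and prints a dictionary; it does not talk to Azure at all.

```python
import json


def build_pipeline_definition(name, description, concurrency=None):
    """Build a dict shaped like an Azure Data Factory pipeline definition.

    Hypothetical helper for illustration only; it does not call Azure.
    """
    properties = {
        "description": description,
        "activities": [],  # a brand new pipeline has no activities yet
    }
    # Only include concurrency when we actually want to limit parallel runs.
    if concurrency is not None:
        properties["concurrency"] = concurrency
    return {"name": name, "properties": properties}


# Example values are placeholders, not names from a real factory.
pipeline = build_pipeline_definition(
    name="CopySalesData",
    description="Copies daily sales files into the reporting store.",
)
print(json.dumps(pipeline, indent=2))
```

Leaving `concurrency` out entirely mirrors the advice above: unless you have a reason to limit it, don't set it.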
Activities tab
This section can be thought of as a list of all the things (or activities) that your pipeline needs to do. Each activity is a self-contained unit. For example, click and hold the Copy activity and drag it onto the blank canvas.
Once it is placed, you will notice that there are multiple tabs available to you, but the main two are "Source" and "Sink". Think of these as where you are getting your data from (Source) and where you are going to write/copy the data to (Sink).
Now click on the little trashcan icon to delete this activity for now.
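To make the Source and Sink idea a bit more concrete, here is a rough Python sketch of how a Copy activity is laid out in the pipeline's JSON. The function name, the dataset names, and the choice of `DelimitedTextSource` and `DelimitedTextSink` are all assumptions I picked for illustration; your own source and sink types will depend on where your data actually lives.

```python
import json


def build_copy_activity(name, source_dataset, sink_dataset,
                        source_type="DelimitedTextSource",
                        sink_type="DelimitedTextSink"):
    """Build a dict shaped like a Copy activity definition.

    Hypothetical helper for illustration; dataset names are placeholders.
    """
    return {
        "name": name,
        "type": "Copy",
        # "inputs" points at the dataset we read from (the Source tab).
        "inputs": [{"referenceName": source_dataset, "type": "DatasetReference"}],
        # "outputs" points at the dataset we write to (the Sink tab).
        "outputs": [{"referenceName": sink_dataset, "type": "DatasetReference"}],
        "typeProperties": {
            "source": {"type": source_type},
            "sink": {"type": sink_type},
        },
    }


activity = build_copy_activity(
    name="CopyRawToStaging",
    source_dataset="RawSalesCsv",      # placeholder dataset name
    sink_dataset="StagingSalesCsv",    # placeholder dataset name
)
print(json.dumps(activity, indent=2))
```

Notice that the activity only references datasets by name, which is why it cannot do anything useful until those datasets exist.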
Summary and next step
Although we have now created a blank pipeline, we won't be able to save it yet. If you click on the "Publish all" button, it will show an error saying that we can't have a blank pipeline.
In my next blog post, I will talk about DataSets and Linked Services. I will go over what they are, how to create them, and then how to use those DataSets within a Copy activity.
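As a rough illustration of that rule, here is a tiny Python sketch that flags pipelines with no activities before you try to publish. This is my own stand-in check, not the validation that Azure Data Factory actually runs, and the function name and sample pipelines are made up.

```python
def find_blank_pipelines(pipelines):
    """Return the names of pipeline definitions that have no activities.

    Hypothetical check for illustration; not the service's real validation.
    """
    return [
        p["name"]
        for p in pipelines
        if not p.get("properties", {}).get("activities")
    ]


# Placeholder definitions: one blank pipeline and one with a single activity.
pipelines = [
    {"name": "BlankPipeline", "properties": {"activities": []}},
    {"name": "CopyPipeline", "properties": {"activities": [{"name": "Copy1", "type": "Copy"}]}},
]
print(find_blank_pipelines(pipelines))
```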