Creating a table in Azure Database using Query Editor: Blob Storage to Power BI part 4
In our previous post we went through the steps of creating an Azure SQL server and attaching an Azure SQL database to it. If you have not done so yet, please refer back to my previous post "Blob Storage to Power BI part 3: Creating an Azure Database".
So now that we have a database running in our Azure environment, it is time to start using it. This post will focus on how to create database tables through the Azure portal.
I will then create two supplemental posts on how to create tables in both Azure Data Studio and SQL Server Management Studio (SSMS). This is to show you how each one feels and let you choose which one you prefer.
Overview
So before we begin let me summarize the three options you have to create your tables:
- Query Editor (available in Azure portal)
- Azure Data Studio
- SQL Server Management Studio (SSMS)
All three will do the same thing, but each one is a little different. That is the beauty of it: you can use whichever one you prefer.
Now that we have spoken ad nauseam about the different tools we will be using, let's take a step back and remember just what we are going to be creating.
In my first post I described what the automated process should look like. Let's pull it up now and review it quickly.
Don't you just love process design!!!!
So, according to the above diagram, we are now going to be creating the following: the Staging table, the Archive table, the DB view, and a Truncate stored procedure. They will serve the following functions:
- Staging table - this table will contain the last run's data. It is always good to stage data that comes from a raw file so that we can see how the data is coming in before it is used in views or other ETL functions.
- Archive table - the archive table will have the same fields as the Staging table, plus a field called "last_update_date". This field defaults to the system date and will be auto-populated with the date the ETL job was run. This will help when we want to see how the data looked 3 or 4 weeks ago.
- View - this view will be placed on top of the Staging table. The reason for this is to have most of the calculations done on the database side rather than having Power BI do them.
- Truncate stored procedure - this will need to be created to make sure that the Staging table is empty at the start of the job so that it only contains the latest data.
To go a little bit deeper, I have also created a diagram of what those tables will look like and which fields we are going to be creating.
We will also create a stored procedure that deletes the records from the Staging table. This can be done either through the Azure portal or through Data Studio, so I will show how it can be done in both.
Phew. So with all of that out of the way, why don't we start actually creating some tables?
Creating Tables and Stored Procedures in the Azure Portal
So first things first: log into the Azure portal and then either search for or click on "SQL databases". Since we only have one database, it will be the only one in the list. If you are in a situation where you have multiple databases, searching for the SQL server first will help you filter out the ones you don't want to look at. Once you see your database, click on it to bring up the Overview page on the right-hand side.
I then want to bring your attention to the button called "Query Editor". This is a feature that was added so you can directly query an Azure SQL database and also create, delete, and update tables in it.
We will use the Query Editor to create tables directly in the Azure portal
Once you click "Query Editor" you will be brought to a page where you have to enter the user ID and password that you used when creating the SQL server.
Hopefully you wrote that password down somewhere.
Once you click OK you should be in the Query Editor. You will also see that you have access to three folders: "Tables", "Views", and "Stored Procedures". To the right is the query editor pane where you will type in all of your SQL code. You can also click on "New Query" to open up a separate tab so that you won't end up with one giant page of SQL.
Retro.
Now let's talk about some of the limitations of the Query Editor. First, while you can create a new query tab, once you log out that code is gone. To make sure your SQL is saved, you either have to copy it somewhere (like Notepad) or click on "Save query". Second, the Query Editor is fine for small SQL statements like "Select * From table", but once you get into complex SQL with multiple lines you run into issues where the screen is too small. You will also run into issues when validating data, because if you have a table with 20 fields you will constantly have to scroll left to right.
However, even with those limitations, it is fair to say that the editor does its job and can work in a pinch when you need to get some data quickly and are working from a laptop that does not have the other tools installed.
The other thing I will mention is that you might receive an error saying that your IP address is not allowed to log into this database. This is a security feature: since the services spun up on Azure expose public endpoints, access is not granted to just any client by default. To remedy this, go back to the Overview page and then click on the server name. This is explained in the following Microsoft article: Azure SQL Database and Data Warehouse network access controls.
Pro-tip: if you are ever in doubt about which server your database belongs to, click the Server Name link and it will take you straight to it. It also saves time searching for the SQL server.
Once there, scroll down on the left-hand side to the "Security" tab. Under there, click on "Firewall and virtual network". This will bring up the firewall screen on the right-hand side.
There are a couple of things to go over on this screen, so let's dive in. First is the option called "Allow Azure services and resources to access this server". Just like the name implies, if you set this to "Off" then other Azure services will need to be explicitly allowed to connect to this SQL server and, in turn, to all of the databases attached to it.
The second option is to add a "Client IP address" rule. This means specifying the IP address of the device you are using to log into the database you created. All you need to do is name the rule (e.g. "Database access", which goes into Rule Name) and then specify the IP address you are using (e.g. "11.111.111.111", which goes into both Start IP and End IP). Luckily, that IP address shows up in the error message you got when first trying to log into the database through "Query Editor".
I am in no way a network security expert, so what I usually do is set "Allow Azure services" to "On" and then click Save. I have had instances where that did not work, so as a backup I added the client IP address (the IP from the error message) as a rule and then clicked Save. Again, please refer to the "Azure SQL Database and Data Warehouse network access controls" article for more details.
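If you prefer to manage these firewall rules with T-SQL instead of the portal (for example once you can already connect from SSMS or Azure Data Studio), Azure SQL also exposes the sp_set_firewall_rule procedure in the master database. Here is a minimal sketch; the rule name and IP address are just placeholders, so swap in the IP from your error message:

-- Run this against the master database as the server admin.
-- 'Database access' and the IP below are placeholders.
EXECUTE sp_set_firewall_rule
    @name = N'Database access',
    @start_ip_address = '11.111.111.111',
    @end_ip_address = '11.111.111.111';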
Now, going back to the actual creation of the database tables and views: we are going to run the following SQL code within the portal and click on "Run".
Note: with all of the SQL code below, you just need to copy and paste it into the Azure portal. I will change the syntax slightly when we use Azure Data Studio and Microsoft SQL Server Management Studio.
CREATE TABLE POC_Stage (
    Customer varchar(255),
    UPC_EIN varchar(255),
    Sale_amt_TY decimal(13,2),
    Sale_amt_PY decimal(13,2),
    Date_of_Sales_week DATE
)
The above code will be used to create the Staging table. Once you click on Run you should see that table under the Tables folder in the Query Editor.
All tables, views, and stored procedures will be automatically stored under their respective folders.
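If the Tables folder does not refresh right away, a quick way to confirm that the table was created is to query the catalog from the same editor, for example:

SELECT TABLE_SCHEMA, TABLE_NAME
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_NAME = 'POC_Stage';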
Then run the following code to create the Archive table:
CREATE TABLE POC_Archive (
    Customer varchar(255),
    UPC_EIN varchar(255),
    Sale_amt_TY decimal(13,2),
    Sale_amt_PY decimal(13,2),
    Date_of_Sales_week DATE,
    Last_update_date DATE DEFAULT GETDATE()
)
In the above code we are adding Last_update_date, which will be auto-populated with the system date when the job ran. The reason for this is that every time an Azure Data Factory job runs and data is inserted into the POC_Archive table, we need to know the date on which each record was inserted.
The GETDATE() default has pros and cons. Since we have a small set of records to insert, this date will be very close to the actual time the job kicked off. The column will also technically write out a timestamp, but it will be all zeros, so you would get a value like "2019-11-17 00:00:000". If you have a requirement to capture, down to the second, when a job ran, this approach will not work; you would instead have to use a parameter in Azure Data Factory and pass that to the "Last_update_date" column.
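If you want to see the default in action before wiring up the pipeline, you can insert a throwaway row without specifying Last_update_date, read it back, and then delete it (the sample values below are made up):

-- The GETDATE() default fills in Last_update_date because we do not supply it
INSERT INTO [dbo].[POC_Archive] (Customer, UPC_EIN, Sale_amt_TY, Sale_amt_PY, Date_of_Sales_week)
VALUES ('Test Customer', '000000000000', 100.00, 90.00, '2019-11-17');

-- Check what was stored
SELECT Customer, Last_update_date FROM [dbo].[POC_Archive] WHERE Customer = 'Test Customer';

-- Remove the throwaway row
DELETE FROM [dbo].[POC_Archive] WHERE Customer = 'Test Customer';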
For our purposes we will be fine with this approach. I will explain in more detail later how the data will look and the cadence at which the files will arrive in the blob container.
So now that we have created the Staging and Archive tables, the next two items to create are the POC_VIEW view and the POC_TRUNCATE stored procedure.
The purpose of the view will be to summarize this year's sales versus last year's sales across all items, per customer. A customer in this sense is a retail company such as Target, Kroger, Walmart, etc.
CREATE VIEW POC_VIEW AS
(
    SELECT Customer,
           SUM(Sale_amt_TY) AS Total_sales_Current_year,
           SUM(Sale_amt_PY) AS Total_sales_Last_year
    FROM [dbo].[POC_Stage]
    GROUP BY Customer
)
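Once the view is created you can query it just like a table, for example:

SELECT Customer, Total_sales_Current_year, Total_sales_Last_year
FROM [dbo].[POC_VIEW]
ORDER BY Customer;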
The purpose of the Truncate procedure is to delete the data in the Staging table right before new data is loaded. The reason I did this is that otherwise I would have had to come up with SQL logic to compare the different customers, their sales amounts, UPCs, and dates, and then either update or leave each record depending on whether there were differences. Since the requirements for this POC were simpler (and we would run this job once a week), I decided it would be easier to delete the data from the Staging table every time the Azure Data Factory pipeline ran and then load the Staging table with fresh data. This also means that the view automatically has current, up-to-date data every time it is queried.
The code that I used for the stored procedure is as follows:
CREATE PROCEDURE POC_TRUNCATE AS
TRUNCATE TABLE [dbo].[POC_Stage]
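To test the procedure (or to empty the Staging table manually at any point), you can execute it straight from the Query Editor:

EXEC [dbo].[POC_TRUNCATE];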
Now that all of the tables, views, and stored procedures are created, we can see all of them in the Query Editor screen.
I added a new query for each CREATE statement to keep everything separate.
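If you would rather double-check from SQL than from the folder tree, a quick catalog query will list everything we just created (the LIKE pattern assumes you kept the POC naming convention):

SELECT name, type_desc, create_date
FROM sys.objects
WHERE name LIKE 'POC%'
ORDER BY type_desc, name;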
If you want to save one of those queries you can just click on "Save query". This will download the query as a ".sql" file, which you can then open with an editor of your choice (I only used Notepad as an example).
Summary
Now that we have created all of the tables, views, and stored procedures that we need, in the next blog post I will describe how to do the same process using Azure Data Studio and Microsoft SQL Server Management Studio. Both tools will give you the same end result, but they might be easier to use than the Query Editor in the Azure portal.
After that I will post about setting up an Azure Data Factory pipeline, and then we will finally start putting all of the pieces together with a Power BI dashboard.