How to: Snowflake Continuous Data Integration With Azure and Snowpipe in 2023

How to auto-ingest data into Snowflake with Azure Blob Containers and Snowflake’s Snowpipe

Data integration is one of the first things companies have to deal with when setting up their Snowflake cloud data warehouse. If you do not employ an ELT tool like Fivetran, Rivery or Stitch, you end up having to find a different solution. The best and most common solution in that case is to utilize Snowflake’s Snowpipe in conjunction with one of three possible external storage locations (Azure Blob Storage, Google Cloud Storage or AWS S3).

This blog post covers the setup of a continuous data ingestion pipeline with Azure Blob Containers and Snowflake’s Snowpipe feature. Snowflake provides basic documentation on this topic. However, the instructions on the Azure side utilize the Azure CLI (Command Line Interface) and only list the commands to be executed in order to create the necessary resources in Azure. I have seen many companies that would rather use the browser portal to create these resources. This post therefore explains the steps to take in the Azure Portal as well as all steps needed on the Snowflake side.

General Steps

  1. Create an Azure Resource Group 
  2. Create an Azure Storage Account
  3. Create Azure Container inside Storage Account
  4. Create Azure Queue inside Storage Account
  5. Create Azure Event Grid Subscription + link to Queue
  6. Create Snowflake Storage Integration
  7. Approve Snowflake Storage Integration app access to Azure
  8. Assign necessary roles to Snowflake app in Azure
  9. Create Snowflake Notification Integration
  10. Approve Snowflake Notification Integration app access to Azure
  11. Assign necessary roles to Snowflake app in Azure
  12. Create Snowflake External Stage
  13. Upload simple csv file to Azure storage container
  14. Run ls @<stage_name> in Snowflake to see the files in the container
  15. Create landing table with matching schema in Snowflake
  16. Create Snowpipe
  17. Refresh Snowpipe to run the data ingestion
  18. Check if data was loaded into target table in Snowflake
  19. Upload new csv file to Azure storage container
  20. Check if data was auto-ingested into target table in Snowflake

General Assumptions

This blog post assumes that you have already created a Resource Group as well as a Storage Account (general-purpose v2 account) in your Azure account.

Furthermore, you need sufficient administrative privileges in Azure to complete some of the steps below: in particular, the ability to grant admin consent for the Snowflake apps in Azure Active Directory and to assign roles on the storage account.

Let’s dive into it.

Step-by-Step Guide

Create Basic Azure Resources

  1. Go to the Resource Group that you created for your Snowflake data integration project
  2. Go to the Storage Account that you created for your Snowflake data integration project (the storage account should have been created as a ‘Standard: general-purpose v2’ account)
  3. Create a new Container inside your Storage Account

    [Image: Creating a New Blob Container in Azure Portal]
    • Look for “Containers” in your Storage Account and click on it
    • Click on + Container in the menu bar to the top
    • Provide a useful name (e.g. “claudy-snowflake-ingestion-container-v1”)
    • Set the anonymous access level to “Private”
    • Create
  4. Create a new Queue (having one queue per Storage Account is the best practice for Snowflake)

    [Image: Creating a New Storage Queue in Azure Portal]
    • Look for “Queues” in your Storage Account and click on it
    • Click on + Queue in the menu bar to the top
    • Provide a useful name (e.g. “claudy-snowflake-ingestion-queue-v1”)
    • Click OK
  5. Create an Event Grid Subscription

    [Image: Creating an Event Grid Subscription in Azure Portal]
    • Go back to the storage account and look for “Events” in the services list on the left-hand side
    • Click on + Event Subscription in the menu bar to the top
    • Provide a useful name (e.g. “claudy-snowflake-ingestion-event-v1”)
    • If you create this for the first time in your storage account, you have to provide a ‘System Topic Name’ – provide a useful name here (e.g. “claudy-snowflake-ingestion-topic”)
    • Under Endpoint Details, select “Storage Queues” from the Endpoint Type dropdown and then select the Storage Queue you created in the previous step as the endpoint
    • Leave the rest as is. Your final page should look something like this:

      [Image: Final Page of Creating Event Subscription]
    • Click Create
    • When you create this for the first time, it could happen that you run into the following error message:

      [Image: Azure Event Error Message – Resource Provisioning]
    • In that case, follow the few steps explained in this article to solve the problem
    • If you cannot read the full article, open Azure Portal Home in a new tab > Subscriptions > Select the Subscription you use for this Snowflake project > Select ‘Resource Providers’ from the list of options on the left side > look for “Microsoft.EventGrid” in the search bar > select the service > click on Register in the menu bar to the top > wait for the resource to be registered
    • After that, go back to your Event Grid Subscription and click on Create again

Create Integrations in Snowflake and Grant the Necessary Roles in Azure

First, let’s create the STORAGE INTEGRATION in Snowflake.

We will use the CREATE STORAGE INTEGRATION statement in Snowflake (make sure you have selected a role with the necessary grants to create an integration. If you’re unsure, select the ACCOUNTADMIN role)

For that, you first need two things from Azure:

  1. Azure Active Directory Tenant ID (you must have an Azure AD tenant created in the Azure Portal; it is used for authentication federation with Microsoft)
    • Go to your Azure Active Directory
    • Select the Azure AD you want to use for the Snowflake project
    • Copy and note the Tenant ID
  2. URL of your Azure Blob Container
    • Go to your Storage Account
    • Go to Containers
    • Click on the Container you created earlier
    • Click on Properties on the left-hand side
    • Copy and note the URL
  3. Then go to Snowflake and create a new SQL worksheet
  4. Execute the following command but be sure to replace the parameters with the values you noted from Azure Portal as well as to provide a useful name for your integration:

    CREATE STORAGE INTEGRATION <name_of_your_integration>
      TYPE = EXTERNAL_STAGE
      STORAGE_PROVIDER = 'AZURE'
      ENABLED = TRUE
      AZURE_TENANT_ID = '<tenant-id>'
      STORAGE_ALLOWED_LOCATIONS = ('azure://<account>.blob.core.windows.net/<container>/<path>/');
    • Be sure to enclose the values in single quotes ( ' )
    • For STORAGE_ALLOWED_LOCATIONS, paste the container URL without the “https://” prefix, keeping the “azure://” in front
    • If you paste the location URL of the container without any further path, you will get access to all files uploaded to this container in the future (check out Snowflake’s documentation on allowed and blocked locations)
  5. After you have created the storage integration successfully, run the command DESC STORAGE INTEGRATION <NAME_OF_YOUR_STORAGE_INTEGRATION>; with the name of your storage integration
  6. From the output, copy and note the values for the AZURE_CONSENT_URL and the AZURE_MULTI_TENANT_APP_NAME

    [Image: Snowflake Describe Storage Integration Command]
  7. Copy and paste the AZURE_CONSENT_URL into a new browser window and accept the permission request – be sure to check the “Consent on behalf of your organization” box

    [Image: Snowflake Storage Integration Consent Page]
  8. You will be forwarded to Snowflake’s official website; you can close this tab
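
Optionally, if day-to-day loading will later happen with a role other than ACCOUNTADMIN, that role needs usage rights on the integration before it can create stages that reference it. A minimal sketch, assuming a hypothetical role named ETL_ROLE:

    -- Hypothetical role name (ETL_ROLE); grant usage so that role can
    -- create stages referencing the storage integration.
    GRANT USAGE ON INTEGRATION <name_of_your_integration> TO ROLE ETL_ROLE;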

Next, let’s grant the necessary roles to the Snowflake Storage Integration App on the Storage Account in Azure.

  1. Go to your Storage Account in Azure
  2. Look for “IAM” and click on Access Control (IAM)
  3. Click on the tab “Role assignments”
  4. Click on + Add and then on Add role assignment

    [Image: Storage Integration Role Assignment in Azure Portal]
  5. You have to grant at least the Storage Blob Data Reader role. But if you want to utilize Snowflake to regularly clean up your container by removing files, you also have to add the Storage Blob Data Contributor role.
    • Look for “Storage Blob Data Reader” by using the search bar and click on the role to highlight it

      [Image: Storage Integration Role Assignment – Search for “Storage Blob Data Reader”]
    • Click on Next at the bottom
    • Select User, group, or service principal
    • Click on + Select members and look for the previously created Snowflake app by pasting the part before the underscore (“_”) of the previously noted AZURE_MULTI_TENANT_APP_NAME
    • Click on the app and click Select at the bottom

      [Image: Storage Integration Role Assignment – Add Snowflake App]
    • Click on Review + assign and then Review + assign again
    • OPTIONAL: In case you want to be able to remove files from the container from inside Snowflake, you have to repeat the above steps for the Storage Blob Data Contributor role.

Now it’s time to create the NOTIFICATION INTEGRATION in Snowflake that is utilized for triggering the Snowpipe ingestion job as soon as a new file gets uploaded to the Azure Blob Container.

We will use the CREATE NOTIFICATION INTEGRATION statement in Snowflake (make sure you have selected a role with the necessary grants to create an integration. If you’re unsure, select the ACCOUNTADMIN role)

For that, you again need two things from Azure:

  1. Azure Active Directory Tenant ID (same as before)
    • Go to your Azure Active Directory
    • Select the Azure AD you want to use for the Snowflake project
    • Copy and note the Tenant ID
  2. URL of the Azure Storage Queue
    • Go to your Storage Account
    • Go to Queues
    • Copy and note the URL value next to the Storage Queue you created earlier
  3. Then go back to your SQL worksheet in Snowflake and run the following command – be sure to replace the <queue_url> and the <tenant_id> with the respective values you noted in the step before as well as to provide a useful name

    CREATE NOTIFICATION INTEGRATION <name_of_your_integration>
      ENABLED = TRUE
      TYPE = QUEUE
      NOTIFICATION_PROVIDER = AZURE_STORAGE_QUEUE
      AZURE_STORAGE_QUEUE_PRIMARY_URI = '<queue_url>'
      AZURE_TENANT_ID = '<tenant_id>';
  4. After you have created the NOTIFICATION INTEGRATION successfully, run the command DESC NOTIFICATION INTEGRATION <NAME_OF_YOUR_INTEGRATION>; with the name of your notification integration
  5. From the output, copy and note the values for the AZURE_CONSENT_URL and the AZURE_MULTI_TENANT_APP_NAME

    [Image: Snowflake Describe Notification Integration]
  6. Copy and paste the AZURE_CONSENT_URL into a new browser window and accept the permission request – be sure to check the “Consent on behalf of your organization” box

    [Image: Snowflake Notification Integration Consent Page]
  7. You will be forwarded to Snowflake’s official website; you can close this tab
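
As with the storage integration, if the Snowpipe will later be created by a role other than ACCOUNTADMIN, grant that role usage on the notification integration first. Again a minimal sketch with the same hypothetical ETL_ROLE:

    -- Hypothetical role name (ETL_ROLE); usage on the notification
    -- integration is required to create a pipe that references it.
    GRANT USAGE ON INTEGRATION <name_of_your_notification_integration> TO ROLE ETL_ROLE;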

Next, let’s grant the necessary roles to the Snowflake Notification Integration App on the Storage Account in Azure.

  1. Go to your Storage Account in Azure
  2. Go to Queues and click on the queue you created earlier
  3. Click on Access Control (IAM) on the left side
  4. Click on the tab “Role assignments”
  5. Click on + Add and then on Add role assignment

    [Image: Notification Integration Role Assignment in Azure Portal]
  6. You have to grant the Storage Queue Data Contributor role to the Snowflake Notification Integration App you created before
    • Look for “Storage Queue Data Contributor” by using the search bar and click on the role to highlight it

      [Image: Notification Integration Role Assignment – Search for “Storage Queue Data Contributor”]
    • Click on Next at the bottom
    • Select User, group, or service principal
    • Click on + Select members and look for the previously created Snowflake app by pasting the part before the underscore (“_”) of the previously noted AZURE_MULTI_TENANT_APP_NAME
    • Click on the app and click Select at the bottom

      [Image: Notification Integration Role Assignment – Add Snowflake App]
    • Click on Review + assign and then Review + assign again

Create Snowflake Stage and Verify Storage Integration

  1. Create a SNOWFLAKE STAGE
    • For this, we will use the CREATE STAGE command in Snowflake (make sure you have selected a role with the necessary grants to create a stage. If you’re unsure, select the ACCOUNTADMIN role)
    • Go back to your SQL worksheet in Snowflake and execute the following command

      USE SCHEMA <your_database>.<your_schema>;
      
      CREATE STAGE <name_of_your_stage>
        URL = 'azure://myaccount.blob.core.windows.net/mycontainer/load/files/'
        STORAGE_INTEGRATION = <name_of_your_storage_integration>;
    • Replace <your_database> and <your_schema> with the respective locations from which you want to ingest data later via the external stage
    • Provide a useful name for the stage
    • Replace the URL with the URL of your container but be sure to keep the “azure://” in front instead of the “https://”
    • Lastly, provide the name of the storage integration you created earlier.
  2. Upload a test file to the blob container (e.g. a .csv file with a few rows and columns of sample data)
  3. Then run the following statement in Snowflake and replace the values with yours

    ls @<DATABASE_NAME>.<SCHEMA_NAME>.<STAGE_NAME>;
  4. If the storage integration and external stage work correctly, you should see the file you just uploaded to the Azure Blob Container in the Snowflake output

    [Image: Snowflake ‘ls’ Command to List Files in External Stage]
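
As an additional sanity check, you can query the contents of the staged file directly with positional column references ($1, $2, …) before creating any pipe. A quick sketch, assuming a comma-separated csv with three columns (Snowflake falls back to its default CSV file format when none is specified):

    SELECT $1, $2, $3
    FROM @<DATABASE_NAME>.<SCHEMA_NAME>.<STAGE_NAME>
    LIMIT 10;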

Create Snowpipe for Continuous Data Ingestion

  1. Create a Snowflake Table first where the data will be ingested
    • Go back to your SQL worksheet in Snowflake
    • Use the CREATE TABLE command to create a table with the same number of columns as the file you uploaded to the Azure Blob Container – this could look like the following

      CREATE OR REPLACE TABLE RAW.PUBLIC.USERS_TEST (
          ID STRING,
          FIRST_NAME STRING,
          LAST_NAME STRING
      );
  2. Create Snowpipe for Automatic Data Ingestion
    • For this, we will use the CREATE PIPE command in Snowflake
    • Go back to your SQL worksheet in Snowflake
    • Execute the following command and replace the parameters with your own respective values

      CREATE OR REPLACE PIPE <YOUR_PIPE_NAME>
        AUTO_INGEST = TRUE
        INTEGRATION = '<YOUR_NOTIFICATION_INTEGRATION>'
        AS COPY INTO <YOUR_DB>.<YOUR_SCHEMA>.<YOUR_TARGET_TABLE>
        FROM @<YOUR_DB>.<YOUR_SCHEMA>.<YOUR_STAGE>
        FILE_FORMAT = (TYPE = 'CSV');
  3. Run ALTER PIPE <YOUR_DB>.<YOUR_SCHEMA>.<YOUR_PIPE_NAME> REFRESH; to trigger the pipe manually. This should ingest the data from the file in your Azure Blob Container into the target table that you have defined in Snowflake.
  4. Check if the data has been loaded by running SELECT * FROM <YOUR_DB>.<YOUR_SCHEMA>.<YOUR_TARGET_TABLE>;
  5. If the data has been loaded successfully, you should see the data in the results output in Snowflake – to verify the automatic ingestion end to end, see the sketch below
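
To test the automatic ingestion end to end (steps 19 and 20 of the overview), upload another csv file to the blob container – Snowpipe typically picks up new files within a minute. Note that FILE_FORMAT = (TYPE = 'CSV') assumes files without a header row; if your files contain one, add SKIP_HEADER = 1 to the file format options. If new data does not show up, the following commands are a useful starting point for troubleshooting (the object names are placeholders for your own):

    -- Check the pipe's execution state and pending file count
    SELECT SYSTEM$PIPE_STATUS('<YOUR_DB>.<YOUR_SCHEMA>.<YOUR_PIPE_NAME>');

    -- Review the load history of the target table for the last hour
    SELECT *
    FROM TABLE(INFORMATION_SCHEMA.COPY_HISTORY(
      TABLE_NAME => '<YOUR_TARGET_TABLE>',
      START_TIME => DATEADD(HOUR, -1, CURRENT_TIMESTAMP())
    ));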

Conclusion

This concludes the 2023 step-by-step guide on how to set up your enterprise-ready continuous data ingestion pipeline with Azure Blob Containers and Snowflake. If you run into any issues or have questions concerning your data project, do not hesitate to reach out to us.
