Mastering Azure Synapse Analytics CI/CD: A Comprehensive Guide for Seamless Deployments

By Data Insight Nest

Updated: Jun 10, 2024

In the evolving landscape of data engineering and analytics, Azure Synapse Analytics stands out as a powerful tool for integrating big data and data warehousing. Implementing Continuous Integration and Continuous Deployment (CI/CD) in Synapse Analytics can significantly streamline your workflow, ensuring rapid and reliable delivery of analytics solutions. This guide walks you through the new CI/CD process in Synapse Analytics, in which the pipeline's validate step generates the ARM templates directly from the repository, so you no longer need to click Publish in the workspace to create them. All the code and configurations used in this guide are available in my GitHub repository.

Figure: Flow comparison of the old and new CI/CD methodology for Azure Synapse Analytics
 

Prerequisites

Before diving into the CI/CD setup for Synapse Analytics, ensure you have the following:


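  1. Azure DevOps Account: If you don’t have one, create an Azure DevOps account. The pipelines in this guide use the 'Synapse workspace deployment' and 'toggle-triggers-dev' tasks, which come from the Synapse workspace deployment extension in the Visual Studio Marketplace, so install that extension in your organization.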

  2. Azure Synapse Workspace: An existing Synapse workspace with some resources like pipelines, datasets, etc.

  3. Git Repository: Your Synapse workspace linked to a Git repository (Azure Repos or GitHub).

 

Step-by-Step Guide

1. Link Azure Synapse Workspace to a Git Repository

To enable version control and CI/CD, link your Synapse workspace to a Git repository.


  1. Open your Synapse workspace in Synapse Studio (you can launch it from the workspace's page in the Azure portal).

  2. Select 'Manage' and then 'Git Configuration'.

  3. Configure the Git settings by linking your Azure DevOps or GitHub repository.

Figure: Synapse Analytics workspace user interface with the steps to set up the Git repository highlighted

2. Set Up CI/CD in Azure DevOps

a. Create a New Pipeline (CI):

  1. Go to your Azure DevOps project.

  2. Under Pipelines, select 'New Pipeline'.

  3. Choose the repository that contains your Synapse Analytics project.

Suggestion

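Split your repository into two top-level folders, one for the workspace artifacts and one for the pipeline definitions:

Root

-> Synapse Analytics Workspace (the workspace artifacts; the CI pipeline's artifactsFolder variable points here)

-> Pipeline (the YAML pipeline files)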

b. Configure YAML Pipeline:

trigger:
- main

pool:
  vmImage: windows-latest

variables:
  - name: artifactsFolder
    value: './Synapse Analytics Workspace'
  - name: serviceConnectionName
    value: 'serviceConnectionName'
  - name: targetWorkspaceName
    value: 'targetWorkspaceName'
  - name: resourceGroupName
    value: 'resourceGroupName'
  - name: publishLocation
    value: 'main'

# This will increment by 1 each time this pipeline is run. I.e., the first run will be 0.0.0 and the second will be 0.0.1
# If versionNumber is updated to be 0.1, the counter will restart at 0
  - name: versionNumber
    value: '0.0'
  - name: revisionNumber
    value: $[counter(variables['versionNumber'],0)]

name: $(versionNumber).$(revisionNumber)

stages:
    # See considerations in the blog post
    #variables:
    #  - group: 'Variable Group Name'
  - stage: Validate
    displayName: 'Validate and Publish Artifact'
    jobs:
      - job: PublishArtifact
        displayName: 'Publish Artifact'
        steps:
          - task: Synapse workspace deployment@2
            displayName: 'Validate Synapse Workspace'
            inputs:
              operation: 'validate'
              ArtifactsFolder: $(artifactsFolder)
              azureSubscription: $(serviceConnectionName)
              TargetWorkspaceName: $(targetWorkspaceName)
          - task: CopyFiles@2
            displayName: 'Copy Files to:  $(Build.ArtifactStagingDirectory)'
            inputs:
              Contents: |
                ExportedArtifacts/*
                TemplateParametersForWorkspace*.json
              flattenFolders: true
              TargetFolder: '$(Build.ArtifactStagingDirectory)'
          - publish: '$(Build.ArtifactStagingDirectory)'
            displayName: 'Publish Pipeline Artifact'
            artifact: $(publishLocation)
  - stage: Deploy
    # See considerations in the blog post
    #variables:
    #  - group: 'Variable Group Name'
    displayName: 'Publish to Dev Workspace'
    jobs:
      - deployment: DeployToDev
        displayName: 'Publish To Dev'
        environment: Development
        strategy:
          runOnce:
            deploy:
              steps:
              # See considerations in the blog post
              #- template: 'Templates/publish-template.yml'
              #  parameters:
              #    ServiceConnection: $(ServiceConnectionTest)
              #    ResourceGroupName: $(Resource Group Name)
              #    WorkspaceName: $(Workspace Name)
              #    OverrideParameters: $(Override Parameters)
              #    ArtifactDirectory: '$(Pipeline.Workspace)\$(publishLocation)'
              #    Environment: $(Environment)
              - task: toggle-triggers-dev@2
                displayName: 'Stop Triggers'
                inputs:
                  azureSubscription: $(serviceConnectionName)
                  ResourceGroupName: $(resourceGroupName)
                  WorkspaceName: $(targetWorkspaceName)
                  ToggleOn: false
                  Triggers: '*'
              - task: Synapse workspace deployment@2
                displayName: 'Publish Synapse Artifact (Dev)'
                inputs:
                  operation: 'deploy'
                  TemplateFile: $(Pipeline.Workspace)/$(publishLocation)/TemplateForWorkspace.json
                  ParametersFile: $(Pipeline.Workspace)/$(publishLocation)/TemplateParametersForWorkspace.json
                  azureSubscription: $(serviceConnectionName)
                  ResourceGroupName: $(resourceGroupName)
                  TargetWorkspaceName: $(targetWorkspaceName)
                  DeleteArtifactsNotInTemplate: true
              - task: toggle-triggers-dev@2
                displayName: 'Restart Triggers'
                condition: always()
                inputs:
                  azureSubscription: $(serviceConnectionName)
                  ResourceGroupName: $(resourceGroupName)
                  WorkspaceName: $(targetWorkspaceName)
                  ToggleOn: true
                  Triggers: '*'

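c. Create a New Pipeline (CD):

Repeat Step 2a to create a second pipeline, this time for deployment. The CD pipeline below expects the CI pipeline to be named Synapse-Analytics-CI, matching the source value in its resources block.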


d. Configure YAML Pipeline:

trigger:
- none

pool:
  vmImage: windows-latest

variables:
  - name: serviceConnectionName
    value: 'serviceConnectionName'
  - name: resourceGroupName
    value: 'resourceGroupName'
  - name: targetWorkspaceName
    value: 'targetWorkspaceName'
  - name: artifactLocation
    value: 'artifactLocation' # Must match the pipeline resource alias below and the name of the artifact published by CI

resources:
  pipelines:
  - pipeline: artifactLocation
    source: Synapse-Analytics-CI

# Setting the run name this way does not work, so the Setup stage works around it.
# name: $(Resources.Pipeline.artifactLocation.runName) 

stages:
  - stage: Setup
    jobs:
      - job: Setup
        steps:
          - pwsh: |
              Write-Host "##vso[build.updatebuildnumber]$(Resources.Pipeline.artifactLocation.runName)"
            displayName: 'Set Build Number to CI Run Name'
  - stage: Deploy_Test
    # See considerations in the blog post
    #variables:
    #  - group: 'Variable Group Name'
    jobs:
      - deployment: PublishToTest
        displayName: 'Publish To Test'
        continueOnError: false
        environment: Test
        strategy:
          runOnce:
            deploy:
              steps:
                # See considerations in the blog post
                #- template: 'Templates/publish-template.yml'
                #  parameters:
                #    ServiceConnection: $(ServiceConnectionTest)
                #    ResourceGroupName: $(Resource Group Name)
                #    WorkspaceName: $(Workspace Name)
                #    OverrideParameters: $(Override Parameters)
                #    ArtifactDirectory: '$(Pipeline.Workspace)\$(artifactLocation)\$(artifactLocation)'
                #    Environment: $(Environment)
              - task: toggle-triggers-dev@2
                displayName: 'Disable All Pipeline Triggers'
                inputs:
                  azureSubscription: $(serviceConnectionName)
                  ResourceGroupName: $(resourceGroupName)
                  WorkspaceName: $(targetWorkspaceName)
                  ToggleOn: false
                  Triggers: '*'
              - task: Synapse workspace deployment@2
                displayName: 'Publish Synapse Artifact (Test)'
                inputs:
                  operation: 'deploy'
                  TemplateFile: $(Pipeline.Workspace)\$(artifactLocation)\$(artifactLocation)\TemplateForWorkspace.json
                  ParametersFile: $(Pipeline.Workspace)\$(artifactLocation)\$(artifactLocation)\TemplateParametersForWorkspace.json
                  azureSubscription: $(serviceConnectionName)
                  ResourceGroupName: $(resourceGroupName)
                  TargetWorkspaceName: $(targetWorkspaceName)
                  DeleteArtifactsNotInTemplate: true
                  DeployManagedPrivateEndpoints: true
                  Environment: 'prod' # Azure cloud of the target workspace ('prod' = Azure public cloud), not the deployment stage
              - task: toggle-triggers-dev@2
                displayName: 'Restart triggers'
                condition: always()
                inputs:
                  azureSubscription: $(serviceConnectionName)
                  ResourceGroupName: $(resourceGroupName)
                  WorkspaceName: $(targetWorkspaceName)
                  ToggleOn: true
                  Triggers: '*'

Considerations

Use ADO Environments

Azure DevOps Environments provide a way to manage deployments and monitor the release pipeline. By using environments, you can:


  • Isolate Deployments: Create separate environments for development, testing, and production to isolate different stages of your deployment pipeline.

  • Approval Gates: Implement approval gates to ensure deployments to critical environments (like production) are reviewed and approved by designated stakeholders.

  • Tracking and Auditing: Track deployment history and status for each environment, aiding in auditing and troubleshooting.


To set up environments, navigate to Azure DevOps, and under Pipelines, select Environments. Create new environments for each stage of your deployment pipeline and configure the necessary approvals and checks.
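
As a rough sketch of how an environment ties into the CD pipeline above, a further stage could target a production environment. The stage name Deploy_Prod, the job name PublishToProd, and the environment name Production are assumptions for illustration, and the sketch reuses the CD pipeline's variables even though a real production stage would point at its own workspace. Approvals and checks are configured on the environment in Azure DevOps, not in the YAML, and the trigger toggling steps are left out for brevity.

```yaml
  # Hypothetical extra stage appended to the CD pipeline's stages list.
  - stage: Deploy_Prod
    dependsOn: Deploy_Test            # run only after the Test stage succeeds
    jobs:
      - deployment: PublishToProd
        displayName: 'Publish To Prod'
        environment: Production       # approvals and checks on this environment gate the job
        strategy:
          runOnce:
            deploy:
              steps:
                - task: Synapse workspace deployment@2
                  displayName: 'Publish Synapse Artifact (Prod)'
                  inputs:
                    operation: 'deploy'
                    TemplateFile: $(Pipeline.Workspace)\$(artifactLocation)\$(artifactLocation)\TemplateForWorkspace.json
                    ParametersFile: $(Pipeline.Workspace)\$(artifactLocation)\$(artifactLocation)\TemplateParametersForWorkspace.json
                    azureSubscription: $(serviceConnectionName)
                    ResourceGroupName: $(resourceGroupName)
                    TargetWorkspaceName: $(targetWorkspaceName)
                    DeleteArtifactsNotInTemplate: true
```

With an approval check on the Production environment, the job waits until a designated approver signs off before any step runs.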

Use ADO Variable Groups
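
The commented-out variables blocks in both pipelines hint at how this could look. Here is a minimal sketch, assuming a variable group named 'Synapse-Test' (a hypothetical name) has been created under Pipelines > Library and holds resourceGroupName and targetWorkspaceName. Those two names would then be removed from the pipeline's own variables block. The service connection name stays in the YAML here, because service connections generally need to be resolvable when the pipeline is compiled.

```yaml
stages:
  - stage: Deploy_Test
    variables:
      - group: 'Synapse-Test'         # hypothetical variable group name
    jobs:
      - deployment: PublishToTest
        displayName: 'Publish To Test'
        environment: Test
        strategy:
          runOnce:
            deploy:
              steps:
                - task: toggle-triggers-dev@2
                  displayName: 'Disable All Pipeline Triggers'
                  inputs:
                    azureSubscription: $(serviceConnectionName)   # defined in the pipeline YAML
                    ResourceGroupName: $(resourceGroupName)       # supplied by the variable group
                    WorkspaceName: $(targetWorkspaceName)         # supplied by the variable group
                    ToggleOn: false
                    Triggers: '*'
```

One group per environment keeps the stage definitions identical apart from the group name.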

Apply Branch Policies on 'main'
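
Branch policies themselves are configured in the repository settings rather than in YAML. One option, assuming the repository lives in Azure Repos, is a build validation policy on 'main' that runs a validation-only pipeline for every pull request. The sketch below is one way such a pipeline might look. It reuses the validate step from the CI pipeline, and the placeholder variable values are the same as there.

```yaml
# Hypothetical validation-only pipeline, attached to 'main' as a build validation policy.
trigger: none                         # runs are started by the branch policy, not by pushes

pool:
  vmImage: windows-latest

variables:
  - name: artifactsFolder
    value: './Synapse Analytics Workspace'
  - name: serviceConnectionName
    value: 'serviceConnectionName'
  - name: targetWorkspaceName
    value: 'targetWorkspaceName'

steps:
  - task: Synapse workspace deployment@2
    displayName: 'Validate Synapse Workspace'
    inputs:
      operation: 'validate'
      ArtifactsFolder: $(artifactsFolder)
      azureSubscription: $(serviceConnectionName)
      TargetWorkspaceName: $(targetWorkspaceName)
```

With the policy in place, a pull request cannot be completed until the workspace artifacts validate successfully.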

Create YAML Templates to create Reusable Code
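
The commented-out template call in both pipelines references Templates/publish-template.yml with six parameters. The sketch below is one possible shape for that file, not the version from the repository. The parameter names come from the commented-out call. The defaults and the OverrideArmParameters input name are assumptions, so verify them against the task's documentation.

```yaml
# Templates/publish-template.yml (a sketch; the defaults below are assumptions)
parameters:
  - name: ServiceConnection
    type: string
  - name: ResourceGroupName
    type: string
  - name: WorkspaceName
    type: string
  - name: OverrideParameters
    type: string
    default: ''
  - name: ArtifactDirectory
    type: string
  - name: Environment
    type: string
    default: 'prod'

steps:
  - task: toggle-triggers-dev@2
    displayName: 'Stop Triggers'
    inputs:
      azureSubscription: ${{ parameters.ServiceConnection }}
      ResourceGroupName: ${{ parameters.ResourceGroupName }}
      WorkspaceName: ${{ parameters.WorkspaceName }}
      ToggleOn: false
      Triggers: '*'
  - task: Synapse workspace deployment@2
    displayName: 'Publish Synapse Artifact'
    inputs:
      operation: 'deploy'
      TemplateFile: ${{ parameters.ArtifactDirectory }}\TemplateForWorkspace.json
      ParametersFile: ${{ parameters.ArtifactDirectory }}\TemplateParametersForWorkspace.json
      azureSubscription: ${{ parameters.ServiceConnection }}
      ResourceGroupName: ${{ parameters.ResourceGroupName }}
      TargetWorkspaceName: ${{ parameters.WorkspaceName }}
      OverrideArmParameters: ${{ parameters.OverrideParameters }}   # assumed input name
      DeleteArtifactsNotInTemplate: true
      Environment: ${{ parameters.Environment }}
  - task: toggle-triggers-dev@2
    displayName: 'Restart Triggers'
    condition: always()               # restart triggers even if the deployment fails
    inputs:
      azureSubscription: ${{ parameters.ServiceConnection }}
      ResourceGroupName: ${{ parameters.ResourceGroupName }}
      WorkspaceName: ${{ parameters.WorkspaceName }}
      ToggleOn: true
      Triggers: '*'
```

Each deployment stage can then replace its three inline tasks with a single template call, which keeps the stop, deploy, and restart sequence identical across environments.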

Conclusion

Implementing CI/CD for Synapse Analytics using Azure DevOps significantly enhances your data engineering workflow by automating deployments and ensuring consistency across environments. By following this guide, you eliminate the need for manual publishing, streamline the deployment process, and manage your Synapse Analytics deployments through a robust CI/CD pipeline that delivers your data solutions quickly and reliably.


As always, I welcome your comments and discussions! Share your thoughts, experiences, and any questions you might have in the comments below. Let's collaborate and make the most out of these CI/CD capabilities for Synapse Analytics!
