Azure

ADX Follower Databases - sharing is caring

Christoph Thale|
#azure#adx#terraform

Do you want to build a so-called follower database for Azure Data Explorer? In this article, we will take a closer look at what exactly that is, at what point it’s a good idea and how we can cleanly deploy it with Terraform.

Be independent but now we follow

A follower database is a feature of Microsoft Azure for Azure Data Explorer (ADX) and can only be used in combination with one or more existing lead ADX databases. In this setup, a database from the lead cluster is connected to the follower cluster in such a way that the database is also available for querying in the follower cluster. It’s possible to follow an entire lead cluster with all its databases or just individual databases. The follower database provides completely or selectively chosen tables, functions, or materialized views in read-only mode; neither additional tables can be created nor existing ones edited within it. The only adjustable aspects are caching policies, principals, and permission assignments. Since the entire follower ADX cluster is an independent cluster, additional databases can naturally be created there, all of which can support read, write, update and delete functionalities.

Technically, the follower database uses the same underlying data source (Blob Storage) as the lead database. This means that data in the follower cluster’s follower database can be displayed within seconds to minutes of the data appearing in the lead cluster. The time until synchronization depends on the amount of data and the size of the lead database’s metadata.

Why do we need this?

The main reason for providing data from a lead cluster to a follower cluster is to share important data while also providing separate computing resources with the same data state. As hinted above, the data sources, lead and follower, remain permanently in sync, which means no one has to worry about it. This happens automatically. Therefore, separate teams, for example, the data science team, which should not execute analyses and algorithms on the production lead cluster, can do so on the follower cluster without jeopardizing the business. Since access to the follower cluster can be individually configured, multiple teams can work on it simultaneously.

Deploy it

To deploy all of this, we use Terraform along with the official Azure Terraform Provider (azurerm).

First, we define the provider specifications:

terraform {
  required_providers {
    azurerm = {
      source = "hashicorp/azurerm"
      version = "3.89.0"
    }
  }
}

provider "azurerm" {
  subscription_id = <subscription_id>
  tenant_id       = <tenant_id>
  client_id       = <client_id>
  client_secret   = <secret>
}

To attach any or all databases from the lead cluster to the follower cluster, we use the ARM template resource block. Here, we define our template schema with all the needed parameters:

resource "azurerm_resource_group_template_deployment" "arm_template_attachment" {
  name                = "arm_template_attachment_name"
  resource_group_name = "your-rg-name"
  deployment_mode     = "Incremental"
  template_content = 
    <<TEMPLATE
    {
        "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
        "contentVersion": "1.0.0.0",
        "parameters": {
            "followerClusterName": {
                "type": "string",
                "defaultValue": "your-follower-cluster-name",
                "metadata": {
                    "description": "Name of the cluster to which the database will be attached."
                }
            },
            "attachedDatabaseConfigurationsName": {
                "type": "string",
                "defaultValue": "your-follower-cluster-name",
                "metadata": {
                    "description": "The name of the attached database configurations object. The name can be any string that is unique at the cluster level. This is not the name of the database in the follower cluster."
                }
            },
            "leaderClusterResourceId": {
                "type": "string",
                "defaultValue": "/subscriptions/<sub_id>/resourceGroups/<rg-name-of-lead-cluster>/providers/Microsoft.Kusto/clusters/<name-of-lead-cluster>",
                "metadata": {
                    "description": "The resource ID of the leader cluster."
                }
            },
            "databaseName": {
                "type": "string",
                "defaultValue": "dachsadxdb",
                "metadata": {
                    "description": "The name of the database to follow (within Lead Cluster). You can follow all databases by using '*'."
                }
            },
            "defaultPrincipalsModificationKind": {
                "type": "string",
                "defaultValue": "Replace",
                "metadata": {
                    "description": "The default principal modification kind. Replace: Follower Cluster defines its own and no principal or users are migrated from the Lead Cluster to the Follower Cluster"
                }
            },
            "tablesToInclude": {
                "type": "array",
                "defaultValue": "*",
                "metadata": {
                    "description": "The list of tables to include. Not supported when following all databases."
                }
            },
            "tablesToExclude": {
                "type": "array",
                "defaultValue": [],
                "metadata": {
                  "description": "The list of tables to exclude. Not supported when following all databases."
                }
            },
            "externalTablesToExclude": {
                "type": "array",
                "defaultValue": ["*"],
                "metadata": {
                    "description": "The list of external tables to exclude. Not supported when following all databases."
                }
            },
            "materializedViewsToInclude": {
                "type": "array",
                "defaultValue": ["my-mv-01", "my-mv-02"],
                "metadata": {
                "description": "The list of materialized views to include. Not supported when following all databases."
                }
            },
            "materializedViewsToExclude": {
                "type": "array",
                "defaultValue": [],
                "metadata": {
                    "description": "The list of materialized views to exclude. Not supported when following all databases."
                }
            },
            "functionsToInclude": {
                "type": "array",
                "defaultValue": ["],
                "metadata": {
                    "description": "The list of functions to include."
                }
            },
            "functionsToExclude": {
                "type": "array",
                "defaultValue": [],
                "metadata": {
                    "description": "The list of functions to exclude."
                }
            },
            "location": {
                "type": "string",
                "defaultValue": "westeurope",
                "metadata": {
                    "description": "Location for all resources."
                }
            }
        },
        "variables": {},
        "resources": [
            {
                "name": "[concat(parameters('followerClusterName'), '/', parameters('attachedDatabaseConfigurationsName'))]",
                "type": "Microsoft.Kusto/clusters/attachedDatabaseConfigurations",
                "apiVersion": "2021-01-01",
                "location": "[parameters('location')]",
                "properties": {
                    "clusterResourceId": "[parameters('leaderClusterResourceId')]",
                    "defaultPrincipalsModificationKind": "[parameters('defaultPrincipalsModificationKind')]",
                    "databaseName": "[parameters('databaseName')]",
                    "tableLevelSharingProperties":{
                        "tablesToInclude": "[parameters('tablesToInclude')]",
                        "tablesToExclude": "[parameters('tablesToExclude')]",
                        "externalTablesToExclude": "[parameters('externalTablesToExclude')]",
                        "materializedViewsToInclude": "[parameters('materializedViewsToInclude')]",
                        "materializedViewsToExclude": "[parameters('materializedViewsToExclude')]",
                        "functionsToInclude": "[parameters('functionsToInclude')]",
                        "functionsToExclude": "[parameters('functionsToExclude')]"
                    }
                }
            }
        ]
    }
    TEMPLATE
}

The provision and completion of the individual parameters within the ARM template are relatively straightforward. The only variable that might raise a question is the:

  • defaultPrincipalsModificationKind
    • Union: Merge principals and permissions from the lead database with the ones created on the follower database.
    • Replace: No inheritance of principals from the lead database. New principals and permissions must be created for the attached database.
    • None: The follower database principals include only the principals of the original database with no other principals.

Generally, “Replace” should be used so that completely new access rights and principals can be defined on the follower database.

After this deployment has been rolled out via Terraform using the commands

terraform init 
terraform plan
terraform apply

you have created your first follower database. Congratulations!

Tear everything down

If your data science team conducts such intense analyses that costs skyrocket or your business goes downhill, then it’s time to shut down the follower database. Through Terraform, we can only destroy the ARM template itself, not the resource (Follower Database) that was deployed in Azure using it. Therefore, we utilize a different approach.

First we need to remove the connection between the follower database and the lead database.

This can be done using a PowerShell script.

First, we authenticate ourselves in PowerShell with a user against Azure who has the rights to remove the connection between the lead and follower. To do this, execute the following command:

Connect-AzAccount

Afterwards, execute the following .ps1 script:

$FollowerClustername = 'my-follower-cluster-name'
$FollowerClusterSubscriptionID = 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx'
$FollowerResourceGroupName = 'rg-of-the-follower-cluster'
$DatabaseName = "db-name-in-follower-cluster"  ## Can be specific database name or * for all databases.

##Construct the Configuration name
$confignameraw = (Get-AzKustoAttachedDatabaseConfiguration -ClusterName $FollowerClustername -ResourceGroupName $FollowerResourceGroupName -SubscriptionId $FollowerClusterSubscriptionID) | Where-Object {$_.DatabaseName -eq $DatabaseName }
$configname =$confignameraw.Name.Split("/")[1]

Remove-AzKustoAttachedDatabaseConfiguration -ClusterName $FollowerClustername -Name $configname -ResourceGroupName $FollowerResourceGroupName -SubscriptionId $FollowerClusterSubscriptionID

Happy coding and I will see you in the next article.

Back to Blog