Azure locking support implementation.


Digger is an open-source alternative to Terraform Cloud. It makes it easy to run Terraform Plan and Apply in GitHub Actions.

Digger already had locking support on AWS (via DynamoDB) and on GCP (via Buckets). In our GCP support announcement, Azure support was requested by u/BraakOSRS - and now it's here! The PR was merged just yesterday.

Features

  • Add ability to Lock & Unlock through Azure Storage Account tables
  • Support Shared Key authentication
  • Support Connection string authentication
  • Support Client secret authentication
  • Create table if it doesn't already exist
  • Normalize Digger's lock name format, since Storage Account tables don't accept # and / characters (a sketch of one possible approach follows this list)
  • Provide meaningful errors to the user at every step
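As an illustration of the normalization point above, the lock-name handling could look roughly like the sketch below. This is not Digger's actual code; the package name and the replacement character are assumptions.

package locking

import "strings"

// normalizeLockName makes a Digger lock name safe to use as an Azure
// Storage Table key, since PartitionKey and RowKey values must not
// contain '#' or '/'. Replacing them with '-' is an assumption made
// for this sketch; Digger's actual scheme may differ.
func normalizeLockName(name string) string {
	return strings.NewReplacer("#", "-", "/", "-").Replace(name)
}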

How to use

To use Azure-based locks, there is one mandatory environment variable the user has to set: DIGGER_AZURE_AUTH_METHOD, which can take one of the three values below:

  • SHARED_KEY
  • CONNECTION_STRING
  • CLIENT_SECRET

Then, depending on the value of DIGGER_AZURE_AUTH_METHOD, the user will have to set additional environment variables, as listed below (a sketch of how they map to Azure SDK credentials follows the list).

  1. SHARED_KEY
  • DIGGER_AZURE_SA_NAME: storage account name
  • DIGGER_AZURE_SHARED_KEY: shared key of the storage account
  2. CONNECTION_STRING
  • DIGGER_AZURE_CONNECTION_STRING: connection string
  3. CLIENT_SECRET
  • DIGGER_AZURE_TENANT_ID: tenant ID to use
  • DIGGER_AZURE_CLIENT_ID: client ID of the service principal
  • DIGGER_AZURE_CLIENT_SECRET: secret of the service principal
  • DIGGER_AZURE_SA_NAME: storage account name
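To make the mapping between DIGGER_AZURE_AUTH_METHOD and the other variables concrete, here is a rough sketch of how a table service client could be built from them with the Azure SDK for Go. It is an illustration rather than Digger's actual code; the package layout and function name are assumptions.

package locking

import (
	"fmt"
	"os"

	"github.com/Azure/azure-sdk-for-go/sdk/azidentity"
	"github.com/Azure/azure-sdk-for-go/sdk/data/aztables"
)

// newTableServiceClient builds an aztables service client from the
// DIGGER_AZURE_* environment variables described above.
func newTableServiceClient() (*aztables.ServiceClient, error) {
	switch os.Getenv("DIGGER_AZURE_AUTH_METHOD") {
	case "SHARED_KEY":
		account := os.Getenv("DIGGER_AZURE_SA_NAME")
		cred, err := aztables.NewSharedKeyCredential(account, os.Getenv("DIGGER_AZURE_SHARED_KEY"))
		if err != nil {
			return nil, fmt.Errorf("invalid shared key credential: %v", err)
		}
		serviceURL := fmt.Sprintf("https://%s.table.core.windows.net/", account)
		return aztables.NewServiceClientWithSharedKey(serviceURL, cred, nil)
	case "CONNECTION_STRING":
		return aztables.NewServiceClientFromConnectionString(os.Getenv("DIGGER_AZURE_CONNECTION_STRING"), nil)
	case "CLIENT_SECRET":
		cred, err := azidentity.NewClientSecretCredential(
			os.Getenv("DIGGER_AZURE_TENANT_ID"),
			os.Getenv("DIGGER_AZURE_CLIENT_ID"),
			os.Getenv("DIGGER_AZURE_CLIENT_SECRET"),
			nil)
		if err != nil {
			return nil, fmt.Errorf("invalid client secret credential: %v", err)
		}
		serviceURL := fmt.Sprintf("https://%s.table.core.windows.net/", os.Getenv("DIGGER_AZURE_SA_NAME"))
		return aztables.NewServiceClient(serviceURL, cred, nil)
	default:
		return nil, fmt.Errorf("unsupported DIGGER_AZURE_AUTH_METHOD value")
	}
}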

Why tables?

Distributed locking can be implemented in a number of ways. On the surface it seems to make sense to keep the implementation consistent across cloud providers: on AWS we are using DynamoDB, so on Azure it must be Cosmos DB, right? However, we found that simply replicating the approach from one cloud provider to another does not make much sense. On GCP we went with Buckets, mainly because they are the simplest and cheapest way to achieve what locks are for: they are strongly consistent on updates.

On Azure we picked Storage Tables: they scale well and store structured data, but the volume of data here is next to nothing, so they are effectively free. Basic locking keyed on a resource ID is all we need - a similar approach to Buckets on GCP. We can also introduce additional fields in the records if needed, which is more flexible than using storage buckets.
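For a sense of what ends up in the table, a lock record could look roughly like the sketch below. The partition name and property name are illustrative assumptions, not necessarily what Digger stores.

package locking

import "github.com/Azure/azure-sdk-for-go/sdk/data/aztables"

// buildLockEntity sketches the shape of a lock record: the normalized
// lock name is the row key and the ID of whoever holds the lock is a
// property. Further fields could be added to Properties if needed.
func buildLockEntity(lockName string, lockId string) aztables.EDMEntity {
	return aztables.EDMEntity{
		Entity: aztables.Entity{
			PartitionKey: "digger-locks", // a single partition is fine at this volume
			RowKey:       lockName,       // normalized resource / lock name
		},
		Properties: map[string]any{
			"LockId": lockId,
		},
	}
}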

How it works

Digger is written in Go, so the locking mechanism on Azure implements the same Lock interface as its AWS and GCP counterparts:

Lock(lockId int, prNumber string) (bool, error)

Unlock(prNumber string) (bool, error)

GetLock(prNumber string) (*int, error)

Lock() acquires a lock with lockId for a specific pull request. On Azure this creates a record in the table and stores the lock ID as a column. Before creating the record it checks whether a lock already exists for that PR. If it is already locked by the current PR, no action is performed; if it is locked by another PR, the call fails.

Unlock() releases the lock which was acquired by the specified PR. On Azure this deletes the record from the table if it exists; if it does not exist, no action is performed.

GetLock() retrieves the lock which was acquired by the PR, if it exists. On Azure this retrieves the record for prNumber if it exists; otherwise it returns nil.
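Put together, a simplified sketch of these three methods on top of Azure Storage Tables with the aztables SDK could look like the code below. It follows the behaviour described above but is not Digger's actual implementation: the struct, partition key and property names are assumptions, and the lock ID is stored as a string purely to keep type handling simple in the sketch.

package locking

import (
	"context"
	"encoding/json"
	"errors"
	"net/http"
	"strconv"

	"github.com/Azure/azure-sdk-for-go/sdk/azcore"
	"github.com/Azure/azure-sdk-for-go/sdk/data/aztables"
)

// AzureLock is a sketch of a Storage Table backed lock.
type AzureLock struct {
	table *aztables.Client // client for the locks table, e.g. serviceClient.NewClient("diggerlocks")
}

const lockPartition = "digger-locks"

// Lock stores lockId for prNumber unless a different lockId already holds it.
func (a *AzureLock) Lock(lockId int, prNumber string) (bool, error) {
	existing, err := a.GetLock(prNumber)
	if err != nil {
		return false, err
	}
	if existing != nil {
		// Already locked: succeed only if it is held by the same lock ID.
		return *existing == lockId, nil
	}
	entity := aztables.EDMEntity{
		Entity:     aztables.Entity{PartitionKey: lockPartition, RowKey: prNumber},
		Properties: map[string]any{"LockId": strconv.Itoa(lockId)},
	}
	payload, err := json.Marshal(entity)
	if err != nil {
		return false, err
	}
	if _, err := a.table.AddEntity(context.Background(), payload, nil); err != nil {
		if hasStatus(err, http.StatusConflict) {
			// Another run created the record between our check and the insert.
			return false, nil
		}
		return false, err
	}
	return true, nil
}

// Unlock deletes the record for prNumber; a missing record is not an error.
func (a *AzureLock) Unlock(prNumber string) (bool, error) {
	if _, err := a.table.DeleteEntity(context.Background(), lockPartition, prNumber, nil); err != nil && !hasStatus(err, http.StatusNotFound) {
		return false, err
	}
	return true, nil
}

// GetLock returns the stored lock ID for prNumber, or nil if there is none.
func (a *AzureLock) GetLock(prNumber string) (*int, error) {
	resp, err := a.table.GetEntity(context.Background(), lockPartition, prNumber, nil)
	if err != nil {
		if hasStatus(err, http.StatusNotFound) {
			return nil, nil
		}
		return nil, err
	}
	var entity aztables.EDMEntity
	if err := json.Unmarshal(resp.Value, &entity); err != nil {
		return nil, err
	}
	raw, ok := entity.Properties["LockId"].(string)
	if !ok {
		return nil, errors.New("lock record has an unexpected LockId type")
	}
	lockId, err := strconv.Atoi(raw)
	if err != nil {
		return nil, err
	}
	return &lockId, nil
}

// hasStatus reports whether err is an Azure response error with the given HTTP status.
func hasStatus(err error, status int) bool {
	var respErr *azcore.ResponseError
	return errors.As(err, &respErr) && respErr.StatusCode == status
}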

Motivation behind Digger

We are often asked what benefit Digger provides compared to simply running terraform plan / apply in an action. The short answer is: if that is enough for your use case, then a specialised tool would indeed be overkill. But quite often it is not enough.

Terraform being stateful means that each plan / apply run needs to be aware of the state and behave differently depending on it. Race conditions against the same state can wreak havoc; but one run at a time repo-wide is impractical too. To make matters worse, code alone does not contain enough information to decide whether to run or to wait, because the same terraform code can have multiple "instances" in different environments (just different tfvars). This means that there needs to be some sort of orchestration that is aware of the state.