Is S3 + DynamoDB enough for Terraform state?
What is Terraform?
Terraform is one of the most popular infrastructure-as-code tools today. We rely on Terraform to define our infrastructure and provision the resources needed to bring our environments online. We also rely on it to manage changes to that infrastructure through the terraform plan and terraform apply cycle.
Why does Terraform need state?
In order to function, Terraform needs to store its representation of the world in a statefile. The statefile may contain sensitive data and therefore needs to be protected. For an individual using Terraform, it is sufficient to store the statefile on the local machine.
How do you handle Terraform state in a team setting?
For team collaboration the statefile needs to be kept at a remote location, because multiple individuals require access to the same statefile. To achieve this, Terraform can be configured to use a remote backend for state storage. Several remote backend options are supported.
Storing the state remotely leads to another problem. Juan and Amy cannot run Terraform against the same state at the same time, since concurrent runs can overwrite each other's changes and corrupt the state. We want to guarantee that if Amy is currently running a terraform apply and Juan attempts to run a terraform apply, Juan will receive an error. This can be solved by locking Terraform plans and applies: no two plans or applies should be allowed to run in parallel. Terraform solves this issue using a “lock file”. This lock file also needs to be stored remotely in a team setting.
Terraform supports several kinds of remote backends, including S3 with DynamoDB, http, and remote. In addition, several cloud offerings such as Terraform Cloud handle the state and locking functionality for you. In many cases they both run your Terraform and manage the state transparently.
The question is whether we need a dedicated cloud offering to handle our state for us. Is the S3 backend (with DynamoDB for locking) enough for an enterprise-grade state offering? To answer this question, let us agree on what is needed from an enterprise-grade state backend:
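For the S3 option, pointing Terraform at the backend is a small block of configuration. The following is a minimal sketch rather than a recommended setup: the bucket name, key path, region, and table name are all placeholders I made up, and both the bucket and the DynamoDB table are assumed to exist already.

```hcl
terraform {
  backend "s3" {
    # Placeholder names: substitute your own bucket, key, region, and table.
    bucket         = "example-terraform-state"
    key            = "envs/prod/terraform.tfstate"
    region         = "us-east-1"

    # DynamoDB table that holds the lock entry for this state.
    dynamodb_table = "example-terraform-locks"

    # Ask S3 to encrypt the state object at rest.
    encrypt        = true
  }
}
```

After adding or changing a backend block, terraform init is what actually connects the working directory to it. The locking options of the S3 backend have changed across Terraform releases, so it is worth checking the backend documentation for the version you run.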
- It needs to be easy to set up and maintain.
- It needs to support encryption at rest and in transit. The statefile contains sensitive data such as database passwords and needs to be protected.
- It needs to handle versioning of the statefile. If something goes wrong and the statefile gets corrupted, it should be possible to recover the most recent valid version.
- In addition to storing state, it needs to support reliable distributed locking. No two Terraform tasks should be able to acquire the lock at the same time when running a plan or apply command.
Let’s take a look at whether the S3 backend meets these requirements.
Easy to set up and maintain
All it takes is a few lines of Terraform to set up an S3 bucket and a DynamoDB table. The trick is making sure that the bucket is encrypted and versioned, and that it has the right policies to lock down access to the statefile. Both S3 and DynamoDB are managed services from AWS and need very little maintenance once set up.
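To give a sense of how little is involved, here is a rough sketch of the two resources, assuming version 4 or later of the AWS provider. The resource labels, bucket name, table name, and region are placeholders of mine. The one name that is not arbitrary is the LockID hash key, which is what the S3 backend expects on the lock table.

```hcl
provider "aws" {
  region = "us-east-1" # placeholder region
}

# Bucket that will hold the statefile (placeholder name).
resource "aws_s3_bucket" "state" {
  bucket = "example-terraform-state"

  # Guard against deleting the bucket with a stray terraform destroy.
  lifecycle {
    prevent_destroy = true
  }
}

# Make sure the statefile can never be exposed publicly.
resource "aws_s3_bucket_public_access_block" "state" {
  bucket                  = aws_s3_bucket.state.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

# Lock table; the S3 backend requires a string hash key named LockID.
resource "aws_dynamodb_table" "locks" {
  name         = "example-terraform-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
```

One practical wrinkle: these resources have to be created before the backend that uses them can be initialized, so they are usually applied once with local state or from a separate bootstrap configuration.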
Encryption at rest and in transit
If server-side encryption is enabled on the bucket, the statefile is encrypted at rest. In transit it is protected by TLS, since Terraform talks to S3 over HTTPS.
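Both halves can be made explicit on the bucket itself. The sketch below assumes a bucket named example-terraform-state (a placeholder) and version 4 or later of the AWS provider. It sets a default encryption rule and adds a bucket policy that rejects any request not made over TLS. The choice of aws:kms with the AWS-managed key is my assumption, not something the backend requires.

```hcl
# Default encryption at rest for every object written to the bucket.
resource "aws_s3_bucket_server_side_encryption_configuration" "state" {
  bucket = "example-terraform-state" # placeholder bucket name

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "aws:kms" # assumption: AWS-managed KMS key
    }
  }
}

# Deny any request that does not arrive over TLS.
resource "aws_s3_bucket_policy" "tls_only" {
  bucket = "example-terraform-state" # placeholder bucket name

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Sid       = "DenyInsecureTransport"
      Effect    = "Deny"
      Principal = "*"
      Action    = "s3:*"
      Resource = [
        "arn:aws:s3:::example-terraform-state",
        "arn:aws:s3:::example-terraform-state/*",
      ]
      Condition = {
        Bool = { "aws:SecureTransport" = "false" }
      }
    }]
  })
}
```

A bucket has a single policy document, so if you already manage a policy for this bucket the deny statement would need to be merged into it rather than added as a second resource.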
Versioning of the statefile
If versioning is enabled on the bucket, every update to the statefile is stored as a new object version, so a previous version can be restored if something goes wrong.
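Turning versioning on is a single resource. As before, this is a sketch that assumes version 4 or later of the AWS provider, and the bucket name is a placeholder.

```hcl
# Keep every past version of the statefile so it can be rolled back.
resource "aws_s3_bucket_versioning" "state" {
  bucket = "example-terraform-state" # placeholder bucket name

  versioning_configuration {
    status = "Enabled"
  }
}
```

Recovering from a bad state then comes down to restoring an earlier object version of the statefile from S3.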
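Distributed locking
In this case locking is handled by a DynamoDB table. Before a plan or apply, Terraform writes a lock item to the table using a conditional write that only succeeds if no lock item exists yet. DynamoDB conditional writes are atomic, so only one of two concurrent runs can acquire the lock and the other fails with an error. This makes it a reliable way to implement consistent distributed locking.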
One interesting fact is that Spacelift's state storage is actually an S3 bucket under the hood, and I claim that a lot of the other cloud offerings work the same way.
In conclusion, based on the points above, S3 with DynamoDB is enough to securely and reliably store the Terraform statefile and to lock out concurrent Terraform executions. Is there a requirement that I have missed? Let me know in the comments below.