We had THE EXACT same question. We went and asked the r/terraform community this:
“Why are people using Terraform Cloud? I may be missing something, but why can't terraform just be run in GitHub Actions?
I feel that for 99% of companies, a terraform runner fundamentally only needs the following flow:
- Run terraform plan on every PR
- Run terraform apply on merge to master/main branch.
- Handle of concurrency by queuing multiple applies together.
- Terraform secrets can be handled using GitHub Secrets.”
We also created and linked to a demo repository to show how we “thought” it could be done. This article summarises what we learned from that post and adds some context from other articles on why you cannot just use GitHub Actions as is, without some additional GitOps handholding.
1. State - We realised that while Terraform is similar to application code in several aspects, it is stateful. Terraform uses state to determine which changes to make to your infrastructure. GitHub Actions, or any other CI system for that matter, does not natively store state.
In the article “The Pains In Terraform Collaboration”, Yi Lu explains that “a state file falls out of date frequently and needs refresh repeatedly. When sharing it remotely with a team, we have to handle racing condition, i.e. concurrent apply, or plan during apply. In addition, a state file contains sensitive data so we must protect access.”
The impact of running Terraform in GitHub Actions as is (i.e. without state awareness) can be disastrous! Spotify accidentally deleted all its k8s clusters because of state confusion!
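To make the "concurrent apply, or plan during apply" problem concrete, here is a minimal sketch of an exclusive lock that a runner could take before touching state. It is an illustration only: the `state_lock` helper, the lock file path and the owner labels are hypothetical, and a real setup would normally rely on the locking offered by the Terraform backend in use rather than a local file.

```python
import json
import os
import tempfile
from contextlib import contextmanager


class StateLockedError(RuntimeError):
    """Raised when another run already holds the lock."""


@contextmanager
def state_lock(lock_path, owner):
    """Hold an exclusive lock file for the duration of a plan or apply."""
    try:
        # O_EXCL makes creation atomic: it fails if the file already exists.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        try:
            with open(lock_path) as f:
                holder = json.load(f)["owner"]
        except (OSError, ValueError, KeyError):
            holder = "unknown"  # lock exists but its metadata is unreadable
        raise StateLockedError(f"state is locked by {holder}")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump({"owner": owner}, f)
        yield
    finally:
        # Release the lock even if the plan or apply raised an error.
        os.remove(lock_path)


if __name__ == "__main__":
    lock = os.path.join(tempfile.mkdtemp(), "terraform.tfstate.lock")
    with state_lock(lock, "apply-run"):
        try:
            # A plan that starts while the apply is running is rejected.
            with state_lock(lock, "plan-run"):
                pass
        except StateLockedError as err:
            print(err)
```

The second run fails fast instead of reading or writing state that is in the middle of changing, which is the behaviour a plain CI job does not give you by default.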
2. Policy as code & single observability pane of glass - Multiple people spoke about HashiCorp’s Sentinel, which is its policy-as-code framework for HashiCorp Enterprise products.
A few others spoke about the possibility of integrating with OPA/OPAL. Infrastructure governance and policies, for example using OPA, Kyverno or Checkov, are what enable large teams to set up efficient self-service for Dev/QA teams, like creation and destruction of testing environments. This was also mentioned in this blog on Terraform Automation and Collaboration Software (TACOS).
The self-service aspect is why the single observability pane of glass is combined with policy as code in this point. Developers who are less infra-savvy seemed to prefer a UI-based single pane of glass to see what is happening, rather than looking under the hood in the Terraform codebase.
3. Integrations - The ability to integrate with other Terraform tooling out of the box, without having to write custom code, was also a massive draw for people who responded on the thread. Examples include Infracost for cloud cost estimates, Pluralith for automated infrastructure documentation and visualization, and driftctl to detect, track and alert on infrastructure drift.
4. Configurability of workflows - specifically around the merge-apply dilemma, as coined by Yi Lu in the same article mentioned above - “Deciding the order of the two activities is subtle. Neither way feels ideal at a glance:
- merge-and-apply: if we merge the code first to main branch. The new commit on main branch will drive deployment. Given the stateful nature, we should assume a good chance of failure in the apply action. Therefore, we will have to go through the iterative journey along the main branch, via numerous PRs and merges. The main branch in this case is treated like a chatty scratchpad rather than a seriously guarded golden copy.
- apply-and-merge: if we apply the commits from feature branch first, we can take the iterative journey along the feature branch associated with the pull request, which will keep the main branch cleaner. The apply-and-merge approach handles the risks introduced by the stateful nature in an elegant way. However, now awaiting us in the next step is the risk of merge conflict. If merge problems occur, we’d have to go over PR again with unwanted apply actions.”
Having the ability to decide which way to go seems super important. The divide was visible even in the Reddit thread we posted. One Reddit user said “I'm not a fan of allowing people to apply at the PR level. It's similar to deploying an app during a PR, instead of letting the pipeline do it's thing once you merge. I get that some applies don't go through all the way, but the instability in letting people apply willy nilly during the PR is not worth it.”
Another responded saying exactly the opposite! “Naa. You make a successful apply the condition of the the merge to master. Doesn't matter if you use GitHub actions, Atlantis, Azure DevOps, etc.. It ensures that your master branch never has broken code, And if I failed apply still changing infrastructure terraform drift is great for monitoring configurations drift of your master branch. THIS is the way.”
Both seemed SUPER opinionated, so customisation is the only way.
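As a rough sketch of what "configurable" could mean here, the function below picks the Terraform commands to run from the CI event and the chosen workflow. The event names, the `Workflow` enum and the `apply_requested` flag are all hypothetical and are not taken from any particular tool.

```python
from enum import Enum


class Workflow(Enum):
    MERGE_AND_APPLY = "merge-and-apply"
    APPLY_AND_MERGE = "apply-and-merge"


def commands_for(event, workflow, apply_requested=False):
    """Return the terraform subcommands a runner would execute for a CI event.

    event: "pull_request" or "push_to_main" (hypothetical event names).
    apply_requested: whether someone explicitly asked for an apply on the PR;
    it only matters for the apply-and-merge workflow.
    """
    if event == "pull_request":
        if workflow is Workflow.APPLY_AND_MERGE and apply_requested:
            # Apply from the feature branch; a successful apply gates the merge.
            return ["plan", "apply"]
        return ["plan"]
    if event == "push_to_main":
        if workflow is Workflow.MERGE_AND_APPLY:
            # The new commit on main drives the deployment.
            return ["apply"]
        return []  # already applied from the PR branch
    raise ValueError(f"unknown event: {event}")
```

For example, `commands_for("pull_request", Workflow.MERGE_AND_APPLY)` only plans, while the same event under `Workflow.APPLY_AND_MERGE` with `apply_requested=True` plans and applies before the merge. Keeping the choice in one place lets each team pick a side of the dilemma without rewriting the pipeline.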
5. Private module registry - Terraform modules help you abstract away a lot of low-level detail from your infrastructure. As mentioned in the HashiCorp docs, modules are the main way to package and reuse resource configurations with Terraform. They help create reusable infrastructure and save a lot of dev time. A registry is obviously not available natively in GitHub Actions, but most TACOS providers have a private module registry.
To summarise, the bottom line is that you can still use GitHub Actions, but not out of the box. As one Reddit user puts it so beautifully:
“These problems individually are simple to solve:
- State management is one of the easiest things about Terraform; it's JSON in a bucket.
- Running a Terraform plan on every PR is about ten lines of YAML in GHA.
- Versioned modules with consistent results are possible via purely git and tag references.
- Secrets can be handled by any data source that decrypts a vault secret.
- Any CI system I can think of displays the status of the last run - use a pipeline per root module to isolate them, and use dependency management to handle ordering.
- Include OPA checks in your pipelines, in addition to whatever test is reasonable.
However - in a team setting this lacks:
- Enterprise support (sometimes a hard requirement).
- Getting to a similar configuration as above very quickly and reliably without the expertise required to ideate or implement such a thing.
- Consistent state management across all cloud providers, with provisions for using workspaces.
- A predictable contract for usage that other systems can readily depend on.”
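The "JSON in a bucket" remark can be illustrated with a small sketch that reads a state document and checks whether a local copy has fallen behind the shared one. It assumes the commonly seen top-level state fields `serial`, `lineage` and `resources` (each resource carrying `mode`, `type` and `name`), and it ignores resources inside modules. The helper names are hypothetical, and in practice you would fetch the JSON from your bucket rather than build it inline.

```python
import json


def resource_addresses(state):
    """List simple resource addresses found in a parsed state document."""
    addresses = []
    for res in state.get("resources", []):
        # Data sources are conventionally addressed with a "data." prefix.
        prefix = "data." if res.get("mode") == "data" else ""
        addresses.append(f"{prefix}{res['type']}.{res['name']}")
    return addresses


def is_stale(local, remote):
    """Return True if the local state copy is older than the remote one."""
    if local["lineage"] != remote["lineage"]:
        raise ValueError("states have different lineage and are unrelated")
    # The serial is assumed to increase every time the state is written.
    return local["serial"] < remote["serial"]


if __name__ == "__main__":
    remote = json.loads("""
    {
      "version": 4,
      "serial": 7,
      "lineage": "example-lineage",
      "resources": [
        {"mode": "managed", "type": "aws_s3_bucket", "name": "logs", "instances": []}
      ]
    }
    """)
    local = dict(remote, serial=6)
    print(resource_addresses(remote))
    print(is_stale(local, remote))
```

Reading the file is indeed easy. The harder part in a team setting, as the quote goes on to say, is making sure every pipeline agrees on which copy is current.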
Plug - This article was written by the team at Digger, which is an open-source alternative to Terraform Cloud. It makes it easy to run terraform plan and apply in the CI/CD platform you already have, such as GitHub Actions. Go check it out!