Digger's low-level CLI API

This was meant to be just another feature announcement post. Something along the lines of “you can now run Digger CLI on your laptop; here’s why this is cool”. But as I started writing it, I realised that why we built this feature is just as important as the feature itself, if not more. Digger was a CLI from day 1; however, it was always meant to run within a CI system such as GitHub Actions or GitLab Pipelines - so it didn’t have any commands at all!

But first, let’s get the feature announcement out of the way. Starting from v0.3.22, you can run digger plan and digger apply on your laptop! It does exactly what you’d think: runs Terraform plan (or apply). There’s a twist though: this is a low-level CLI API, meant mostly for scripting rather than regular usage in the terminal. The CLI commands expect a lengthy list of arguments like pr-number and environment variables to be set. It is basically a CI job, but running on your laptop. It still talks to the orchestrator backend, and can report output as pull request comments. Why did we build it this way? Below I outline some of the considerations.

Going multi-CI

Digger was started in early 2023 as a simple GitHub Action without any backend. Then we added an orchestrator backend to enable parallelism, so that each project could run independently in its own job. But now that we had a backend, only supporting GitHub Actions started to look like an artificial limitation. What difference does it make if it’s Actions or GitLab Pipelines or Bitbucket Pipelines or Jenkins or whatever other CI people might be using? The orchestrator backend triggers a bunch of jobs, each running the Digger CLI, which reads and writes files on a Linux VM.

To become CI-agnostic, we needed the CLI to stop assuming that it’s always running in GitHub Actions. We actually had top-level CI detection in the main entry point for a while (link to PR), but that still meant that the CLI was aware of the CI environment it was running in. That in turn meant that it wouldn’t work in any CI other than the ones explicitly handled in code; not ideal.

CI ≠ VCS

It is natural to assume that everyone who uses GitHub also uses Actions as their CI, and to make similar assumptions for GitLab and Bitbucket. It is also wrong. Who uses Jenkins or CircleCI then? (I wrote on the history of CI/CD tooling some time ago.)

We’ve been getting user requests to support GitLab and Bitbucket almost since inception, but to our surprise that did not always mean “GitLab Pipelines” or “Bitbucket Pipelines”. Oftentimes people would use GitLab or Bitbucket as their VCS, but something else as their CI - Jenkins, for example. There were even cases of GitHub + GitLab setups.

If someone wanted Digger to support GitLab the VCS, they’d expect to see Digger commenting in the MR thread - even if their CI is not GitLab Pipelines. But if it were about GitLab Pipelines, they’d expect to see jobs triggered in GitLab, even if they use something else as their git hosting.

So CI and VCS are separate concerns, at least from Digger’s perspective. To support another CI system, the Digger CLI needs to be CI-agnostic, and the orchestrator backend needs to be able to trigger jobs in that CI system. Whereas to support another VCS server, the Digger backend needs to handle webhooks coming from it, and the CLI needs to be able to report plan / apply status into its pull request thread. Which leads me to the next point:

Reporting securely: CI as compute backend

If a product consists of a CLI and an orchestrator service, which of them is the “frontend” and which is the “backend”? Seems like a stupid question; obviously, the orchestrator is the backend - it is a backend service after all - and the CLI is the frontend. So this was the assumption we went with, until we realised that in Digger, the relationship is reversed: the CI job is more like a “backend”, and the orchestrator is more like a “frontend”.

One of the main reasons we started Digger and designed it as a CI orchestrator was security. To use, say, Terraform Cloud, you’d need to grant a third-party server full access to your AWS account, and it would also have access to all of the plan outputs, which may contain sensitive data. Re-using existing CI infrastructure like Digger does means that the orchestrator does not need that level of trust - the sensitive data never leaves your CI; the orchestrator just starts the jobs.

The roles are reversed compared to a typical CLI + API setup: the Digger CLI that runs in your CI system has a higher level of access than the orchestrator. The orchestrator can be viewed as a “frontend” for the CI’s “compute backend”, and CI jobs are somewhat similar to lambda functions that have secrets passed as environment variables.

This realisation led to a non-obvious decision: keep the comment reporting logic in the CLI even after we introduced the orchestrator backend. The orchestrator seems like the natural place for it, but moving it there would also give the orchestrator access to plan outputs. We wanted to avoid that, so we came up with a somewhat tricky approach:

  1. Orchestrator creates a “placeholder” comment in the pull request
  2. Orchestrator starts N CI jobs and passes the comment ID to each job
  3. Each job updates the comment with its plan output

This way, the potentially sensitive plan output is never shared with the orchestrator, which is no longer a single point of failure from the security point of view.
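To make the handshake concrete, here is roughly what those two steps look like against GitHub’s REST API (pull request comments are created and edited via the Issues endpoints). This is only an illustrative sketch of the flow, not Digger’s actual code; the repo, PR number, token and PLAN_OUTPUT variable are all placeholders:

  # Orchestrator: create the "placeholder" comment on PR #42 and remember its ID
  # (my-org/my-repo, 42 and GITHUB_TOKEN are placeholder values)
  COMMENT_ID=$(curl -s -X POST \
    -H "Authorization: Bearer $GITHUB_TOKEN" \
    -H "Accept: application/vnd.github+json" \
    -d '{"body": "digger: plan output will appear here"}' \
    "https://api.github.com/repos/my-org/my-repo/issues/42/comments" | jq -r '.id')

  # CI job: overwrite the placeholder body with the plan output it produced
  curl -s -X PATCH \
    -H "Authorization: Bearer $GITHUB_TOKEN" \
    -H "Accept: application/vnd.github+json" \
    -d "$(jq -n --arg body "$PLAN_OUTPUT" '{body: $body}')" \
    "https://api.github.com/repos/my-org/my-repo/issues/comments/$COMMENT_ID"

The comment ID is the only thing the orchestrator needs to hand to the jobs; the plan output itself only ever travels between the CI job and the VCS.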

CLI API as a CI-agnostic baseline

Realising that the CLI is more of a backend than a frontend meant that it was no longer about typing commands by hand. The purpose of the CLI was to run in any CI system - GitHub Actions, GitLab Pipelines, even Jenkins - and report potentially sensitive output to a VCS. Instead of CI-aware and VCS-agnostic, it became CI-agnostic and VCS-aware.

That doesn’t mean that you can’t run it locally though. If the CLI is truly CI-agnostic, then it shouldn’t care where it runs as long as the right arguments and environment variables are supplied. This also results in better testability: we’d no longer need to start an actual CI job to test new functionality; the exact same behaviour can be tested locally.

We started with the following 3 commands:

  • digger plan <project_name>
  • digger apply <project_name>
  • digger destroy <project_name>

Each command has the following arguments (command line or env vars):

  • reporter - where to report progress: github, gitlab, bitbucket, or stdout (default)
  • pr-number - the PR number for reporting to the VCS
  • comment-id - the placeholder comment to report into (created by the orchestrator)
  • actor - the user ID / role that initiated the command (e.g. the author of a “digger plan” comment)

There are also a few reporter-specific arguments, such as github-token and repo-namespace for the github reporter.
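Put together, a scripted invocation looks roughly like the sketch below. The exact flag spellings are illustrative (they simply mirror the argument names above; check the docs for your Digger version), and every value is a placeholder. The first run reports to stdout, which is handy for local testing; the second mimics what a CI job would do with the github reporter:

  # Local run: report to stdout (the default), no VCS interaction needed
  digger plan my_project --reporter stdout

  # CI-style run: report the plan output into an existing placeholder PR comment
  # (flag names mirror the arguments listed above; exact spellings may differ)
  digger plan my_project \
    --reporter github \
    --github-token "$GITHUB_TOKEN" \
    --repo-namespace my-org/my-repo \
    --pr-number 42 \
    --comment-id 123456789 \
    --actor some-user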

So no CLI frontend?

In our “baseline” CLI, each of the commands expects a bunch of arguments / environment variables, which makes it less convenient to run by hand. This CLI is meant primarily for use within CI systems, reporting output to one of the major VCS providers. The runs are triggered either by PR events (open, push), or by comments appended to the pull request thread. In some sense the PR thread is quite similar to a CLI - you type commands and get output as a comment.

Running commands in the terminal can also be handy, but we believe it to be an entirely different use case. For example, testing changes before submitting a PR, without end users needing to have AWS access locally (issue). That, however, means that the actual job would run remotely in the CI, not on the user’s laptop. So we’d need the CLI to communicate with a backend that triggers the job, and the CLI would need to be authenticated and upload the local code somewhere for the job to pick it up.

The low-level Digger CLI API provides a common baseline for all execution environments, CI as well as local. It makes it relatively straightforward to integrate Digger with pretty much any CI provider, even exotic custom-built ones. As for user-facing CLI commands, they can be built on top by adding extra layers like auth.