CI/CD pipeline
GitHub Actions builds every change and, once it lands on `main`, deploys it to
the hosts automatically. The manual `just` workflow stays as the
break-glass path. The pipeline runs the same `nixos-rebuild`, only unattended.
The split is deliberate:
- CI proves every host builds, on a throwaway runner that touches no host.
- CD rolls merged changes out, joining the tailnet and running the exact
  same `nixos-rebuild switch` you would run by hand.
Flow
    PR / push ──▶ CI: build each host's system closure (build only, no host touched)
                    └─ required checks gate the merge
    merge to main ──▶ CD: join tailnet → nixos-rebuild switch on every host
CI: build check
.github/workflows/ci.yml, triggered on pull requests to main.
- One job per host builds `nixosConfigurations.<host>.config.system.build.toplevel`.
- It only builds: no activation, no tailnet, no secrets. A config that fails to evaluate or compile fails here, before it can reach a host.
- `yggdrasil` and `midgard` build on `ubuntu-latest`; `alfheim` builds on a native `ubuntu-24.04-arm` runner, so its `aarch64` closure is built natively instead of emulated.
- Encrypted secrets are not needed: sops files are ciphertext in the store and build fine (see Secrets).
Mark the three build … checks as required in branch protection so only
buildable configs can reach main.
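What each CI job does can be sketched in shell. This is a minimal sketch, not the contents of `ci.yml`: the helper names `toplevel_attr`, `runner_for`, and `build_host` are hypothetical, and it assumes a Nix installation with flakes enabled.

```shell
# Sketch of the per-host CI build step. All helper names are hypothetical.

# Flake attribute for a host's full system closure.
toplevel_attr() {
  printf '.#nixosConfigurations.%s.config.system.build.toplevel\n' "$1"
}

# Runner label per host, mirroring the split described above.
runner_for() {
  case "$1" in
    alfheim) echo "ubuntu-24.04-arm" ;;          # native aarch64 build
    yggdrasil|midgard) echo "ubuntu-latest" ;;
    *) echo "unknown host: $1" >&2; return 1 ;;
  esac
}

# Build only: nothing is activated, and --no-link leaves no ./result symlink.
build_host() {
  nix build --no-link "$(toplevel_attr "$1")"
}
```

Running `build_host alfheim` on the ARM runner would then build that host's closure and exit non-zero if evaluation or compilation fails, which is what fails the check.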
CD: deploy on merge
.github/workflows/deploy.yml, triggered on push to main (a merge).
Each host is handled by a job that:
- Joins the tailnet as an ephemeral node tagged `tag:ci` (via a Tailscale OAuth client) and waits until the target node is reachable before continuing.
- Loads the deploy key and runs, for that host (both `<host>` arguments are the node itself):

      nixos-rebuild switch --no-reexec --flake .#<host> \
        --build-host <host> \
        --target-host <host> \
        --sudo

Because --build-host and --target-host are both the node, each host builds
itself, exactly like a manual deploy. The runner only evaluates the flake and
orchestrates, so there is no cross-architecture build problem (alfheim
compiles its own aarch64 closure) and no binary cache to maintain. A
concurrency group serializes deploys so two merges never race.
All three hosts are switched on every merge; an unaffected host simply re-activates the same generation, which is a fast no-op.
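The wait-then-switch sequence of a deploy job can be sketched as follows. This is an illustration under assumptions, not the contents of `deploy.yml`: the function names `wait_until` and `deploy_host`, the retry budget (30 attempts, 2 seconds apart), the use of `tailscale ping` as the reachability probe, and the `DRY_RUN` switch are all hypothetical.

```shell
# Sketch of one deploy job. Function names and the retry budget are assumptions.

# Retry a probe command until it succeeds or the attempts run out.
# usage: wait_until <attempts> <delay-seconds> <command...>
wait_until() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Wait for the node on the tailnet, then switch it.
# Build host and target host are both the node, so it builds itself.
# Set DRY_RUN=1 to print the command instead of running it.
deploy_host() {
  host=$1
  wait_until 30 2 tailscale ping -c 1 "$host" || return 1
  ${DRY_RUN:+echo} nixos-rebuild switch --no-reexec --flake ".#$host" \
    --build-host "$host" --target-host "$host" --sudo
}
```

With `DRY_RUN=1`, `deploy_host alfheim` prints the `nixos-rebuild` invocation once the node answers, which makes it easy to check the arguments before letting it run for real.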
Prerequisites
| What | Where | Purpose |
|---|---|---|
| `DEPLOY_SSH_KEY` | GitHub Actions secret | Private key the runner uses to SSH in as `poby` |
| `TS_OAUTH_CLIENT_ID` / `TS_OAUTH_SECRET` | GitHub Actions secrets | Tailscale OAuth client that mints ephemeral `tag:ci` nodes |
| Deploy public key | `modules/users.nix` → `poby` authorized keys | Authorizes the runner on every host |
| `tag:ci` + ACL | Tailscale admin | Declares the tag and grants `tag:ci` access to the hosts on port 22 |
The deploy key is just another entry in poby's authorizedKeys, and poby
already has passwordless sudo (security.sudo.wheelNeedsPassword = false), so
no extra host-side setup is required.
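On the runner side, making `DEPLOY_SSH_KEY` usable might look like the sketch below. It is an assumption about how the key could be loaded, not the actual workflow step: the helper name `install_deploy_key` and the key path are hypothetical. `NIX_SSHOPTS` is the environment variable through which `nixos-rebuild` accepts extra SSH options.

```shell
# Hypothetical helper: write the private key from the environment to a file
# that only the owner can read, then point nixos-rebuild's SSH calls at it.
# usage: install_deploy_key <path>
install_deploy_key() {
  key_file=$1
  rm -f "$key_file"
  # The subshell keeps the restrictive umask from leaking into the caller.
  ( umask 077 && printf '%s\n' "$DEPLOY_SSH_KEY" > "$key_file" ) || return 1
  NIX_SSHOPTS="-i $key_file -o IdentitiesOnly=yes"
  export NIX_SSHOPTS
}
```

Writing the file under `umask 077` matters because `ssh` refuses private keys that are readable by other users.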
Bootstrap order
The runner can only log in once the hosts already trust the deploy key. The
first rollout of that key is manual: `just switch` each host,
then let CD take over. After that, merges deploy on their own.
Day to day
- Open a PR. CI builds all three hosts.
- Merge once the checks are green. CD switches every host within a couple of minutes.
- Follow a run from the repository's Actions tab, or:

      gh run watch <run-id> --exit-status

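To avoid looking up the run id by hand, the GitHub CLI can fetch the most recent deploy run first. A minimal sketch, assuming the workflow file is the `deploy.yml` named above and that `gh` is authenticated for the repository; the function names are hypothetical.

```shell
# Hypothetical helpers around the GitHub CLI.

# Print the id of the most recent deploy run on main.
latest_deploy_run() {
  gh run list --workflow deploy.yml --branch main --limit 1 \
    --json databaseId --jq '.[0].databaseId'
}

# Follow that run; --exit-status makes gh exit non-zero if the run fails.
watch_latest_deploy() {
  gh run watch "$(latest_deploy_run)" --exit-status
}
```

Calling `watch_latest_deploy` right after a merge then blocks until the deploy finishes and reports failure through its exit code.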
Caveats
- No automatic rollback. A `switch` triggered by a merge has no magic rollback, and green CI proves a config builds, not that it runs. Keep an eye on the monitoring stack after a deploy, and roll back by hand if a service misbehaves.
- Break-glass stays manual. If the pipeline is stuck or a host needs an urgent fix, deploy directly with `just switch` and roll back with `sudo nixos-rebuild switch --rollback`.
- Keep changes flowing through PRs. A direct push to `main` skips CI; enforce the build checks with branch protection so `main` stays deployable.
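A quick post-deploy check followed by a manual rollback could be sketched as below. This is a stand-in for illustration, not a replacement for the monitoring stack: the function names are hypothetical, and it assumes you can SSH to each host as `poby` and that a failed systemd unit is a useful first signal.

```shell
# Hypothetical helpers for checking hosts after a deploy.

# List failed systemd units on a host; empty output means none failed.
failed_units() {
  ssh "poby@$1" systemctl --failed --no-legend --plain
}

# Roll a host back to its previous generation by hand.
rollback_host() {
  ssh -t "poby@$1" sudo nixos-rebuild switch --rollback
}

# Report each host as ok, unreachable, or having failed units.
post_deploy_check() {
  for host in "$@"; do
    if ! units=$(failed_units "$host"); then
      echo "$host: unreachable"
      continue
    fi
    if [ -n "$units" ]; then
      echo "$host: failed units"
      echo "$units"
    else
      echo "$host: ok"
    fi
  done
}
```

For example, `post_deploy_check yggdrasil midgard alfheim` prints one status line per host, and `rollback_host alfheim` reverts a single host if its report looks wrong.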