- [Overview](#overview)
- [Buildkite agents](#buildkite-agents)
- [Build steps](#build-steps)
- [Phabricator integration](#phabricator-integration)
  - [Life of a pre-merge check](#life-of-a-pre-merge-check)
  - [Cluster parts](#cluster-parts)
    - [Ingress and public addresses](#ingress-and-public-addresses)
  - [Enabled projects and project detection](#enabled-projects-and-project-detection)
  - [Agent machines](#agent-machines)
  - [Compilation caching](#compilation-caching)
- [Buildkite monitoring](#buildkite-monitoring)

# Overview

- [Buildkite](https://buildkite.com/llvm-project) orchestrates each build.
- Multiple Linux and Windows agents are connected to Buildkite. The agents run on
  Google Cloud Platform.
- A [small proxy service](/phabricator-proxy) takes build requests from
  [reviews.llvm.org](http://reviews.llvm.org) and converts them into Buildkite
  build requests. The Buildkite job sends build results directly to Phabricator.
- Every review creates a new branch in a [fork of
  llvm-project](https://github.com/llvm-premerge-tests/llvm-project).

![deployment diagram](http://www.plantuml.com/plantuml/proxy?src=https://raw.githubusercontent.com/google/llvm-premerge-checks/main/docs/deployment.plantuml)

# Buildkite agents

Agents are deployed in two clusters, `llvm-premerge-checks` and `windows-cluster`.
The latter is for Windows machines (whether machines can run Windows containers
is controlled at the cluster level).

Container configurations are in `./containers` and deployment configurations are
in `./kubernetes`. The most important ones are:

- Windows agents: container `containers/buildkite-windows`, deployment
  `kubernetes/buildkite/windows.yaml`. TODO: at the moment the Docker image is
  created and uploaded from a Windows machine (e.g. win-dev). It would be great
  to set up a cloudbuild.
- Linux agents: run tests for the Linux x86 config, container
  `containers/buildkite-linux`, deployment `kubernetes/buildkite/linux.yaml`.
- Service agents: run low-CPU build steps (e.g. generating pipeline steps),
  container `containers/buildkite-linux`, deployment
  `kubernetes/buildkite/service.yaml`.

All deployments have a `..-test` copy that serves as a playground for checking
container and deployment changes before modifying the "prod" setup.

# Build steps

Buildkite allows you to [dynamically define pipelines as the output of a
command](https://buildkite.com/docs/pipelines/defining-steps#dynamic-pipelines),
and most of the pipelines use this pattern of running a script and using the
resulting YAML. For example, the script that runs pull-request checks is
llvm-project's [.ci/generate-buildkite-pipeline-premerge](https://github.com/llvm/llvm-project/blob/main/.ci/generate-buildkite-pipeline-premerge).
Thus any changes to the steps to run should go into that script.
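
The dynamic-pipeline pattern can be illustrated with a minimal sketch. This is not the actual premerge generator: the function name, step labels, commands, and the `queue` agent tag are all hypothetical. It relies only on the fact that Buildkite accepts a pipeline definition as JSON as well as YAML.

```python
import json
import sys


def generate_pipeline(projects):
    """Return a Buildkite pipeline definition with one command step per project.

    The labels, commands, and the "queue" agent tag are illustrative
    assumptions, not the real premerge configuration.
    """
    steps = []
    for project in projects:
        steps.append({
            "label": f"build {project}",
            "command": f"echo 'building {project}'",
            "agents": {"queue": "linux"},
        })
    return {"steps": steps}


if __name__ == "__main__":
    # Print the pipeline as JSON so that no YAML library is needed.
    projects = sys.argv[1:] or ["llvm"]
    json.dump(generate_pipeline(projects), sys.stdout, indent=2)
```

A build step would then pipe the output into the agent, for example `python3 generate.py llvm clang | buildkite-agent pipeline upload` (where `generate.py` is a hypothetical file name for the sketch above).
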

We have a legacy set of scripts in `/scripts` in this repo but discourage new
use and development of them. They are mostly kept so that the Phabricator
integration keeps functioning.

# Phabricator integration

Note: this section is about the Phabricator integration, which is now discouraged.
Some things might already be renamed or outright broken as we are moving to Pull Requests.

- [Harbormaster build plan](https://reviews.llvm.org/harbormaster/plan/5): where
  things are configured on the Phabricator side.
- Herald [rule for everyone](https://reviews.llvm.org/H576) and for [beta
  testers](https://reviews.llvm.org/H511). Note that right now there is no
  difference between beta and "normal" builds.
- The [merge_guards_bot user](https://reviews.llvm.org/p/merge_guards_bot/)
  account for writing comments.

## Life of a pre-merge check

When a new diff arrives for review, it triggers a Herald rule ("everyone" or "beta
testers").

That in turn sends an HTTP POST request to [**phab-proxy**](../phabricator-proxy),
which submits a new Buildkite job, **diff-checks**. All parameters from the
original request are put in the build's environment with a `ph_` prefix (to avoid
shadowing any Buildkite environment variable). The `ph_scripts_refspec` parameter
defines the refspec of llvm-premerge-checks to use ("main" by default).
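
The prefixing can be sketched as follows. This is an illustration under assumptions, not the proxy's actual code: the function name and the `diff_id` parameter are hypothetical, and where the "main" default is applied in the real service is not stated here.

```python
def to_build_env(params, prefix="ph_"):
    """Prefix every request parameter so it cannot shadow a Buildkite variable."""
    env = {f"{prefix}{name}": str(value) for name, value in params.items()}
    # Assumption: the default refspec is filled in at this point; the real
    # proxy may apply the "main" default elsewhere.
    env.setdefault(f"{prefix}scripts_refspec", "main")
    return env


# Hypothetical request parameters, for illustration only.
print(to_build_env({"diff_id": 288211}))
```
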

The **diff-checks** pipeline
([create_branch_pipeline.py](../scripts/create_branch_pipeline.py))
downloads a patch (or series of patches) and applies it to a fork of the
llvm-project repository. Then it pushes the new state as a new branch (e.g.
"phab-diff-288211") and triggers "premerge-checks" on it (all "ph_" env
variables are passed to it). This new branch can then be used to reproduce the
build or by other tooling. A periodic **cleanup-branches** pipeline deletes
branches older than 30 days.
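
The 30-day cleanup rule can be sketched like this. How the real **cleanup-branches** pipeline determines a branch's age is not described here, so using the last commit time is an assumption, and the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def stale_branches(branches, now, max_age=timedelta(days=30)):
    """Return the names of branches whose last commit is older than max_age.

    `branches` maps a branch name to the timezone-aware datetime of its last
    commit (an assumed notion of "age").
    """
    return sorted(
        name for name, last_commit in branches.items()
        if now - last_commit > max_age
    )


now = datetime(2020, 10, 1, tzinfo=timezone.utc)
example = {
    "phab-diff-1": now - timedelta(days=31),
    "phab-diff-2": now - timedelta(days=5),
}
print(stale_branches(example, now))
```
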

The **premerge-checks** pipeline
([build_branch_pipeline.py](../scripts/build_branch_pipeline.py))
builds and tests the changes on Linux and Windows agents. Then it uploads a
combined result to Phabricator.

## Cluster parts
### Ingress and public addresses

We use NGINX ingress for Kubernetes. Right now it is only used to provide basic
HTTP authentication and to forward all requests from the load balancer to the
[phabricator proxy](../phabricator-proxy) application.

Follow the up-to-date docs to install the [reverse
proxy](https://kubernetes.github.io/ingress-nginx/deploy/#gce-gke).

[cert-manager](https://cert-manager.io/docs/installation/helm/) is installed with helm:

    helm install \
      cert-manager jetstack/cert-manager \
      --namespace cert-manager \
      --create-namespace \
      --version v1.9.1 \
      --set installCRDs=true

We also have [certificate manager](https://cert-manager.io/docs/) and
[lets-encrypt configuration](../kubernetes/cert-issuer.yaml) in place, but they are
not used at the moment and should be removed if we decide to live with a static IP.

HTTP auth is configured with the k8s secret `http-auth` in the `buildkite` namespace
(see [how to update auth](playbooks.md#update-http-auth-credentials)).

## Enabled projects and project detection

To reduce build times and mask unrelated problems, we only build and
test the projects that were modified by a patch.
[choose_projects.py](../scripts/choose_projects.py) uses a manually maintained
[config file](../scripts/llvm-dependencies.yaml) to define inter-project
dependencies and exclude projects:

1. Get the prefix (e.g. "llvm", "clang") of all paths modified by a patch.
1. Add all dependent projects.
1. Add all projects that this extended list depends on, completing the
   dependency subtree.
1. Remove all disabled projects.
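
The four steps above can be sketched as a small function. This is a simplified illustration, not the code in `choose_projects.py`: the function name, the toy dependency map, the transitive handling of dependents, and ignoring paths outside known projects are all assumptions.

```python
def choose_projects(paths, dependencies, disabled):
    """Pick the projects to build for a patch that touches `paths`.

    `dependencies` maps each project to the projects it depends on.
    """
    # 1. The first path component names the modified project.
    #    Assumption: paths outside known projects are ignored.
    selected = {p.split("/", 1)[0] for p in paths} & set(dependencies)

    # 2. Add projects that (transitively) depend on the modified ones.
    changed = True
    while changed:
        changed = False
        for project, deps in dependencies.items():
            if project not in selected and selected & set(deps):
                selected.add(project)
                changed = True

    # 3. Add everything the extended list (transitively) depends on.
    stack = list(selected)
    while stack:
        for dep in dependencies.get(stack.pop(), []):
            if dep not in selected:
                selected.add(dep)
                stack.append(dep)

    # 4. Drop disabled projects.
    return sorted(selected - set(disabled))


# Toy dependency map, for illustration only.
deps = {
    "llvm": [],
    "clang": ["llvm"],
    "clang-tools-extra": ["clang", "llvm"],
    "lld": ["llvm"],
}
print(choose_projects(["clang/lib/Foo.cpp"], deps, disabled=[]))
```

In this toy example a change under `clang/` selects `clang`, its dependent `clang-tools-extra`, and their dependency `llvm`, while `lld` is left out because nothing selected depends on it.
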
## Agent machines

All build machines run from Docker containers so that they can be
debugged, updated, and scaled easily:

- [Linux](../containers/buildkite-premerge-debian/Dockerfile). We use a
  [Kubernetes deployment](../kubernetes/buildkite) to manage these agents.
- [Windows](../containers/agent-windows-buildkite/Dockerfile). At the moment they run as
  multiple individual VM instances.

See [playbooks](playbooks.md) for how to manage and set up machines.

## Compilation caching

Each build is performed on a clean copy of the git repository. To speed up the
builds, [ccache](https://ccache.dev/) is used on Linux and
[sccache](https://github.com/mozilla/sccache) on Windows.
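
One common way to route compilations through such a cache is CMake's compiler launcher variables. The sketch below shows that approach under assumptions: the function name is hypothetical, and the actual build scripts may wire the cache in differently.

```python
import platform


def compiler_cache_flags(system=None):
    """Return CMake flags that route compiler calls through a cache tool.

    Assumption: the cache is enabled via CMAKE_<LANG>_COMPILER_LAUNCHER;
    the real scripts may use another mechanism.
    """
    system = system or platform.system()
    launcher = "sccache" if system == "Windows" else "ccache"
    return [
        f"-DCMAKE_C_COMPILER_LAUNCHER={launcher}",
        f"-DCMAKE_CXX_COMPILER_LAUNCHER={launcher}",
    ]


print(" ".join(compiler_cache_flags("Linux")))
```
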
# Buildkite monitoring

FIXME: does not work as of 2023-09-11. Those metrics could allow
us to set up auto-scaling of machines to match the current demand.

VM instance `buildkite-monitoring` exposes Buildkite metrics to GCP.
To set up a new instance:

1. Create a small Linux VM with full access to the *Stackdriver Monitoring API*.
1. Follow the instructions to [install the monitoring
   agent](https://cloud.google.com/monitoring/agent/install-agent) and [enable the
   statsd plugin](https://cloud.google.com/monitoring/agent/plugins/statsd).
1. Download a recent release of
   [buildkite-agent-metrics](https://github.com/buildkite/buildkite-agent-metrics/releases).
1. Run in an SSH session:
```bash
chmod +x buildkite-agent-metrics-linux-amd64
nohup ./buildkite-agent-metrics-linux-amd64 -token XXXX -interval 30s -backend statsd &
```
Metrics are exported as "custom/statsd/gauge".
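
For reference, a statsd gauge is a plain-text datagram of the form `name:value|g`, usually sent over UDP. The sketch below formats and sends one, which may help when checking that the statsd plugin receives data. The metric name, the function names, and the host and port (8125 is the conventional statsd port) are assumptions, not values taken from this setup.

```python
import socket


def gauge_datagram(name, value):
    """Format a statsd gauge in the plain-text wire format `name:value|g`."""
    return f"{name}:{value}|g".encode("ascii")


def send_gauge(name, value, host="127.0.0.1", port=8125):
    """Send one gauge over UDP; host and port are assumed defaults."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(gauge_datagram(name, value), (host, port))


# Hypothetical metric name, for illustration only.
print(gauge_datagram("buildkite.example.scheduled_jobs", 3))
```
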

TODO: update the "Testing scripts locally" playbook on how to run a Linux build locally with Docker.

TODO: migrate `buildkite-monitoring` to a k8s deployment.