GE CI/CD Pipeline: Enterprise-Grade Autonomous Quality System
This document describes the CI/CD pipeline that enforces code quality, security, and compliance across all Growing Europe repositories. It serves as a reference for internal teams, external stakeholders (GitLab, Anthropic), investors, and auditors.
Pipeline Overview
- 3-tier execution model: FAST (every push, ~2 min, 10 jobs), STANDARD (MR + main, ~5 min, 25 jobs), FULL (nightly + manual, all 31 jobs including mutation and E2E).
- Nightly FULL run at 02:00 CEST: all 31 jobs, including mutation testing and the E2E suite.
- Custom pre-built runner image (`ge-ci-runner:latest`): all tools pre-installed, zero `pip install` overhead per job.
- Self-hosted GitLab on k3s with 3 k8s-executor runners (`concurrent=10`).
- Real container builds via kaniko: images pushed to GitLab Container Registry, signed with cosign, SBOM attached via Syft (CycloneDX).
- Real ArgoCD deployment: API-triggered sync, no manual kubectl.
- All tools open source. All self-hosted. Zero data leaves EU infrastructure.
See Tiered Pipeline for a full breakdown of which jobs run in which tier.
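The tier rules above can be pictured as a function of GitLab's predefined CI variables. The sketch below is only an illustration. It assumes that the `CI_PIPELINE_SOURCE` values `schedule` (nightly) and `web` (the "Run pipeline" button) mean a FULL run. The real pipeline presumably expresses this logic with `rules:` clauses in `.gitlab-ci.yml`. `select_tier` and `tier_from_env` are hypothetical names.

```python
import os
from enum import Enum
from typing import Optional


class Tier(Enum):
    FAST = "FAST"
    STANDARD = "STANDARD"
    FULL = "FULL"


def select_tier(pipeline_source: str, branch: Optional[str], default_branch: str = "main") -> Tier:
    """Pick a tier from GitLab CI context (hypothetical mapping)."""
    # Nightly schedules and manual "Run pipeline" triggers run every job.
    if pipeline_source in {"schedule", "web"}:
        return Tier.FULL
    # Merge request pipelines run the STANDARD set.
    if pipeline_source == "merge_request_event":
        return Tier.STANDARD
    # Pushes to the default branch (main) also run the STANDARD set.
    if pipeline_source == "push" and branch == default_branch:
        return Tier.STANDARD
    # Any other push, e.g. to a feature branch, runs the FAST subset.
    return Tier.FAST


def tier_from_env() -> Tier:
    """Read GitLab's predefined variables; CI_COMMIT_BRANCH is unset in MR pipelines."""
    return select_tier(
        os.environ.get("CI_PIPELINE_SOURCE", "push"),
        os.environ.get("CI_COMMIT_BRANCH"),
        os.environ.get("CI_DEFAULT_BRANCH", "main"),
    )
```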
Stage-by-Stage Detail
The table below covers all 31 jobs across all tiers. The Tier column indicates when each job runs: F = FAST (every push), S = STANDARD (MR + main), FULL = nightly + manual trigger only. FULL/GATE (deploy:production) additionally requires manual approval.
| # | Stage | Tier | Time | What It Does | Tools | Compliance |
|---|---|---|---|---|---|---|
| 1 | lint:python | F | 9 s | Zero-tolerance Python linting with enterprise ruff.toml | ruff | ISO 27001 A.8.28 |
| 2 | lint:secrets | F | 14 s | Secret detection with 445-finding baseline. Blocks NEW secrets only | gitleaks | ISO 27001 A.8.28 |
| 3 | build:backend | F | 9 s | Verifies 4 core Python modules import correctly | Python ast | ISO 27001 A.8.25 |
| 4 | test:unit | F | 11 s | 553 unit tests (217 audit + consolidation tests) | pytest | ISO 27001 A.8.25 |
| 5 | tdd:oracle-check | F | 8 s | Verifies TDD tests do not import implementation (oracle independence) | grep + ast | Anti-LLM Stage 2 |
| 6 | security:bandit | F | 10 s | Python SAST with CRITICAL/HIGH severity gating | Bandit | ISO 27001 A.8.28 |
| 7 | security:semgrep | F | 9 s | TypeScript SAST + 8 custom LLM anti-pattern rules | Semgrep | ISO 27001 A.8.28 |
| 8 | security:dependency-scan | F | 11 s | Dependency vulnerability audit | safety | ISO 27001 A.8.30 |
| 9 | test:integration | F | 10 s | Real PostgreSQL 15 + Redis 7 service containers | pytest + asyncpg | ISO 27001 A.8.25 |
| 10 | types:python | S | 12 s | Strict type checking across all Python modules | pyright --strict | ISO 27001 A.8.28 |
| 11 | lint:deadcode | S | 9 s | Dead code detection (knip for TS, vulture for Python) | knip, vulture | ISO 27001 A.8.28 |
| 12 | iac:checkov | S | 15 s | IaC security scan across k8s manifests and Dockerfiles | Checkov | ISO 27001 A.8.9 |
| 13 | iac:kubesec | S | 10 s | Kubernetes manifest security scoring | Kubesec | ISO 27001 A.8.9 |
| 14 | test:e2e | S | ~60 s | Playwright E2E suite (4 parallel workers in CI) | Playwright | Anti-LLM Stage 9 |
| 15 | test:adversarial | S | 10 s | AST forbidden-call scan + 100 random input fuzz testing | Python ast + random | Anti-LLM Stage 8 |
| 16 | test:reconciliation | S | 10 s | TDD suite vs post-implementation comparison | Custom | Anti-LLM Stage 6 |
| 17 | test:contract | S | 8 s | API contract verification | OpenAPI | Anti-LLM Stage 9 |
| 18 | build:image | S | ~90 s | Real container build via kaniko, pushed to GitLab Container Registry | kaniko | ISO 27001 A.8.25 |
| 19 | sign:image | S | 8 s | cosign signing with SLSA Level 3 attestation | cosign | ISO 27001 A.8.30 |
| 20 | sbom:generate | S | 12 s | CycloneDX SBOM generation, attached to image | Syft | ISO 27001 A.8.30 |
| 21 | deploy:staging | S | ~30 s | ArgoCD API-triggered sync to staging namespace | ArgoCD | CC8.1 |
| 22 | verify:health | S | 15 s | HTTP health checks, SSL verification, error rate < 0.1% | curl, custom | ISO 27001 A.8.25 |
| 23 | merge:gate | S | ~20 s | Reads JUnit/SARIF artifacts, computes release readiness score | Custom + JUnit | CC8.1 |
| 24 | test:mutation | FULL | ~5 min | Stryker incremental mutation (changed files only) for TS; mutmut for Python | Stryker, mutmut | Anti-LLM Stage 4 |
| 25 | mutation:typescript | FULL | ~4 min | Full Stryker run across all TypeScript (nightly only) | Stryker | Anti-LLM Stage 4 |
| 26 | test:property:python | FULL | ~2 min | Hypothesis property-based tests (max_examples=1000) | Hypothesis | Anti-LLM Stage 8 |
| 27 | test:property:typescript | FULL | ~2 min | fast-check property-based tests (numRuns=1000) | fast-check | Anti-LLM Stage 8 |
| 28 | dast:zap | FULL | ~5 min | OWASP ZAP baseline scan against staging | ZAP | ISO 27001 A.8.29 |
| 29 | dast:nuclei | FULL | ~3 min | Nuclei template scan against staging | Nuclei | ISO 27001 A.8.29 |
| 30 | verify:ssot | FULL | 15 s | OpenAPI spec drift, file allocation law, config hardcode scan | Custom scripts | Anti-LLM Stage 9 |
| 31 | deploy:production | FULL/GATE | n/a | Manual approval required (human-in-the-loop) | GitLab | EU AI Act |
FAST tier wall time: ~2 minutes. STANDARD tier: ~5 minutes. FULL tier (nightly): all 31 jobs.
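The merge:gate row reads JUnit/SARIF artifacts and computes a release readiness score, but this page does not document the scoring formula. The following minimal sketch covers only the JUnit half. It uses a hypothetical rule: the score is the lowest pass rate across all reports, so one failing suite cannot be averaged away, and by default the gate requires a perfect score. `junit_pass_rate`, `readiness_score`, and `merge_gate` are illustrative names.

```python
import xml.etree.ElementTree as ET


def junit_pass_rate(xml_text: str) -> float:
    """Passed / executed over all <testcase> elements; skipped cases are excluded."""
    root = ET.fromstring(xml_text)
    executed = failed = 0
    # iter() searches the whole tree, so <testsuites> and bare <testsuite> roots both work.
    for case in root.iter("testcase"):
        if case.find("skipped") is not None:
            continue
        executed += 1
        if case.find("failure") is not None or case.find("error") is not None:
            failed += 1
    # A report in which nothing ran verifies nothing, so it scores 0.
    return 0.0 if executed == 0 else (executed - failed) / executed


def readiness_score(reports: list[str]) -> float:
    """Hypothetical score: the worst pass rate across all JUnit reports (0 if none)."""
    return min((junit_pass_rate(r) for r in reports), default=0.0)


def merge_gate(reports: list[str], threshold: float = 1.0) -> bool:
    """Allow the merge only if the readiness score reaches the threshold."""
    return readiness_score(reports) >= threshold
```

Treating "no artifacts" as a score of 0 is a deliberate choice. A gate that passes when the test jobs produced no output would hide broken upstream jobs.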
What Makes This Pipeline Unique
1. Oracle Independence (Stage 5)
Standard CI tooling does not verify that tests avoid importing implementation code. This check targets a primary LLM failure mode: tests that pass because they validate the AI's own logic rather than the specification.
When an LLM writes both the code and the tests, it can produce tests that simply mirror the implementation. Oracle independence breaks this loop by ensuring TDD tests reference only the specification interface, never the internal implementation.
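Below is a minimal sketch of the oracle-independence check using Python's `ast` module. It assumes implementation code lives in a hypothetical top-level package `ge_engine` and that test files follow the `test_*.py` naming convention. The real stage also uses grep, and its exact rules are not shown here.

```python
import ast
from pathlib import Path


def forbidden_imports(test_source: str, implementation_packages: set[str]) -> list[str]:
    """Return the implementation modules a test file imports (should be empty)."""
    hits: list[str] = []
    for node in ast.walk(ast.parse(test_source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module is not None and node.level == 0:
            names = [node.module]
        else:
            continue  # relative imports and non-import nodes are ignored
        for name in names:
            # Compare the top-level package: "ge_engine.core" -> "ge_engine".
            if name.split(".")[0] in implementation_packages:
                hits.append(name)
    return hits


def check_tree(test_dir: Path, implementation_packages: set[str]) -> dict[str, list[str]]:
    """Scan every test_*.py under test_dir and report violations per file."""
    violations: dict[str, list[str]] = {}
    for path in sorted(test_dir.rglob("test_*.py")):
        hits = forbidden_imports(path.read_text(encoding="utf-8"), implementation_packages)
        if hits:
            violations[str(path)] = hits
    return violations
```

The sketch skips relative imports such as `from . import helpers`, which usually point at test helpers. A stricter variant would resolve them against the file's location before deciding.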
2. Mutation Testing as a Gate (Stages 24 and 25)
Mutation testing uses Stryker for TypeScript (Stryker Mutator originates from Info Support, Veenendaal, Netherlands) and mutmut for Python. The 80% kill threshold ensures tests actually catch real code changes, not just execute lines for coverage metrics.
Code coverage alone is insufficient. A test suite can achieve 100% line coverage while asserting nothing meaningful. Mutation testing injects deliberate faults (mutants) into the codebase and verifies the test suite detects them. An 80% kill rate means at least 4 out of every 5 injected bugs are caught.
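To make the kill-rate idea concrete, here is a toy mutation tester. It swaps one comparison operator at a time and counts how many mutants a test function catches. It illustrates the technique and is not how Stryker or mutmut are implemented: both use far richer mutation operators and run the project's real test suite. It relies on `exec`. That is acceptable in a test harness, but it is exactly the kind of call the forbidden-call scan rejects in application code.

```python
import ast
import copy
from typing import Callable, Iterator

# Each comparison operator is replaced by a "near miss" to create one mutant.
SWAPS = {
    ast.Lt: ast.LtE, ast.LtE: ast.Lt,
    ast.Gt: ast.GtE, ast.GtE: ast.Gt,
    ast.Eq: ast.NotEq, ast.NotEq: ast.Eq,
}


def mutants(source: str) -> Iterator[str]:
    """Yield one mutated copy of `source` per swappable comparison operator."""
    tree = ast.parse(source)
    # Record (node index in walk order, operator index) for every swappable operator.
    targets = [
        (i, j)
        for i, node in enumerate(ast.walk(tree))
        if isinstance(node, ast.Compare)
        for j, op in enumerate(node.ops)
        if type(op) in SWAPS
    ]
    for i, j in targets:
        mutated = copy.deepcopy(tree)
        # Walk order is deterministic, so index i finds the same node in the copy.
        node = list(ast.walk(mutated))[i]
        node.ops[j] = SWAPS[type(node.ops[j])]()
        yield ast.unparse(mutated)


def mutation_score(source: str, test: Callable[[dict], None]) -> float:
    """Fraction of mutants killed, i.e. mutants for which `test` raises."""
    killed = total = 0
    for mutant_src in mutants(source):
        total += 1
        namespace: dict = {}
        exec(compile(mutant_src, "<mutant>", "exec"), namespace)
        try:
            test(namespace)
        except Exception:
            killed += 1  # the test noticed the injected fault
    return killed / total if total else 1.0


SOURCE = "def is_adult(age):\n    return age >= 18\n"


def weak_test(ns: dict) -> None:
    assert ns["is_adult"](30)


def strong_test(ns: dict) -> None:
    assert ns["is_adult"](18)
    assert not ns["is_adult"](17)
```

The weak test only checks an age far from the boundary, so the `>=` to `>` mutant survives (score 0.0). The strong test probes the boundary and kills it (score 1.0). With an 80% threshold, the gate would reject the weak suite even though both tests give the function full line coverage.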
3. LLM Anti-Pattern Rules (Stage 7)
Eight custom Semgrep rules target patterns that LLMs generate with high frequency:
- `eval()` usage
- Hardcoded secrets in source
- SQL injection via f-strings
- Bare `except` clauses (swallowing errors)
- `XADD` without `MAXLEN` (Redis stream memory leak)
- Hardcoded Redis port `6379` (GE uses `6381`)
- Unvalidated external input passed to shell commands
- Overly broad file permissions
These rules encode hard-won operational learnings from running a 54-agent system in production.
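The shipped rules are Semgrep rules. As a rough, hypothetical analogue, three of them can be approximated with Python's `ast` module. This sketch assumes redis-py style `xadd(name, fields, id, maxlen, ...)` calls, so `maxlen` may be passed as a keyword or as the fourth positional argument. It only inspects Python, whereas the Semgrep stage also covers TypeScript.

```python
import ast


def find_antipatterns(source: str) -> list[tuple[int, str]]:
    """Return (line, message) for bare excepts, unbounded XADD calls and port 6379."""
    findings: list[tuple[int, str]] = []
    for node in ast.walk(ast.parse(source)):
        # `except:` with no type catches everything, including KeyboardInterrupt.
        if isinstance(node, ast.ExceptHandler) and node.type is None:
            findings.append((node.lineno, "bare except"))
        # redis-py xadd(name, fields, id, maxlen, ...): no maxlen means an unbounded stream.
        elif (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "xadd"
            and len(node.args) < 4
            and not any(kw.arg == "maxlen" for kw in node.keywords)
        ):
            findings.append((node.lineno, "xadd without maxlen"))
        # The Redis default port, as an int or inside a URL string; GE uses 6381.
        elif isinstance(node, ast.Constant) and (
            (type(node.value) is int and node.value == 6379)
            or (isinstance(node.value, str) and ":6379" in node.value)
        ):
            findings.append((node.lineno, "hardcoded Redis port 6379"))
    return findings
```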
4. Adversarial Fuzz Testing (Stage 15)
One hundred random inputs are fed to the condition evaluator to verify no unexpected exceptions propagate. This is combined with an AST scan for forbidden function calls (exec, eval, compile, __import__). The fuzz testing catches edge cases that unit tests — especially LLM-generated ones — systematically miss.
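This sketch covers both halves of the stage: an AST scan for the forbidden builtins named above, and a seeded fuzz loop. The loop feeds 100 random values to an evaluator and records any exception other than an expected rejection. Everything about the evaluator is a hypothetical stand-in: the `evaluate_condition` function, its `{"field", "op", "value"}` condition format, and the choice of `ValueError` as the "expected" rejection. This page does not show GE's actual condition evaluator.

```python
import ast
import random
from typing import Callable

FORBIDDEN = {"exec", "eval", "compile", "__import__"}


def forbidden_calls(source: str) -> list[str]:
    """Names of forbidden builtins called directly anywhere in `source`."""
    return [
        node.func.id
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id in FORBIDDEN
    ]


# Hypothetical evaluator context and operators, standing in for GE's real evaluator.
CONTEXT = {"age": 42}
OPS = {">": lambda a, b: a > b, "==": lambda a, b: a == b}


def evaluate_condition(condition: object) -> bool:
    """Evaluate a condition dict; reject anything malformed with ValueError."""
    if not isinstance(condition, dict):
        raise ValueError("condition must be a dict")
    field, op_name, value = condition.get("field"), condition.get("op"), condition.get("value")
    if not isinstance(field, str) or field not in CONTEXT:
        raise ValueError("unknown field")
    if not isinstance(op_name, str) or op_name not in OPS:
        raise ValueError("unknown operator")
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise ValueError("value must be a number")
    return bool(OPS[op_name](CONTEXT[field], value))


def random_input(rng: random.Random) -> object:
    """One random value: int, float, odd string, None, or a half-valid condition dict."""
    kind = rng.randrange(5)
    if kind == 0:
        return rng.randint(-10**9, 10**9)
    if kind == 1:
        return rng.uniform(-1e9, 1e9)
    if kind == 2:
        return "".join(chr(rng.randrange(32, 0x2FF)) for _ in range(rng.randrange(20)))
    if kind == 3:
        return None
    return {
        "field": rng.choice(["age", "", "x" * 50]),
        "op": rng.choice([">", "==", "??"]),
        "value": rng.choice([rng.random(), "7", None]),
    }


def fuzz(evaluate: Callable[[object], bool], runs: int = 100, seed: int = 0) -> list:
    """Return every input that made `evaluate` raise something other than ValueError."""
    rng = random.Random(seed)  # seeded so a failing run can be reproduced
    crashes = []
    for _ in range(runs):
        value = random_input(rng)
        try:
            evaluate(value)
        except ValueError:
            pass  # a deliberate, documented rejection of bad input
        except Exception:
            crashes.append(value)
    return crashes
```

A naive evaluator such as `lambda c: c["op"]` crashes with `TypeError` on the first non-dict input. A fuzz loop exposes that kind of missing validation, while hand-written happy-path tests rarely do.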
5. Policy-as-Code (OPA/Rego)
Three policy files enforce compliance as executable code:
- `security.rego`: Maps to ISO 27001 Annex A security controls
- `compliance.rego`: Enforces data residency, audit trail, and logging requirements
- `deployment.rego`: Gates production deployment criteria
Auditors can read and verify these policy files directly. There is no ambiguity between documented policy and enforced policy — they are the same artifact.
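OPA exposes decisions over a REST Data API. A POST to `/v1/data/<package path>` with a body of `{"input": ...}` returns `{"result": ...}`, or `{}` when the rule is undefined. The sketch below shows how a pipeline job might query such a policy. It assumes an OPA server on its default port 8181 and a hypothetical `ge/deployment/allow` rule fed hypothetical input fields. The planned `verify:policy` job listed under Evolution Path may use conftest instead, so treat this as illustrative.

```python
import json
import urllib.request

OPA_URL = "http://localhost:8181"  # assumption: OPA's default listen address

# Hypothetical input document describing a release candidate.
EXAMPLE_INPUT = {"image_signed": True, "sbom_attached": True, "manual_approval": True}


def build_policy_query(package_path: str, input_doc: dict) -> urllib.request.Request:
    """POST {"input": ...} to OPA's Data API at /v1/data/<package path>."""
    url = f"{OPA_URL}/v1/data/{package_path.strip('/')}"
    body = json.dumps({"input": input_doc}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def decision_allows(response_body: bytes) -> bool:
    """Only an explicit `"result": true` allows; an undefined rule ({}) denies."""
    return json.loads(response_body).get("result") is True


def query_policy(package_path: str, input_doc: dict, timeout: float = 5.0) -> bool:
    """Send the query to a running OPA server and return the allow/deny decision."""
    with urllib.request.urlopen(build_policy_query(package_path, input_doc), timeout=timeout) as resp:
        return decision_allows(resp.read())
```

Failing closed on an undefined result matters here. A typo in the package path returns `{}` rather than an error, and treating that as "allowed" would silently disable the gate.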
6. Zero allow_failure on Enforcement Stages
The following stages are fully blocking with allow_failure: false — they were previously soft-failures and have been hardened as of 2026-04-02:
- `types:python`: pyright strict type checking
- `lint:deadcode`: dead code detection
- `iac:checkov`: IaC security scan
- `iac:kubesec`: Kubernetes manifest scoring
- `test:e2e`: Playwright end-to-end suite
- `mutation:typescript`: full Stryker mutation run
- `verify:health`: post-deploy health checks
Every enforcement stage either passes or blocks the pipeline. There are no "informational" stages that let problems through silently. This eliminates the common antipattern of accumulating ignored warnings until they become systemic.
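A hypothetical audit could keep the enforcement list honest. It takes the job map from `.gitlab-ci.yml`, already parsed into a dict (for example with PyYAML's `yaml.safe_load`), and reports any enforced job whose failure would not block the pipeline. It assumes GitLab's documented default: `allow_failure` is false unless the job uses top-level `when: manual`. It does not resolve `include:` or `extends:`, so it should run on the fully merged configuration.

```python
ENFORCED = {
    "types:python", "lint:deadcode", "iac:checkov", "iac:kubesec",
    "test:e2e", "mutation:typescript", "verify:health",
}


def soft_failing_jobs(ci_config: dict, enforced: set[str] = ENFORCED) -> list[str]:
    """Return enforced jobs whose failure would NOT block the pipeline."""
    soft = []
    for name in sorted(enforced):
        job = ci_config.get(name)
        if job is None:
            soft.append(name)  # a job that does not exist enforces nothing
            continue
        allow = job.get("allow_failure")
        if allow is None:
            # GitLab default: false, except for jobs with top-level `when: manual`.
            allow = job.get("when") == "manual"
        # Anything but a literal false (true, or an exit_codes map) lets failures through.
        if allow is not False:
            soft.append(name)
    return soft
```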
7. Custom Runner Image
All 15+ tools are pre-installed in ge-ci-runner:latest. This eliminates pip install overhead per job and keeps FAST tier runtime under 2 minutes. The image is rebuilt on a schedule and pinned to known-good tool versions.
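A smoke test like the one below could run as the last step of the image build to confirm the tools are actually present. The tool list is a hypothetical subset drawn from the stage table. The sketch checks only presence on `PATH`, not the pinned versions, because version flags differ between tools.

```python
import shutil
from typing import Sequence

# Hypothetical subset of tools the image should contain (names taken from the stage table).
EXPECTED_TOOLS = (
    "ruff", "gitleaks", "bandit", "semgrep", "safety", "pyright",
    "vulture", "checkov", "kubesec", "cosign", "syft",
)


def missing_tools(tools: Sequence[str]) -> list[str]:
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def check_image(tools: Sequence[str] = EXPECTED_TOOLS) -> int:
    """Print any missing tools and return a process exit code (0 = image OK)."""
    missing = missing_tools(tools)
    for tool in missing:
        print(f"missing from image: {tool}")
    return 1 if missing else 0
```

In a build step this would be invoked as `sys.exit(check_image())`, so a broken image fails before it is pushed.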
8. Gitleaks Baseline
The repository has 445 historical findings that have been reviewed and baselined. Only genuinely new secrets block the pipeline. This eliminates false-positive fatigue — the single largest reason teams disable secret scanning.
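Gitleaks supports baselines natively through its `--baseline-path` option, so the pipeline needs no custom code for this. Purely to illustrate the filtering idea, the sketch below keeps only findings whose `Fingerprint` (a field in gitleaks v8 JSON reports) is absent from the baseline report. Gitleaks' own matching logic may differ in detail. `new_findings` and `gate` are hypothetical names.

```python
import json


def new_findings(report_json: str, baseline_json: str) -> list[dict]:
    """Findings from the current report whose Fingerprint is not in the baseline."""
    baseline = {finding["Fingerprint"] for finding in json.loads(baseline_json)}
    return [f for f in json.loads(report_json) if f["Fingerprint"] not in baseline]


def gate(report_json: str, baseline_json: str) -> int:
    """Exit code for the job: 1 if any new secret appeared, otherwise 0."""
    fresh = new_findings(report_json, baseline_json)
    for finding in fresh:
        # Report location and rule only; never echo the Secret field into CI logs.
        print(f"NEW {finding.get('RuleID', '?')} in {finding.get('File', '?')}:{finding.get('StartLine', '?')}")
    return 1 if fresh else 0
```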
Infrastructure
| Component | Detail |
|---|---|
| Platform | Self-hosted GitLab CE 18.8.2 on k3s v1.34.3 |
| Runners | 3 k8s-executor pods (concurrent=10, pull_policy: if-not-present) |
| Registry | GitLab Container Registry (2 pods) — real image builds via kaniko |
| Deployment | ArgoCD (API-triggered sync), no manual kubectl in pipeline |
| Container signing | cosign + Syft CycloneDX SBOM attached to every production image |
| Hardware | AMD Ryzen 9 7940HS, 8 cores / 16 threads, 60 GB RAM |
| Storage | Local-path provisioner, 100 GB MinIO for artifacts |
| k8s security | securityContext applied to all 28 k8s containers |
| Network | All traffic on local k3s cluster; no external CI dependencies |
| Data residency | Netherlands; zero data leaves EU infrastructure |
Compliance Mapping
ISO 27001:2022
| Control | Title | How Enforced |
|---|---|---|
| A.8.25 | Secure development lifecycle | Tiered pipeline execution on every push (FAST), every merge request and main commit (STANDARD), and nightly (FULL). No code merges without a green pipeline. |
| A.8.28 | Secure coding | SAST via Bandit (Python) and Semgrep (TypeScript), ruff linting, mutation testing. |
| A.8.29 | Security testing in development and acceptance | Adversarial fuzz testing. DAST via OWASP ZAP and Nuclei (FULL tier, nightly). |
| A.8.30 | Outsourced development | Dependency vulnerability scanning via safety. SBOM generated per build via Syft (CycloneDX), attached to every container image. |
SOC 2 Type II
| Criteria | Title | How Enforced |
|---|---|---|
| CC8.1 | Change management | Pipeline results linked to every merge request. Full MR history. Manual production gate. |
EU AI Act
| Requirement | How Enforced |
|---|---|
| Transparency | Co-Authored-By headers on every AI-generated commit. Agent identity tracked in session records. |
| Human oversight | deploy:production stage requires manual trigger. No autonomous deployment to production. |
| Risk management | 13 automated quality gates prevent unreviewed code from reaching production. |
Anti-LLM Testing Stages
The pipeline includes five stages specifically designed to catch failure modes unique to LLM-generated code. These are labeled "Anti-LLM Stage" in the compliance column.
| Anti-LLM Stage | Pipeline Stage | Purpose |
|---|---|---|
| Stage 2 | tdd:oracle-check | Prevents tests from importing implementation (oracle problem) |
| Stage 4 | test:mutation | Ensures tests catch real defects, not just achieve coverage |
| Stage 6 | test:reconciliation | Compares TDD-phase tests against post-implementation behavior |
| Stage 8 | test:adversarial | Catches edge cases LLM-generated tests systematically miss |
| Stage 9 | test:contract | Verifies API contracts match specification, not implementation |
These stages exist because LLM-generated code exhibits different failure patterns than human-written code. Traditional CI pipelines were designed for human developers. This pipeline was designed for autonomous AI agents writing production software.
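This page does not describe the internals of the reconciliation stage (test:reconciliation). One plausible reading, sketched below under that assumption, works in two steps. First, record the TDD-phase test IDs before implementation. Then, after implementation, check that none of those tests were deleted and that all of them still pass. The outcome strings (`"passed"` and so on) and the `reconcile` and `reconciliation_ok` names are hypothetical.

```python
def reconcile(tdd_tests: set[str], post_results: dict[str, str]) -> dict[str, list[str]]:
    """Compare TDD-phase test IDs with post-implementation outcomes."""
    present = set(post_results)
    return {
        # TDD tests that disappeared: deleted or renamed after implementation.
        "missing": sorted(tdd_tests - present),
        # TDD tests that still exist but no longer pass (failed, skipped, ...).
        "not_passing": sorted(t for t in tdd_tests & present if post_results[t] != "passed"),
    }


def reconciliation_ok(tdd_tests: set[str], post_results: dict[str, str]) -> bool:
    """True only if every TDD-phase test survived and passes; new tests are allowed."""
    report = reconcile(tdd_tests, post_results)
    return not report["missing"] and not report["not_passing"]
```

The check targets a specific agent behavior. When an implementation cannot satisfy the specification, an agent may quietly delete or skip the inconvenient test instead of fixing the code.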
Learnings
Twelve infrastructure pitfalls encountered during pipeline construction are documented in CI/CD Infrastructure Pitfalls. Key topics include:
- GitLab Container Registry DNS resolution inside k3s
- YAML multiline string parsing in `.gitlab-ci.yml`
- PEP 668 (`externally-managed-environment`) breaking `pip install` in runners
- TLS certificate chain validation for self-hosted registries
- Shell executor vs k8s executor trade-offs
- Runner registration token rotation
- Service container networking in k8s-executor mode
Evolution Path
The following enhancements are planned or in progress, ordered by priority:
| Enhancement | Tool | Status | Purpose |
|---|---|---|---|
| Multi-project queue | GitLab CI | Planned | Priority scheduling across repositories |
| Stagehand agentic E2E | Stagehand (Browserbase) | Planned | Natural language browser automation for third-party integrations |
| Chaos testing | Litmus | Planned | Weekly chaos engineering against core services |
| Agent-CI bridge | ge_orchestrator/ci_bridge.py | Planned | Real-time agent results back to GitLab pipeline status |
| Policy verification | OPA/conftest | Planned | verify:policy job enforcing security.rego, compliance.rego |
| Kyverno admission policies | Kyverno | Planned | Require image signature + SBOM attestation for production deploy |
Summary
This pipeline enforces up to 31 automated quality gates before any code reaches production. The FAST tier runs in approximately 2 minutes on every push. The STANDARD tier (~5 minutes) runs on merge requests and main-branch commits. The FULL tier (all 31 jobs) runs nightly at 02:00 CEST and on manual trigger.
The pipeline uses only open-source tools, keeps all data within EU infrastructure, and maps directly to ISO 27001, SOC 2, and EU AI Act requirements. Container images are built by kaniko, signed by cosign, have CycloneDX SBOMs attached, and are deployed by ArgoCD — with no manual kubectl in the promotion path.
It was built specifically for a world where AI agents write production software. The anti-LLM testing stages address failure modes that do not exist in traditional human-only development workflows.