GE CI/CD Pipeline — Enterprise-Grade Autonomous Quality System

This document describes the CI/CD pipeline that enforces code quality, security, and compliance across all Growing Europe repositories. It serves as a reference for internal teams, external stakeholders (GitLab, Anthropic), investors, and auditors.


Pipeline Overview

  • 3-tier execution model — FAST (every push, ~2 min, 10 jobs), STANDARD (MR + main, ~5 min, 25 jobs), FULL (nightly + manual, all 31 jobs including mutation and E2E).
  • Nightly FULL run at 02:00 CEST — all 31 jobs including mutation testing and E2E suite.
  • Custom pre-built runner image (ge-ci-runner:latest) — all tools pre-installed, zero pip install overhead per job.
  • Self-hosted GitLab on k3s with 3 k8s-executor runners (concurrent=10).
  • Real container builds via kaniko — images pushed to GitLab Container Registry, signed with cosign, SBOM attached via Syft (CycloneDX).
  • Real ArgoCD deployment — API-triggered sync, no manual kubectl.
  • All tools open source. All self-hosted. Zero data leaves EU infrastructure.

See Tiered Pipeline for a full breakdown of which jobs run in which tier.


Stage-by-Stage Detail

The table below covers all 31 jobs across all tiers. The Tier column indicates when each job runs: F = FAST (every push), S = STANDARD (MR + main), FULL = nightly and manual trigger only.

# | Stage | Tier | Time | What It Does | Tools | Compliance
1 | lint:python | F | 9 s | Zero-tolerance Python linting with enterprise ruff.toml | ruff | ISO 27001 A.8.28
2 | lint:secrets | F | 14 s | Secret detection with 445-finding baseline. Blocks NEW secrets only | gitleaks | ISO 27001 A.8.28
3 | build:backend | F | 9 s | Verifies 4 core Python modules import correctly | Python ast | ISO 27001 A.8.25
4 | test:unit | F | 11 s | 553 unit tests (217 audit + consolidation tests) | pytest | ISO 27001 A.8.25
5 | tdd:oracle-check | F | 8 s | Verifies TDD tests do not import implementation (oracle independence) | grep + ast | Anti-LLM Stage 2
6 | security:bandit | F | 10 s | Python SAST with CRITICAL/HIGH severity gating | Bandit | ISO 27001 A.8.28
7 | security:semgrep | F | 9 s | TypeScript SAST + 8 custom LLM anti-pattern rules | Semgrep | ISO 27001 A.8.28
8 | security:dependency-scan | F | 11 s | Dependency vulnerability audit | safety | ISO 27001 A.8.30
9 | test:integration | F | 10 s | Real PostgreSQL 15 + Redis 7 service containers | pytest + asyncpg | ISO 27001 A.8.25
10 | types:python | S | 12 s | Strict type checking across all Python modules | pyright --strict | ISO 27001 A.8.28
11 | lint:deadcode | S | 9 s | Dead code detection (knip for TS, vulture for Python) | knip, vulture | ISO 27001 A.8.28
12 | iac:checkov | S | 15 s | IaC security scan across k8s manifests and Dockerfiles | Checkov | ISO 27001 A.8.9
13 | iac:kubesec | S | 10 s | Kubernetes manifest security scoring | Kubesec | ISO 27001 A.8.9
14 | test:e2e | S | ~60 s | Playwright E2E suite (4 parallel workers in CI) | Playwright | Anti-LLM Stage 9
15 | test:adversarial | S | 10 s | AST forbidden-call scan + 100 random input fuzz testing | Python ast + random | Anti-LLM Stage 8
16 | test:reconciliation | S | 10 s | TDD suite vs post-implementation comparison | Custom | Anti-LLM Stage 6
17 | test:contract | S | 8 s | API contract verification | OpenAPI | Anti-LLM Stage 9
18 | build:image | S | ~90 s | Real container build via kaniko, pushed to GitLab Container Registry | kaniko | ISO 27001 A.8.25
19 | sign:image | S | 8 s | cosign signing with SLSA Level 3 attestation | cosign | ISO 27001 A.8.30
20 | sbom:generate | S | 12 s | CycloneDX SBOM generation, attached to image | Syft | ISO 27001 A.8.30
21 | deploy:staging | S | ~30 s | ArgoCD API-triggered sync to staging namespace | ArgoCD | CC8.1
22 | verify:health | S | 15 s | HTTP health checks, SSL verification, error rate < 0.1% | curl, custom | ISO 27001 A.8.25
23 | merge:gate | S | ~20 s | Reads JUnit/SARIF artifacts, computes release readiness score | Custom + JUnit | CC8.1
24 | test:mutation | FULL | ~5 min | Stryker incremental mutation (changed files only) for TS; mutmut for Python | Stryker, mutmut | Anti-LLM Stage 4
25 | mutation:typescript | FULL | ~4 min | Full Stryker run across all TypeScript (nightly only) | Stryker | Anti-LLM Stage 4
26 | test:property:python | FULL | ~2 min | Hypothesis property-based tests (max_examples=1000) | Hypothesis | Anti-LLM Stage 8
27 | test:property:typescript | FULL | ~2 min | fast-check property-based tests (numRuns=1000) | fast-check | Anti-LLM Stage 8
28 | dast:zap | FULL | ~5 min | OWASP ZAP baseline scan against staging | ZAP | ISO 27001 A.8.29
29 | dast:nuclei | FULL | ~3 min | Nuclei template scan against staging | Nuclei | ISO 27001 A.8.29
30 | verify:ssot | FULL | 15 s | OpenAPI spec drift, file allocation law, config hardcode scan | Custom scripts | Anti-LLM Stage 9
31 | deploy:production | FULL/GATE | n/a | Manual approval required (human-in-the-loop) | GitLab | EU AI Act

FAST tier wall time: ~2 minutes. STANDARD tier: ~5 minutes. FULL tier (nightly): all 31 jobs.


What Makes This Pipeline Unique

1. Oracle Independence (Stage 5)

To our knowledge, no other CI system verifies that tests do not import implementation code. The check targets a leading LLM failure mode: tests that pass because they validate the AI's own logic rather than the specification.

When an LLM writes both the code and the tests, it can produce tests that simply mirror the implementation. Oracle independence breaks this loop by ensuring TDD tests reference only the specification interface, never the internal implementation.
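The check described above can be approximated with Python's ast module. The following is a minimal sketch, not the pipeline's actual tdd:oracle-check job; the package name myservice.impl and the function name implementation_imports are hypothetical and chosen only for illustration.

```python
import ast

# Hypothetical package prefixes treated as "implementation" in this sketch.
FORBIDDEN_PREFIXES = ("myservice.impl",)


def implementation_imports(test_source: str, forbidden=FORBIDDEN_PREFIXES) -> list[str]:
    """Return the forbidden modules imported by a test file's source text."""
    hits = []
    for node in ast.walk(ast.parse(test_source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # Relative imports without a module name yield an empty string.
            names = [node.module or ""]
        else:
            continue
        for name in names:
            if any(name == p or name.startswith(p + ".") for p in forbidden):
                hits.append(name)
    return hits
```

A CI job built on this idea would run the function over every TDD test file and fail if any file returns a non-empty list. Working on the parsed syntax tree rather than raw text avoids false matches inside comments and string literals.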

2. Mutation Testing as a Gate (Stages 24 and 25)

Mutation testing is provided by Stryker for TypeScript and mutmut for Python. Stryker originated at Info Support (Veenendaal, Netherlands). The 80% kill threshold ensures tests actually catch real code changes rather than merely executing lines for coverage metrics.

Code coverage alone is insufficient. A test suite can achieve 100% line coverage while asserting nothing meaningful. Mutation testing injects deliberate faults (mutants) into the codebase and verifies the test suite detects them. An 80% kill rate means at least 4 out of every 5 injected bugs are caught.
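To make the difference between coverage and mutation score concrete, here is a small hand-written sketch. The function is_adult, its mutant, and the two tests are invented for illustration; real tools such as mutmut and Stryker generate mutants automatically, and the gate helper only assumes the 80% threshold stated above.

```python
def is_adult(age: int) -> bool:
    return age >= 18


def is_adult_mutant(age: int) -> bool:
    # Mutant: ">=" replaced by ">", the kind of small change a mutation tool injects.
    return age > 18


def weak_test(fn) -> bool:
    fn(30)  # executes the line (full line coverage) but asserts nothing
    return True


def strong_test(fn) -> bool:
    # Checks the boundary, so the ">" mutant is detected.
    return fn(18) is True and fn(17) is False


def kill_rate(killed: int, total: int) -> float:
    """Fraction of injected mutants that the test suite detected."""
    if total <= 0:
        raise ValueError("total mutants must be positive")
    return killed / total


def passes_gate(killed: int, total: int, threshold: float = 0.80) -> bool:
    return kill_rate(killed, total) >= threshold
```

The weak test passes against both the original and the mutant, so the mutant survives despite full line coverage. The strong test passes on the original and fails on the mutant, which counts as a kill.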

3. LLM Anti-Pattern Rules (Stage 7)

Eight custom Semgrep rules target patterns that LLMs generate with high frequency:

  • eval() usage
  • Hardcoded secrets in source
  • SQL injection via f-strings
  • Bare except clauses (swallowing errors)
  • XADD without MAXLEN (Redis stream memory leak)
  • Hardcoded Redis port 6379 (GE uses 6381)
  • Unvalidated external input passed to shell commands
  • Overly broad file permissions

These rules encode hard-won operational learnings from running a 54-agent system in production.
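As an illustration of one pattern from the list above, the sketch below contrasts SQL built with an f-string against a parameterized query. It uses the standard-library sqlite3 module purely so the example is self-contained; the table, the function names, and the payload are hypothetical and are not taken from the GE codebase or its Semgrep rules.

```python
import sqlite3


def find_user_unsafe(conn: sqlite3.Connection, name: str):
    # Anti-pattern: user input is interpolated into the SQL text via an f-string.
    return conn.execute(f"SELECT id FROM users WHERE name = '{name}'").fetchall()


def find_user_safe(conn: sqlite3.Connection, name: str):
    # Parameterized query: the value is bound by the driver, never parsed as SQL.
    return conn.execute("SELECT id FROM users WHERE name = ?", (name,)).fetchall()


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])
    payload = "nobody' OR '1'='1"
    return find_user_unsafe(conn, payload), find_user_safe(conn, payload)
```

With the injected payload, the unsafe variant matches every row because the OR clause becomes part of the query, while the parameterized variant treats the payload as a literal name and matches nothing.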

4. Adversarial Fuzz Testing (Stage 15)

One hundred random inputs are fed to the condition evaluator to verify no unexpected exceptions propagate. This is combined with an AST scan for forbidden function calls (exec, eval, compile, __import__). The fuzz testing catches edge cases that unit tests — especially LLM-generated ones — systematically miss.
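The two halves of this stage can be sketched in a few lines of Python. This is an illustrative approximation under stated assumptions: evaluate_condition is a hypothetical stand-in for the real condition evaluator, the input generator is invented, and the AST scan only catches direct calls by name (not, for example, builtins.eval).

```python
import ast
import random

FORBIDDEN_CALLS = {"exec", "eval", "compile", "__import__"}


def forbidden_calls(source: str) -> list[str]:
    """Return names of forbidden functions that are called directly in source."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in FORBIDDEN_CALLS:
                found.append(node.func.id)
    return found


def evaluate_condition(value) -> bool:
    """Hypothetical evaluator: true for positive numbers, false otherwise."""
    try:
        return float(value) > 0
    except (TypeError, ValueError):
        return False


def random_input(rng: random.Random):
    """Produce one random value of a randomly chosen type."""
    generators = [
        lambda: rng.randint(-10**9, 10**9),
        lambda: rng.uniform(-1e6, 1e6),
        lambda: "".join(chr(rng.randint(32, 126)) for _ in range(rng.randint(0, 20))),
        lambda: None,
        lambda: [rng.random() for _ in range(rng.randint(0, 3))],
    ]
    return rng.choice(generators)()


def fuzz(evaluator, runs: int = 100, seed: int = 0):
    """Feed random inputs to evaluator; collect any exception that escapes."""
    rng = random.Random(seed)
    failures = []
    for _ in range(runs):
        value = random_input(rng)
        try:
            evaluator(value)
        except Exception as exc:
            failures.append((value, exc))
    return failures
```

A job along these lines would fail when fuzz returns a non-empty list or when forbidden_calls finds a match in any scanned file. Seeding the generator keeps a failing run reproducible.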

5. Policy-as-Code (OPA/Rego)

Three policy files enforce compliance as executable code:

  • security.rego — Maps to ISO 27001 Annex A security controls
  • compliance.rego — Enforces data residency, audit trail, and logging requirements
  • deployment.rego — Gates production deployment criteria

Auditors can read and verify these policy files directly. There is no ambiguity between documented policy and enforced policy — they are the same artifact.

6. Zero allow_failure on Enforcement Stages

The following stages are fully blocking (allow_failure: false). They were previously soft failures and have been hardened as of 2026-04-02:

  • types:python — pyright strict type checking
  • lint:deadcode — dead code detection
  • iac:checkov — IaC security scan
  • iac:kubesec — Kubernetes manifest scoring
  • test:e2e — Playwright end-to-end suite
  • mutation:typescript — full Stryker mutation run
  • verify:health — post-deploy health checks

Every enforcement stage either passes or blocks the pipeline. There are no "informational" stages that let problems through silently. This eliminates the common antipattern of accumulating ignored warnings until they become systemic.

7. Custom Runner Image

All 15+ tools are pre-installed in ge-ci-runner:latest. This eliminates pip install overhead per job and keeps FAST tier runtime at about 2 minutes. The image is rebuilt on a schedule and pinned to known-good tool versions.

8. Gitleaks Baseline

The repository has 445 historical findings that have been reviewed and baselined. Only genuinely new secrets block the pipeline. This reduces false-positive fatigue, a common reason teams disable secret scanning.
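The baseline comparison amounts to a set difference over finding identifiers. The sketch below shows that logic on JSON reports. It is not how gitleaks implements its own baseline support, and it assumes each finding carries a "Fingerprint" field as in gitleaks JSON reports; the function name new_findings is hypothetical.

```python
import json


def new_findings(report_json: str, baseline_json: str) -> list[dict]:
    """Return findings from the report whose fingerprint is not in the baseline."""
    # Assumption: both inputs are JSON arrays of objects with a "Fingerprint" key.
    baseline = {finding["Fingerprint"] for finding in json.loads(baseline_json)}
    return [
        finding
        for finding in json.loads(report_json)
        if finding["Fingerprint"] not in baseline
    ]
```

A pipeline job would block only when this list is non-empty, so the reviewed historical findings stay visible in the baseline file without failing every run.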


Infrastructure

Component | Detail
Platform | Self-hosted GitLab CE 18.8.2 on k3s v1.34.3
Runners | 3 k8s-executor pods (concurrent=10, pull_policy: if-not-present)
Registry | GitLab Container Registry (2 pods); real image builds via kaniko
Deployment | ArgoCD (API-triggered sync), no manual kubectl in pipeline
Container signing | cosign + Syft CycloneDX SBOM attached to every production image
Hardware | AMD Ryzen 9 7940HS, 16 cores, 60 GB RAM
Storage | Local-path provisioner, 100 GB Minio for artifacts
k8s security | securityContext applied to all 28 k8s containers
Network | All traffic on local k3s cluster; no external CI dependencies
Data residency | Netherlands; zero data leaves EU infrastructure

Compliance Mapping

ISO 27001:2022

Control | Title | How Enforced
A.8.25 | Secure development lifecycle | Full pipeline execution on every commit. No code merges without green pipeline.
A.8.28 | Secure coding | SAST via Bandit (Python) and Semgrep (TypeScript), ruff linting, mutation testing.
A.8.29 | Security testing in development and acceptance | Adversarial fuzz testing. DAST via OWASP ZAP and Nuclei (FULL tier, nightly).
A.8.30 | Outsourced development | Dependency vulnerability scanning via safety. SBOM generated per build via Syft (CycloneDX), attached to every container image.

SOC 2 Type II

Criteria | Title | How Enforced
CC8.1 | Change management | Pipeline results linked to every merge request. Full PR history. Manual production gate.

EU AI Act

Requirement | How Enforced
Transparency | Co-Authored-By headers on every AI-generated commit. Agent identity tracked in session records.
Human oversight | deploy:production stage requires manual trigger. No autonomous deployment to production.
Risk management | 13 automated quality gates prevent unreviewed code from reaching production.

Anti-LLM Testing Stages

The pipeline includes five stages specifically designed to catch failure modes unique to LLM-generated code. These are labeled "Anti-LLM Stage" in the compliance column.

Anti-LLM Stage | Pipeline Stage | Purpose
Stage 2 | tdd:oracle-check | Prevents tests from importing implementation (oracle problem)
Stage 4 | test:mutation | Ensures tests catch real defects, not just achieve coverage
Stage 6 | test:reconciliation | Compares TDD-phase tests against post-implementation behavior
Stage 8 | test:adversarial | Catches edge cases LLM-generated tests systematically miss
Stage 9 | test:contract | Verifies API contracts match specification, not implementation

These stages exist because LLM-generated code exhibits different failure patterns than human-written code. Traditional CI pipelines were designed for human developers. This pipeline was designed for autonomous AI agents writing production software.


Learnings

Twelve infrastructure pitfalls encountered during pipeline construction are documented in CI/CD Infrastructure Pitfalls. Key topics include:

  • GitLab Container Registry DNS resolution inside k3s
  • YAML multiline string parsing in .gitlab-ci.yml
  • PEP 668 (externally-managed-environment) breaking pip install in runners
  • TLS certificate chain validation for self-hosted registries
  • Shell executor vs k8s executor trade-offs
  • Runner registration token rotation
  • Service container networking in k8s-executor mode

Evolution Path

The following enhancements are planned or in progress, ordered by priority:

Enhancement | Tool | Status | Purpose
Multi-project queue | GitLab CI | Planned | Priority scheduling across repositories
Stagehand agentic E2E | Stagehand (Browserbase) | Planned | Natural language browser automation for third-party integrations
Chaos testing | Litmus | Planned | Weekly chaos engineering against core services
Agent-CI bridge | ge_orchestrator/ci_bridge.py | Planned | Real-time agent results back to GitLab pipeline status
Policy verification | OPA/conftest | Planned | verify:policy job enforcing security.rego, compliance.rego
Kyverno admission policies | Kyverno | Planned | Require image signature + SBOM attestation for production deploy

Summary

This pipeline enforces up to 31 automated quality gates before any code reaches production. The FAST tier runs in approximately 2 minutes on every push. The STANDARD tier (~5 minutes) runs on merge requests and main-branch commits. The FULL tier (all 31 jobs) runs nightly at 02:00 CEST and on manual trigger.

The pipeline uses only open-source tools, keeps all data within EU infrastructure, and maps directly to ISO 27001, SOC 2, and EU AI Act requirements. Container images are built by kaniko, signed by cosign, have CycloneDX SBOMs attached, and are deployed by ArgoCD — with no manual kubectl in the promotion path.

It was built specifically for a world where AI agents write production software. The anti-LLM testing stages address failure modes that do not exist in traditional human-only development workflows.