GE CI/CD Pipeline: Enterprise-Grade Autonomous Quality System

This document describes the CI/CD pipeline that enforces code quality, security, and compliance across all Growing Europe repositories. It serves as a reference for internal teams, external stakeholders (GitLab, Anthropic), investors, and auditors.

Pipeline Overview

3-tier execution model: FAST (every push, ~2 min, 10 jobs), STANDARD (MR + main, ~5 min, 25 jobs), FULL (nightly + manual, all 31 jobs).
Nightly FULL run at 02:00 CEST: adds mutation testing, property-based tests, and DAST scans (among others) on top of the STANDARD jobs.
Custom pre-built runner image (ge-ci-runner:latest) — all tools pre-installed, zero pip install overhead per job.
Self-hosted GitLab on k3s with 3 k8s-executor runners (concurrent=10).
Real container builds via kaniko — images pushed to GitLab Container Registry, signed with cosign, SBOM attached via Syft (CycloneDX).
Real ArgoCD deployment — API-triggered sync, no manual kubectl.
All tools open source. All self-hosted. Zero data leaves EU infrastructure.

See Tiered Pipeline for a full breakdown of which jobs run in which tier.
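The tier model above can be sketched as a small mapping from pipeline trigger to tier. This is an illustration only, not the pipeline's actual configuration: the function name `select_tier` and the default branch name `main` are assumptions, while `CI_PIPELINE_SOURCE` and `CI_COMMIT_BRANCH` are GitLab's predefined CI variables.

```python
import os

FAST, STANDARD, FULL = "FAST", "STANDARD", "FULL"


def select_tier(pipeline_source: str, branch: str = "", default_branch: str = "main") -> str:
    """Map a pipeline trigger to the highest tier that should run (hypothetical helper)."""
    # Nightly schedules and manual ("web") triggers run everything.
    if pipeline_source in ("schedule", "web"):
        return FULL
    # Merge requests and commits on the main branch get the STANDARD tier.
    if pipeline_source == "merge_request_event" or branch == default_branch:
        return STANDARD
    # Every other push gets the FAST tier only.
    return FAST


if __name__ == "__main__":
    # Inside a GitLab job these variables are set by the runner.
    print(select_tier(os.environ.get("CI_PIPELINE_SOURCE", "push"),
                      os.environ.get("CI_COMMIT_BRANCH", "")))
```

In a real `.gitlab-ci.yml` the same decision would more likely be expressed with `rules:` on each job; the Python form is used here only to make the mapping explicit.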

Stage-by-Stage Detail

The table below covers all 31 jobs across all tiers. The Tier column indicates the lowest tier in which each job runs: F = FAST (every push), S = STANDARD (MR + main), FULL = nightly and manual trigger only. FULL/GATE marks a job that additionally requires manual approval.

#	Stage	Tier	Time	What It Does	Tools	Compliance
1	`lint:python`	F	9 s	Zero-tolerance Python linting with enterprise `ruff.toml`	ruff	ISO 27001 A.8.28
2	`lint:secrets`	F	14 s	Secret detection with 445-finding baseline. Blocks NEW secrets only	gitleaks	ISO 27001 A.8.28
3	`build:backend`	F	9 s	Verifies 4 core Python modules import correctly	Python `ast`	ISO 27001 A.8.25
4	`test:unit`	F	11 s	553 unit tests (217 audit + consolidation tests)	pytest	ISO 27001 A.8.25
5	`tdd:oracle-check`	F	8 s	Verifies TDD tests do not import implementation (oracle independence)	grep + `ast`	Anti-LLM Stage 2
6	`security:bandit`	F	10 s	Python SAST with CRITICAL/HIGH severity gating	Bandit	ISO 27001 A.8.28
7	`security:semgrep`	F	9 s	TypeScript SAST + 8 custom LLM anti-pattern rules	Semgrep	ISO 27001 A.8.28
8	`security:dependency-scan`	F	11 s	Dependency vulnerability audit	safety	ISO 27001 A.8.30
9	`test:integration`	F	10 s	Real PostgreSQL 15 + Redis 7 service containers	pytest + asyncpg	ISO 27001 A.8.25
10	`types:python`	S	12 s	Strict type checking across all Python modules	pyright --strict	ISO 27001 A.8.28
11	`lint:deadcode`	S	9 s	Dead code detection (knip for TS, vulture for Python)	knip, vulture	ISO 27001 A.8.28
12	`iac:checkov`	S	15 s	IaC security scan across k8s manifests and Dockerfiles	Checkov	ISO 27001 A.8.9
13	`iac:kubesec`	S	10 s	Kubernetes manifest security scoring	Kubesec	ISO 27001 A.8.9
14	`test:e2e`	S	~60 s	Playwright E2E suite (4 parallel workers in CI)	Playwright	Anti-LLM Stage 9
15	`test:adversarial`	S	10 s	AST forbidden-call scan + 100 random input fuzz testing	Python `ast` + `random`	Anti-LLM Stage 8
16	`test:reconciliation`	S	10 s	TDD suite vs post-implementation comparison	Custom	Anti-LLM Stage 6
17	`test:contract`	S	8 s	API contract verification	OpenAPI	Anti-LLM Stage 9
18	`build:image`	S	~90 s	Real container build via kaniko, pushed to GitLab Container Registry	kaniko	ISO 27001 A.8.25
19	`sign:image`	S	8 s	cosign signing with SLSA Level 3 attestation	cosign	ISO 27001 A.8.30
20	`sbom:generate`	S	12 s	CycloneDX SBOM generation, attached to image	Syft	ISO 27001 A.8.30
21	`deploy:staging`	S	~30 s	ArgoCD API-triggered sync to staging namespace	ArgoCD	CC8.1
22	`verify:health`	S	15 s	HTTP health checks, SSL verification, error rate < 0.1%	curl, custom	ISO 27001 A.8.25
23	`merge:gate`	S	~20 s	Reads JUnit/SARIF artifacts, computes release readiness score	Custom + JUnit	CC8.1
24	`test:mutation`	FULL	~5 min	Stryker incremental mutation (changed files only) for TS; mutmut for Python	Stryker, mutmut	Anti-LLM Stage 4
25	`mutation:typescript`	FULL	~4 min	Full Stryker run across all TypeScript (nightly only)	Stryker	Anti-LLM Stage 4
26	`test:property:python`	FULL	~2 min	Hypothesis property-based tests (max_examples=1000)	Hypothesis	Anti-LLM Stage 8
27	`test:property:typescript`	FULL	~2 min	fast-check property-based tests (numRuns=1000)	fast-check	Anti-LLM Stage 8
28	`dast:zap`	FULL	~5 min	OWASP ZAP baseline scan against staging	ZAP	ISO 27001 A.8.29
29	`dast:nuclei`	FULL	~3 min	Nuclei template scan against staging	Nuclei	ISO 27001 A.8.29
30	`verify:ssot`	FULL	15 s	OpenAPI spec drift, file allocation law, config hardcode scan	Custom scripts	Anti-LLM Stage 9
31	`deploy:production`	FULL/GATE	—	Manual approval required — human-in-the-loop	GitLab	EU AI Act

FAST tier wall time: ~2 minutes. STANDARD tier: ~5 minutes. FULL tier (nightly): all 31 jobs.
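The `merge:gate` row describes reading JUnit artifacts and computing a release readiness score. The scoring formula is not documented here, so the sketch below assumes a deliberately simple one (the share of executed tests that passed). The function names, the sample report, and the flat, non-nested suite layout are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def junit_totals(xml_text: str) -> dict:
    """Sum test counts across every <testsuite> element in a JUnit report."""
    root = ET.fromstring(xml_text)
    totals = {"tests": 0, "failures": 0, "errors": 0, "skipped": 0}
    # root.iter() also yields the root itself, so a report whose root is a
    # single <testsuite> is handled as well as one wrapped in <testsuites>.
    for suite in root.iter("testsuite"):
        for key in totals:
            totals[key] += int(suite.get(key, 0))
    return totals


def readiness_score(totals: dict) -> float:
    """Hypothetical score: fraction of executed tests that passed (0.0 to 1.0)."""
    executed = totals["tests"] - totals["skipped"]
    if executed <= 0:
        return 0.0
    passed = executed - totals["failures"] - totals["errors"]
    return passed / executed


# Made-up sample report, not real pipeline output.
SAMPLE = """<testsuites>
  <testsuite name="unit" tests="10" failures="1" errors="0" skipped="2"/>
  <testsuite name="integration" tests="5" failures="0" errors="1" skipped="0"/>
</testsuites>"""

totals = junit_totals(SAMPLE)
score = readiness_score(totals)
```

A real gate would presumably also weigh SARIF findings and apply a threshold; neither is modelled here.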

What Makes This Pipeline Unique

1. Oracle Independence (Stage 5)

To our knowledge, no other CI system verifies that tests do not import implementation code. This prevents the number one LLM failure mode: tests that pass because they validate the AI's own logic rather than the specification.

When an LLM writes both the code and the tests, it can produce tests that simply mirror the implementation. Oracle independence breaks this loop by ensuring TDD tests reference only the specification interface, never the internal implementation.
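A minimal sketch of such a check, using Python's `ast` module to list a test file's imports and flag any that fall under an implementation package. The package prefixes (`app.impl`, `app.internal`) and the function name are hypothetical; the real `tdd:oracle-check` job may work differently.

```python
import ast

# Hypothetical package prefixes that count as "implementation" in this sketch.
FORBIDDEN_PREFIXES = ("app.impl", "app.internal")


def forbidden_imports(test_source: str, forbidden=FORBIDDEN_PREFIXES) -> list:
    """Return the forbidden modules that a test file imports."""
    hits = []
    for node in ast.walk(ast.parse(test_source)):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # Relative imports such as "from . import x" have module=None.
            modules = [node.module or ""]
        else:
            continue
        for module in modules:
            # Match the package itself or any submodule, but not look-alikes
            # such as "app.implicit".
            if any(module == p or module.startswith(p + ".") for p in forbidden):
                hits.append(module)
    return hits


good_test = "from app.spec import Condition\nimport json\n"
bad_test = "import app.impl.evaluator\nfrom app.impl import rules\n"
```

Here `forbidden_imports(good_test)` is empty, while `bad_test` is flagged for both of its imports. A CI job built on this idea would fail whenever the returned list is non-empty.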

2. Mutation Testing as a Gate (Stages 24 and 25)

Mutation testing is provided by Stryker for TypeScript and mutmut for Python. Stryker originated at Info Support (Veenendaal, Netherlands). The 80% kill threshold ensures tests actually catch real code changes rather than merely executing lines for coverage metrics.

Code coverage alone is insufficient. A test suite can achieve 100% line coverage while asserting nothing meaningful. Mutation testing injects deliberate faults (mutants) into the codebase and verifies the test suite detects them. An 80% kill rate means at least 4 out of every 5 injected bugs are caught.
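The difference between coverage and kill rate can be shown with a toy example. Everything below is illustrative: the function, the hand-written mutant, and the helper names are assumptions, and real tools such as mutmut or Stryker generate mutants automatically.

```python
def is_adult(age: int) -> bool:
    return age >= 18


def is_adult_mutant(age: int) -> bool:
    # Mutant: ">=" replaced by ">" (a typical boundary mutation).
    return age > 18


def weak_test(fn) -> bool:
    # Executes every line of fn (100% line coverage) but skips the boundary.
    return fn(30) is True and fn(5) is False


def strong_test(fn) -> bool:
    # Adds the boundary case that distinguishes ">=" from ">".
    return weak_test(fn) and fn(18) is True


def mutation_score(killed: int, total: int) -> float:
    """Fraction of mutants detected by the test suite."""
    return killed / total if total else 1.0


def passes_gate(killed: int, total: int, threshold: float = 0.80) -> bool:
    """True when the kill rate meets the 80% threshold described above."""
    return mutation_score(killed, total) >= threshold
```

The weak test passes against both the original and the mutant, so the mutant survives despite full line coverage. The strong test fails against the mutant, which counts as a kill.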

3. LLM Anti-Pattern Rules (Stage 7)

Eight custom Semgrep rules target patterns that LLMs generate with high frequency:

eval() usage
Hardcoded secrets in source
SQL injection via f-strings
Bare except clauses (swallowing errors)
XADD without MAXLEN (Redis stream memory leak)
Hardcoded Redis port 6379 (GE uses 6381)
Unvalidated external input passed to shell commands
Overly broad file permissions

These rules encode hard-won operational learnings from running a 54-agent system in production.
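To make one of the listed patterns concrete, the sketch below contrasts SQL built with an f-string against a parameterized query. It uses the standard-library `sqlite3` module purely as a stand-in so the example is self-contained; the table, function names, and payload are made up, and the pipeline's own services use PostgreSQL.

```python
import sqlite3


def find_user_unsafe(conn, name: str) -> list:
    # Anti-pattern: user input interpolated into SQL via an f-string.
    return conn.execute(f"SELECT name FROM users WHERE name = '{name}'").fetchall()


def find_user_safe(conn, name: str) -> list:
    # Parameterized query: the driver treats the input as data, never as SQL.
    return conn.execute("SELECT name FROM users WHERE name = ?", (name,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.executemany("INSERT INTO users VALUES (?)", [("alice",), ("bob",)])

# Classic injection payload: turns the WHERE clause into a tautology.
payload = "x' OR '1'='1"
unsafe_rows = find_user_unsafe(conn, payload)  # leaks every row
safe_rows = find_user_safe(conn, payload)      # matches nothing
```

A static rule can flag the first function without running it, because the f-string appears directly inside the `execute` call.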

4. Adversarial Fuzz Testing (Stage 15)

One hundred random inputs are fed to the condition evaluator to verify no unexpected exceptions propagate. This is combined with an AST scan for forbidden function calls (exec, eval, compile, __import__). The fuzz testing catches edge cases that unit tests — especially LLM-generated ones — systematically miss.
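Both halves of this stage can be sketched with the standard library alone. The evaluator below is a hypothetical stand-in (the real condition evaluator is not shown in this document), and the input generator is one possible mix of types and edge cases rather than the pipeline's actual corpus.

```python
import ast
import random

FORBIDDEN_CALLS = {"exec", "eval", "compile", "__import__"}


def forbidden_calls(source: str) -> set:
    """Names from FORBIDDEN_CALLS that are called directly in the source."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        # Only direct calls by name are detected, e.g. eval(...);
        # attribute calls such as json.loads(...) are ignored.
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in FORBIDDEN_CALLS:
                found.add(node.func.id)
    return found


def evaluate_condition(value) -> bool:
    """Hypothetical stand-in for the condition evaluator under test."""
    if isinstance(value, bool):
        return value
    if isinstance(value, (int, float)):
        return value > 0
    if isinstance(value, str):
        return value.strip().lower() in ("true", "yes", "1")
    return False


def random_input(rng: random.Random):
    """Draw one value from a small mix of types and edge cases."""
    generators = [
        lambda: rng.randint(-10**9, 10**9),
        lambda: rng.uniform(-1e6, 1e6),
        lambda: "".join(chr(rng.randint(0, 0x2FF)) for _ in range(rng.randint(0, 20))),
        lambda: rng.choice([None, True, False, float("nan"), [], {}]),
    ]
    return rng.choice(generators)()


def fuzz(runs: int = 100, seed: int = 0) -> list:
    """Return (input, exception) pairs for every run that raised."""
    rng = random.Random(seed)
    failures = []
    for _ in range(runs):
        value = random_input(rng)
        try:
            evaluate_condition(value)
        except Exception as exc:  # record anything that propagates
            failures.append((value, exc))
    return failures
```

Seeding the generator keeps a failing run reproducible, which matters when a fuzz failure has to be debugged after the pipeline has finished.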

5. Policy-as-Code (OPA/Rego)

Three policy files enforce compliance as executable code:

security.rego — Maps to ISO 27001 Annex A security controls
compliance.rego — Enforces data residency, audit trail, and logging requirements
deployment.rego — Gates production deployment criteria

Auditors can read and verify these policy files directly. There is no ambiguity between documented policy and enforced policy — they are the same artifact.

6. Zero allow_failure on Enforcement Stages

The following stages are fully blocking with allow_failure: false — they were previously soft-failures and have been hardened as of 2026-04-02:

types:python — pyright strict type checking
lint:deadcode — dead code detection
iac:checkov — IaC security scan
iac:kubesec — Kubernetes manifest scoring
test:e2e — Playwright end-to-end suite
mutation:typescript — full Stryker mutation run
verify:health — post-deploy health checks

Every enforcement stage either passes or blocks the pipeline. There are no "informational" stages that let problems through silently. This eliminates the common antipattern of accumulating ignored warnings until they become systemic.
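A guard against regressions of this hardening could look like the sketch below. It assumes the pipeline definition has already been loaded into a dictionary (for example with a YAML parser, which is not shown); the function name and the sample configuration are hypothetical.

```python
ENFORCEMENT_JOBS = (
    "types:python", "lint:deadcode", "iac:checkov", "iac:kubesec",
    "test:e2e", "mutation:typescript", "verify:health",
)


def soft_failing_jobs(ci_config: dict, jobs=ENFORCEMENT_JOBS) -> list:
    """Return enforcement jobs that are defined with a non-false allow_failure."""
    offenders = []
    for name in jobs:
        job = ci_config.get(name, {})
        # A missing allow_failure is treated as false here, matching GitLab's
        # default for non-manual jobs. Anything else (true, or an exit_codes
        # mapping) counts as a soft failure in this sketch.
        if job.get("allow_failure", False) is not False:
            offenders.append(name)
    return offenders


# Made-up configuration fragment for illustration.
sample_config = {
    "types:python": {"script": ["pyright"], "allow_failure": False},
    "test:e2e": {"script": ["npx playwright test"], "allow_failure": True},
    "iac:checkov": {"script": ["checkov -d ."]},
}
offenders = soft_failing_jobs(sample_config)
```

In this sample only `test:e2e` is reported, since it is the one enforcement job that sets `allow_failure: true`.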

7. Custom Runner Image

All 15+ tools are pre-installed in ge-ci-runner:latest. This eliminates pip install overhead per job and keeps FAST tier runtime at approximately 2 minutes. The image is rebuilt on a schedule and pinned to known-good tool versions.

8. Gitleaks Baseline

The repository has 445 historical findings that have been reviewed and baselined. Only genuinely new secrets block the pipeline. This eliminates false-positive fatigue — the single largest reason teams disable secret scanning.
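The baseline idea reduces to a set difference on finding fingerprints. The sketch below assumes findings are dictionaries carrying a `Fingerprint` field, as in gitleaks JSON reports; the sample findings are invented. Gitleaks also offers its own baseline option, so this is an illustration of the concept rather than the pipeline's implementation.

```python
def new_findings(current: list, baseline: list) -> list:
    """Return findings whose fingerprint is absent from the baseline."""
    known = {finding["Fingerprint"] for finding in baseline}
    return [finding for finding in current if finding["Fingerprint"] not in known]


# Invented sample data: one reviewed historical finding, one new finding.
BASELINE = [
    {"RuleID": "generic-api-key", "File": "legacy/config.py",
     "Fingerprint": "abc123:legacy/config.py:generic-api-key:12"},
]
CURRENT = BASELINE + [
    {"RuleID": "private-key", "File": "deploy/key.pem",
     "Fingerprint": "def456:deploy/key.pem:private-key:1"},
]

blocking = new_findings(CURRENT, BASELINE)
should_block = bool(blocking)  # only genuinely new findings fail the job
```

Because the comparison is by fingerprint, a baselined finding stays suppressed only while its fingerprint is unchanged.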

Infrastructure

Component	Detail
Platform	Self-hosted GitLab CE 18.8.2 on k3s v1.34.3
Runners	3 k8s-executor pods (`concurrent=10`, `pull_policy: if-not-present`)
Registry	GitLab Container Registry (2 pods) — real image builds via kaniko
Deployment	ArgoCD (API-triggered sync), no manual kubectl in pipeline
Container signing	cosign + Syft CycloneDX SBOM attached to every production image
Hardware	AMD Ryzen 9 7940HS, 8 cores / 16 threads, 60 GB RAM
Storage	Local-path provisioner, 100 GB MinIO for artifacts
k8s security	`securityContext` applied to all 28 k8s containers
Network	All traffic on local k3s cluster; no external CI dependencies
Data residency	Netherlands; zero data leaves EU infrastructure

Compliance Mapping

ISO 27001:2022

Control	Title	How Enforced
A.8.25	Secure development lifecycle	FAST tier runs on every push; STANDARD tier runs on every merge request and main-branch commit. No code merges without a green pipeline.
A.8.28	Secure coding	SAST via Bandit (Python) and Semgrep (TypeScript), ruff linting, mutation testing.
A.8.29	Security testing in development and acceptance	Adversarial fuzz testing. DAST via OWASP ZAP and Nuclei (FULL tier, nightly).
A.8.30	Outsourced development	Dependency vulnerability scanning via safety. SBOM generated per build via Syft (CycloneDX), attached to every container image.

SOC 2 Type II

Criteria	Title	How Enforced
CC8.1	Change management	Pipeline results linked to every merge request. Full merge request history. Manual production gate.

EU AI Act

Requirement	How Enforced
Transparency	`Co-Authored-By` headers on every AI-generated commit. Agent identity tracked in session records.
Human oversight	`deploy:production` stage requires manual trigger. No autonomous deployment to production.
Risk management	Up to 31 automated quality gates prevent unreviewed code from reaching production.
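The transparency requirement in the table above relies on a `Co-Authored-By` trailer in each AI-generated commit. A minimal sketch of how such a trailer could be detected is shown below; the regular expression, function name, and sample commit message are assumptions, not the pipeline's actual check.

```python
import re

# Matches lines of the form "Co-Authored-By: Name <email>" (case-insensitive).
TRAILER = re.compile(
    r"^Co-Authored-By:\s*(?P<name>[^<\n]+?)\s*<(?P<email>[^>\n]+)>\s*$",
    re.IGNORECASE | re.MULTILINE,
)


def co_authors(commit_message: str) -> list:
    """Return (name, email) pairs for every Co-Authored-By trailer found."""
    return [(m.group("name"), m.group("email")) for m in TRAILER.finditer(commit_message)]


# Invented commit message for illustration.
message = (
    "Fix stream trimming\n"
    "\n"
    "Add MAXLEN to XADD calls.\n"
    "\n"
    "Co-Authored-By: Example Agent <agent@example.invalid>\n"
)
authors = co_authors(message)
```

A commit policy could then reject AI-generated commits for which the returned list is empty.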

Anti-LLM Testing Stages

The pipeline includes five stages specifically designed to catch failure modes unique to LLM-generated code. These are labeled "Anti-LLM Stage" in the compliance column.

Anti-LLM Stage	Pipeline Stage	Purpose
Stage 2	`tdd:oracle-check`	Prevents tests from importing implementation (oracle problem)
Stage 4	`test:mutation`	Ensures tests catch real defects, not just achieve coverage
Stage 6	`test:reconciliation`	Compares TDD-phase tests against post-implementation behavior
Stage 8	`test:adversarial`	Catches edge cases LLM-generated tests systematically miss
Stage 9	`test:contract`	Verifies API contracts match specification, not implementation

These stages exist because LLM-generated code exhibits different failure patterns than human-written code. Traditional CI pipelines were designed for human developers. This pipeline was designed for autonomous AI agents writing production software.

Learnings

Twelve infrastructure pitfalls encountered during pipeline construction are documented in CI/CD Infrastructure Pitfalls. Key topics include:

GitLab Container Registry DNS resolution inside k3s
YAML multiline string parsing in .gitlab-ci.yml
PEP 668 (externally-managed-environment) breaking pip install in runners
TLS certificate chain validation for self-hosted registries
Shell executor vs k8s executor trade-offs
Runner registration token rotation
Service container networking in k8s-executor mode

Evolution Path

The following enhancements are planned or in progress, ordered by priority:

Enhancement	Tool	Status	Purpose
Multi-project queue	GitLab CI	Planned	Priority scheduling across repositories
Stagehand agentic E2E	Stagehand (Browserbase)	Planned	Natural language browser automation for third-party integrations
Chaos testing	Litmus	Planned	Weekly chaos engineering against core services
Agent-CI bridge	`ge_orchestrator/ci_bridge.py`	Planned	Real-time agent results back to GitLab pipeline status
Policy verification	OPA/conftest	Planned	`verify:policy` job enforcing security.rego, compliance.rego
Kyverno admission policies	Kyverno	Planned	Require image signature + SBOM attestation for production deploy

Summary

This pipeline enforces up to 31 automated quality gates before any code reaches production. The FAST tier runs in approximately 2 minutes on every push. The STANDARD tier (~5 minutes) runs on merge requests and main-branch commits. The FULL tier (all 31 jobs) runs nightly at 02:00 CEST and on manual trigger.

The pipeline uses only open-source tools, keeps all data within EU infrastructure, and maps directly to ISO 27001, SOC 2, and EU AI Act requirements. Container images are built by kaniko, signed by cosign, have CycloneDX SBOMs attached, and are deployed by ArgoCD — with no manual kubectl in the promotion path.

It was built specifically for a world where AI agents write production software. The anti-LLM testing stages address failure modes that do not exist in traditional human-only development workflows.