
CI/CD Implementation Plan

GE has a fully designed 10-stage anti-LLM pipeline, 20 automated validations (swimlanes), 18 pipeline agents with identities, and a GitLab CI template (539 lines, 14 stages). The pipeline now operates in three tiers (FAST / STANDARD / FULL) with 31 total jobs, real kaniko container builds, cosign image signing, Syft CycloneDX SBOMs, and ArgoCD deployment. Agent decisions do not yet feed back to CI, because the bridge service is not implemented. This plan covers the remaining work needed to make the pipeline fully operational.

Goal: End-to-end working CI/CD pipeline where every commit goes through code standards, compliance, and security checks, seamlessly integrated with the orchestrator, capable of handling multi-project queues at scale.


Architecture: The Agent-CI Bridge

Developer/Agent commits -> GitLab webhook -> GitLab Bridge (k8s) -> Redis Stream
                                                                      |
                                                              ge-orchestrator
                                                                      |
                                                        +-------------+-------------+
                                                        |                           |
                                                 CI Pipeline Jobs            Agent Tasks
                                                 (deterministic)           (LLM-powered)
                                                        |                           |
                                                 GitLab API <---- Results ----> Redis Stream
                                                        |
                                                 Pipeline Status
                                                 (pass/fail/block)

Key design decision: Pattern B -- CI triggers agents, agents update CI. GitLab runs deterministic jobs (lint, type check, SAST). The orchestrator dispatches LLM-powered jobs (code review, adversarial testing). Both report back to GitLab pipeline status via API.


Current State (What Already Exists)

The following infrastructure is deployed and verified on k3s, in the ge-gitlab namespace unless the Details column states otherwise:

| Component | Status | Details |
|---|---|---|
| GitLab | Running | gitlab.ge.internal |
| GitLab Runners | Running | 3 runners, concurrent=10, k8s executor, privileged, DinD |
| GitLab Bridge | Running | 2 replicas, webhooks to Redis pub/sub, namespace ge-agents |
| Minio | Running | S3-compatible storage for CI artifacts/cache |
| KAS | Running | Kubernetes Agent Server for k8s integration |
| CI Template | Active | config/gitlab-ci-template.yaml: 3-tier model, 31 jobs, fully wired |
| Container Registry | Running | GitLab Container Registry (2 pods), kaniko builds active |
| ArgoCD | Running | API-triggered sync for staging and production namespaces |
| Container Signing | Active | cosign + Syft CycloneDX SBOM per build |
| Security Gates Config | Exists | config/security-gates-config.yaml (257 lines) |
| Security Findings Script | Exists | scripts/check-security-findings.sh (284 lines) |
| k8s securityContext | Applied | All 28 containers hardened |
| merge:gate | Active | Reads JUnit/SARIF artifacts, no hardcoded scores |

What is still missing: Agent-to-CI bridge (ci_bridge.py), policy-as-code (OPA/Rego), Kyverno admission policies, Stagehand agentic E2E, chaos testing (Litmus), multi-project queue management.


Tool Stack

All tools are open source and self-hosted. No SaaS dependencies.

CI/CD Core

| Tool | Purpose | License |
|---|---|---|
| GitLab CE | Git hosting, CI/CD orchestration, MR workflow | MIT |
| ArgoCD | GitOps continuous deployment | Apache 2.0 |
| Harbor | Container registry, vulnerability scanning | Apache 2.0 |

Deterministic Quality Tools

| Tool | Language | Purpose |
|---|---|---|
| ruff | Python | Linting (replaces flake8, pylint) |
| black | Python | Code formatting |
| isort | Python | Import sorting |
| mypy | Python | Type checking |
| pyright | Python | Strict type checking |
| ESLint | TypeScript | Linting with security plugins |
| TypeScript (tsc) | TypeScript | Type checking (--noEmit --strict) |
| knip | TypeScript | Dead code detection |
| vulture | Python | Dead code detection |

Security Tools

| Tool | Purpose |
|---|---|
| Gitleaks | Secret detection in commits |
| Semgrep | SAST with custom rule support |
| Bandit | Python-specific SAST |
| Trivy | Container and filesystem vulnerability scanning |
| Checkov | IaC security scanning (k8s manifests, Dockerfiles) |
| Kubesec | Kubernetes manifest security scoring |
| OWASP ZAP | DAST baseline and active scanning |
| Nuclei | Template-based vulnerability scanning |
| pip-audit / safety | Python dependency vulnerability scanning |
| npm audit | Node dependency vulnerability scanning |
| ScanCode | License compliance scanning |
| cosign | Container image signing |
| Syft | SBOM generation (CycloneDX format) |
| Kyverno | Kubernetes admission policies |

Testing Tools

| Tool | Purpose |
|---|---|
| pytest | Python unit/integration testing |
| Vitest | TypeScript unit testing |
| Playwright | E2E testing (multi-browser, sharding, traces) |
| Stagehand | Agentic browser automation for exploratory testing |
| mutmut | Python mutation testing |
| Stryker | TypeScript mutation testing |
| Hypothesis | Python property-based testing |
| fast-check | TypeScript property-based testing |
| k6 | Load testing |
| Litmus | Chaos engineering |

Policy and Compliance

| Tool | Purpose |
|---|---|
| OPA / conftest | Policy-as-code evaluation |
| Kyverno | Kubernetes admission control |

Phase 1: Infrastructure Foundation (Week 1) — COMPLETE

1.1 Container Registry — DONE

GitLab Container Registry is running (2 pods). Container images are built via kaniko, pushed to registry.ge.internal, signed by cosign, and have CycloneDX SBOMs attached via Syft.

1.2 Activate CI Pipeline — DONE

.gitlab-ci.yml is active at the repo root. The 3-tier model (FAST / STANDARD / FULL) is live with 31 jobs. Nightly FULL run is scheduled at 02:00 CEST.

1.3 ArgoCD for GitOps — DONE

ArgoCD is running and wired to the pipeline. Staging deploys are API-triggered on STANDARD tier runs. Production deploy requires manual gate.

1.4 Agent-CI Bridge Service — PENDING

The bridge that translates between agent task completions and GitLab pipeline status is not yet implemented.

Files to create:

  • ge_orchestrator/ci_bridge.py — Listens to ge:ci:results Redis stream, updates GitLab pipeline/job status via API
  • ge_orchestrator/ci_dispatch.py — Converts GitLab pipeline triggers into agent tasks

Flow:

  1. GitLab webhook -> GitLab Bridge -> ge:work:incoming (existing)
  2. Orchestrator routes to agents (existing)
  3. Agent completes -> ge:ci:results stream (new)
  4. CI Bridge reads results -> GitLab API to update external pipeline status (new)
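
Step 4 of the flow above could look roughly like the following sketch. It only builds the HTTP request for GitLab's commit status endpoint (POST /projects/:id/statuses/:sha) and does not send it. The shape of the result dict, the verdict names, the https scheme, and the function name build_status_request are assumptions made for illustration; none of them come from the existing codebase.

```python
import json
import urllib.parse
import urllib.request

# Assumption: host taken from the Current State table, scheme assumed to be https.
GITLAB_URL = "https://gitlab.ge.internal"

# Assumption: agents report one of these verdicts; GitLab only knows
# pending / running / success / failed / canceled.
VERDICT_TO_STATE = {"pass": "success", "fail": "failed", "block": "failed"}


def build_status_request(result: dict, token: str) -> urllib.request.Request:
    """Build (but do not send) a commit-status update for one agent result."""
    project = urllib.parse.quote(str(result["project_id"]), safe="")
    url = f"{GITLAB_URL}/api/v4/projects/{project}/statuses/{result['sha']}"
    payload = {
        "state": VERDICT_TO_STATE[result["verdict"]],
        "name": result["job"],  # appears as an external job on the pipeline
        "description": result.get("summary", ""),
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"PRIVATE-TOKEN": token, "Content-Type": "application/json"},
        method="POST",
    )
```

A real bridge would read entries from the ge:ci:results stream, call this for each entry, and send the request with retry handling; that part is omitted here.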

Phase 2: Deterministic Pipeline (Weeks 2-3) — COMPLETE

2.1 Koen's Quality Gate (Stage 4 -- Deterministic, No LLM)

All jobs are active. Jobs marked "now blocking" previously had allow_failure: true; that flag has been removed, so their failures now fail the pipeline:

# Lint
lint:gitleaks:       # Secret detection — FAST tier
lint:python:         # ruff + black + isort + mypy strict — FAST tier
lint:typescript:     # ESLint (security plugins) + tsc --noEmit strict — FAST tier
lint:deadcode:       # knip (TS), vulture (Python) — STANDARD tier, now blocking

# Type Safety
types:python:        # pyright --strict — STANDARD tier, now blocking
types:typescript:    # tsc --noEmit --strict — FAST tier

# Dependency Security
deps:python:         # pip-audit + safety — FAST tier
deps:node:           # npm audit + Trivy filesystem scan — STANDARD tier
deps:license:        # ScanCode license compliance — STANDARD tier

# IaC Security
iac:checkov:         # Checkov on k8s manifests + Dockerfiles — STANDARD tier, now blocking
iac:kubesec:         # Kubesec scoring on all YAML — STANDARD tier, now blocking

Custom runner image (ge-ci-runner:latest) is built and in use. All tools are pre-installed.

2.2 SAST Pipeline

sast:bandit:         # Python SAST (severity HIGH + HIGH confidence = CRITICAL)
sast:semgrep:        # TS/JS SAST (severity ERROR + HIGH confidence = CRITICAL)
sast:custom:         # Custom Semgrep rules for LLM anti-patterns

Custom Semgrep rules target LLM-specific anti-patterns: eval(), hardcoded secrets, SQL string concatenation.

Blocking rules:

  • CRITICAL -> Block merge (exit 1)
  • HIGH -> Warn only for SAST, block for DAST pre-production
  • Results in SARIF format, stored as CI artifact (ISO 27001 A.8.28 evidence)
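
As a rough illustration of the blocking rules above, the sketch below classifies a finding and derives the job exit code. The plan does not define what counts as HIGH, so treating "top severity with lower confidence" as HIGH is an assumption, as are the function names and the finding dict shape. The real enforcement lives in scripts/check-security-findings.sh, which this does not reproduce.

```python
def classify(severity: str, confidence: str) -> str:
    """Return CRITICAL when severity is top-level (HIGH for Bandit,
    ERROR for Semgrep) and confidence is HIGH."""
    top_severity = severity.upper() in {"HIGH", "ERROR"}
    if top_severity and confidence.upper() == "HIGH":
        return "CRITICAL"
    if top_severity:
        return "HIGH"  # assumption: top severity, lower confidence
    return "LOW"


def gate_exit_code(findings: list, stage: str = "sast") -> int:
    """Return 1 if the gate should block the merge, else 0."""
    levels = [classify(f["severity"], f["confidence"]) for f in findings]
    if "CRITICAL" in levels:
        return 1  # CRITICAL always blocks
    if stage == "dast" and "HIGH" in levels:
        return 1  # HIGH blocks only for DAST pre-production
    return 0      # HIGH in SAST is warn-only
```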

2.3 TDD Gates

tdd:red-gate:        # Run TDD tests against stub -- ALL must FAIL
tdd:green-gate:      # Run TDD tests against implementation -- ALL must PASS
tdd:oracle-check:    # Verify test files don't import from src/ (oracle independence)

These are the most novel stages in the pipeline. To our knowledge, no mainstream CI system ships an oracle independence check out of the box.
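
One possible way to implement the oracle independence check for Python test files is to walk the AST and flag imports rooted at src. This is a minimal sketch: it assumes the implementation is importable as a top-level package named src, and the helper name imports_from_src is hypothetical. It does not cover TypeScript tests or dynamic imports.

```python
import ast


def imports_from_src(test_source: str, forbidden_root: str = "src") -> list:
    """Return the forbidden module names imported by a test file's source."""
    offenders = []
    for node in ast.walk(ast.parse(test_source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0:
            # Relative imports (level > 0) are skipped in this sketch.
            names = [node.module or ""]
        else:
            continue
        offenders += [
            n for n in names
            if n == forbidden_root or n.startswith(forbidden_root + ".")
        ]
    return offenders
```

A CI job could run this over every file in the TDD test directory and exit non-zero if any list is non-empty.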

2.4 Unit and Integration Tests

test:unit:python:      # pytest with coverage (threshold: 85%)
test:unit:typescript:  # vitest with coverage (threshold: 85%)
test:integration:      # Real Postgres 15 + Redis (port 6381) services
test:e2e:              # Playwright — 4 parallel workers in CI (was 1), now blocking

Parallel execution strategy:

  • Unit tests: shard by test file across N runners (k8s autoscaling)
  • Integration tests: one pod with sidecar containers (Postgres, Redis)
  • E2E tests: 4 workers in CI (upgraded from 1), shard by test suite for further parallelism
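
Sharding by test file can be as simple as a deterministic slice over the sorted file list, so that every runner computes the same partition without coordination. The sketch below assumes a 0-based shard index; with GitLab's parallel keyword the runner would pass CI_NODE_INDEX - 1 and CI_NODE_TOTAL. The function name shard is illustrative.

```python
def shard(test_files: list, index: int, total: int) -> list:
    """Return the files assigned to shard `index` (0-based) out of `total`."""
    if not 0 <= index < total:
        raise ValueError("index must satisfy 0 <= index < total")
    # Sorting first makes the partition independent of discovery order.
    return sorted(test_files)[index::total]
```

Round-robin slicing balances file counts, not runtimes; balancing by recorded test duration would need timing data that this sketch does not use.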

Phase 3: Anti-LLM Quality Gates (Weeks 3-4) — COMPLETE

3.1 Mutation Testing

mutation:typescript:   # Stryker incremental — FULL tier, now blocking (allow_failure removed)
mutation:python:       # mutmut incremental — FULL tier

Thresholds: 80% mutation score on new code, 60% on existing code. Stryker runs in incremental mode — only changed files are mutated on feature branches. Full suite runs nightly at 02:00 CEST as part of the FULL tier. mutation:typescript previously had allow_failure: true; this has been removed.
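
The two thresholds can be expressed as a small gate function. This is a sketch under the assumption that the score is simply killed mutants divided by total mutants; Stryker and mutmut each have their own handling of timeouts and uncovered mutants, which is not modeled here. The function names are hypothetical.

```python
def mutation_score(killed: int, total: int) -> float:
    """Percentage of mutants killed; no mutants counts as 100%."""
    return 100.0 if total == 0 else 100.0 * killed / total


def mutation_gate(new_code: tuple, existing_code: tuple) -> bool:
    """Each argument is (killed, total). Pass only if new code reaches
    80% and existing code reaches 60%."""
    return (
        mutation_score(*new_code) >= 80.0
        and mutation_score(*existing_code) >= 60.0
    )
```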

3.2 Property-Based Testing

test:property:python:       # Hypothesis (max_examples=1000 in CI)
test:property:typescript:   # fast-check (numRuns=1000 in CI)

These catch edge cases LLMs systematically miss: off-by-one errors, unicode handling, empty inputs, negative numbers, overflow, timezone issues.

3.3 Test Reconciliation (Jasper)

test:reconciliation:   # Compare TDD suite vs post-implementation suite

Agent-powered job: the orchestrator dispatches to Jasper, who runs the reconciliation. Results are intended to flow back to CI via the Agent-CI Bridge, which is still pending (see 1.4).

3.4 Adversarial Testing (Ashley)

test:adversarial:fuzz:         # Hypothesis/fast-check property tests
test:adversarial:injection:    # OWASP ZAP active scan (staging only)
test:adversarial:load:         # k6 load test (staging only)

Agent-powered: Ashley reviews fuzzing results and generates attack scenarios. Deterministic tools run the actual attacks.

3.5 SSOT Enforcement (Jaap)

verify:ssot:   # Run verify_ssot.sh + additional checks

Checks: OpenAPI spec drift, file allocation law, config hardcode scan, naming conventions, constitution compliance.


Phase 4: Merge Gate and Deployment (Weeks 4-5) — MOSTLY COMPLETE

4.1 Merge Gate (Marta/Iwona) — DONE

merge:gate:   # Reads JUnit/SARIF artifacts, computes release readiness score

merge:gate now reads actual JUnit XML and SARIF artifacts from prior stages instead of hardcoding scores. Marta aggregates results, computes a score (threshold: 70/100 to pass), generates SOC 2 CC8 evidence, checks PR size (max 1000 lines), and detects test weakening.

Output: JSON artifact with per-stage pass/fail, overall score, compliance evidence.
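
To make the gate logic concrete, here is a minimal sketch that reads a JUnit XML report and a SARIF report and produces a pass/fail decision. The 70/100 threshold and the 1000-line PR limit come from this plan; the scoring formula (test pass rate scaled to 100, minus 10 points per SARIF result at level "error") is purely hypothetical and stands in for whatever weighting Marta actually applies.

```python
import json
import xml.etree.ElementTree as ET

PASS_THRESHOLD = 70   # from the plan: 70/100 to pass
MAX_PR_LINES = 1000   # from the plan: max 1000 lines per PR


def junit_pass_rate(junit_xml: str) -> float:
    """Fraction of <testcase> elements without a <failure> or <error> child."""
    cases = list(ET.fromstring(junit_xml).iter("testcase"))
    if not cases:
        return 0.0
    passed = sum(
        1 for c in cases
        if c.find("failure") is None and c.find("error") is None
    )
    return passed / len(cases)


def sarif_error_count(sarif_json: str) -> int:
    """Number of SARIF results reported at level 'error'."""
    runs = json.loads(sarif_json).get("runs", [])
    return sum(
        1 for run in runs for r in run.get("results", [])
        if r.get("level") == "error"
    )


def readiness(junit_xml: str, sarif_json: str, pr_lines: int) -> dict:
    """Hypothetical scoring: pass rate scaled to 100, minus 10 per SARIF error."""
    raw = round(100 * junit_pass_rate(junit_xml)) - 10 * sarif_error_count(sarif_json)
    score = max(0, raw)
    passed = score >= PASS_THRESHOLD and pr_lines <= MAX_PR_LINES
    return {"score": score, "pr_lines": pr_lines, "pass": passed}
```

The returned dict could be serialized with json.dumps as the starting point for the JSON artifact described above; per-stage results and compliance evidence are left out of this sketch.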

4.2 DAST (Pre-Production) — ACTIVE (FULL tier)

dast:zap:       # OWASP ZAP baseline scan against staging — FULL tier
dast:nuclei:    # Nuclei template scan against staging — FULL tier

Runs in the FULL tier (nightly + manual trigger) after staging deploy, before production promotion.

4.3 Container Build and Sign — DONE

build:image:      # kaniko build -> push to GitLab Container Registry — STANDARD tier
sign:image:       # cosign sign -> SLSA Level 3 attestation — STANDARD tier
sbom:generate:    # Syft -> CycloneDX SBOM -> attach to image — STANDARD tier

Kyverno admission policy: Planned — only images with valid cosign signature and SBOM attestation will be permitted to deploy to production (see Phase 7).

4.4 Deployment (ArgoCD) — DONE

deploy:staging:       # ArgoCD API-triggered sync to staging namespace (auto on STANDARD)
deploy:production:    # ArgoCD sync to production namespace (manual gate on main)

Progressive delivery: Staging auto-deploys on STANDARD tier runs. Production requires manual approval after DAST passes.

4.5 Post-Deploy Verification — DONE

verify:health:      # HTTP health checks, SSL verification, error rate < 0.1% — now blocking
verify:smoke:       # Playwright smoke tests against production
verify:rollback:    # Verify rollback procedure is documented and tested

verify:health previously had allow_failure: true; this has been removed.


Phase 5: Multi-Project Queue Management (Week 5)

5.1 Queue Architecture

Project A commit -+
Project B commit -+-> ge:ci:queue (Redis Stream) -> CI Scheduler -> Runner Pods
Project C commit -+                                                     |
                                                              +---------+---------+
                                                              |                   |
                                                    Deterministic Jobs    Agent Jobs
                                                    (parallel runners)   (orchestrator)

5.2 Priority Queue

| Priority | Trigger | Runner Allocation |
|---|---|---|
| P0 (hotfix) | hotfix/* branch | Preempt other jobs, dedicated runner |
| P1 (production) | main branch | 50% runner capacity |
| P2 (staging) | develop branch | 30% runner capacity |
| P3 (feature) | feat/* branches | 20% runner capacity, queue if busy |
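
The priority table can be sketched as a branch-pattern lookup. The patterns and capacity shares are taken from the table; the fallback for branches that match no pattern (treated as P3) is an assumption, as are the names PRIORITY_RULES and classify_branch.

```python
from fnmatch import fnmatchcase
from typing import Optional

# (branch pattern, priority, share of runner capacity).
# P0 preempts other jobs instead of taking a share, hence None.
PRIORITY_RULES = [
    ("hotfix/*", "P0", None),
    ("main", "P1", 0.50),
    ("develop", "P2", 0.30),
    ("feat/*", "P3", 0.20),
]


def classify_branch(branch: str) -> tuple:
    """Return (priority, capacity_share) for a branch name."""
    for pattern, priority, share in PRIORITY_RULES:
        if fnmatchcase(branch, pattern):
            return priority, share
    # Assumption: unmatched branches are scheduled like feature work.
    fallback: Optional[float] = 0.20
    return "P3", fallback
```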

5.3 Resource Maximization

  • Runner HPA: Scale runner pods 2-8 based on queue depth
  • Test sharding: Split test suites across N pods (Playwright sharding, pytest-xdist, vitest threads)
  • Dependency caching: PVC-backed cache for npm, pip, docker layers
  • Parallel stages: Lint, types, deps, SAST all run in parallel (no dependencies between them)
  • Per-project isolation: Each project gets its own k8s namespace for integration tests
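
The runner scaling rule can be approximated as a clamp on queue depth. The 2-8 range is from the list above; jobs_per_runner=10 is an assumption borrowed from the runners' concurrent=10 setting, and the function name desired_runners is hypothetical. A real HPA would use its own metric and target, which this sketch does not model.

```python
import math


def desired_runners(queue_depth: int, jobs_per_runner: int = 10,
                    low: int = 2, high: int = 8) -> int:
    """Runner pods needed for the queued jobs, clamped to [low, high]."""
    needed = math.ceil(queue_depth / jobs_per_runner)
    return max(low, min(high, needed))
```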

5.4 Client/Project Isolation

  • Each client project gets: own GitLab group, own k8s namespace, own test database
  • Runner pods use nodeAffinity to schedule on fort-knox-dev
  • Resource quotas per namespace prevent one project starving others
  • Network policies isolate test namespaces from each other

Phase 6: E2E Testing Strategy (Weeks 5-6)

6.1 Tool Decision

Playwright remains primary for structured E2E testing: multi-browser support (Chromium, Firefox, WebKit), built-in test sharding, trace viewer for debugging, Test Agents feature (Planner/Generator/Healer), and MCP integration for AI-driven test creation.

Supplementary tools:

  • Stagehand (Browserbase) -- natural language browser automation for testing UIs agents have not seen before (e.g., third-party integrations)
  • Playwright MCP -- connects Playwright to Claude Code for intelligent test generation and self-healing

6.2 Test Pyramid

| Level | Tool | Target Count | Speed | Runs When |
|---|---|---|---|---|
| Unit | Vitest / pytest | 1000+ | <30s | Every commit |
| Integration | Vitest + real services | 200+ | <2min | Every MR |
| Component | Playwright component | 100+ | <1min | Every MR |
| E2E | Playwright full | 50+ | <5min | Staging deploy |
| E2E (agentic) | Stagehand | 10+ | <3min | Pre-production |
| Visual regression | Playwright screenshots | All pages | <2min | Every MR |
| Load | k6 | Key endpoints | <5min | Staging deploy |
| Chaos | Litmus | Core services | <10min | Weekly |

Phase 7: Compliance Integration (Weeks 6-7)

7.1 ISO 27001 Evidence Generation

Every CI run automatically produces the following compliance evidence:

| Control | Evidence | Format | Storage |
|---|---|---|---|
| A.8.25 (Secure SDLC) | Pipeline execution log | JSON | GitLab CI artifacts |
| A.8.28 (Secure coding) | SAST scan results | SARIF | GitLab CI artifacts + evidence repo |
| A.8.29 (Security testing) | DAST scan results | SARIF | GitLab CI artifacts + evidence repo |
| A.8.30 (Outsourced dev) | SBOM + license scan | CycloneDX + JSON | Harbor + evidence repo |
| A.8.9 (Configuration mgmt) | IaC scan results | SARIF | GitLab CI artifacts |

7.2 SOC 2 Type II Evidence

| Control | Evidence | Generated By |
|---|---|---|
| CC6.1 (Logical access) | MR approval logs, branch protection | GitLab audit log |
| CC7.2 (Incident detection) | Security scan findings | SAST/DAST SARIF |
| CC8.1 (Change management) | Pipeline results, MR history | Marta's merge gate report |

7.3 Policy-as-Code

Files to create:

  • config/ci/policies/security.rego -- OPA policies for security gates
  • config/ci/policies/compliance.rego -- OPA policies for compliance checks
  • config/ci/policies/deployment.rego -- OPA policies for deployment gates

verify:policy:   # conftest verify all policies pass

7.4 Kyverno Admission Policies

Files to create:

  • k8s/base/ci/kyverno/require-image-signature.yaml
  • k8s/base/ci/kyverno/require-sbom-attestation.yaml
  • k8s/base/ci/kyverno/enforce-resource-limits.yaml
  • k8s/base/ci/kyverno/enforce-nonroot.yaml

Phase 8: Wiki Brain Updates (Week 7)

New Wiki Pages

| Page | Content |
|---|---|
| development/infrastructure/cicd-pipeline.md | Complete pipeline architecture, stage descriptions, tool inventory |
| development/infrastructure/cicd-queue-management.md | Multi-project queue, priority system, resource allocation |
| development/procedures/cicd-troubleshooting.md | Common failures, debugging, gate overrides |
| development/contracts/cicd-stages.md | Contract per stage: inputs, outputs, thresholds, blocking rules |
| development/integrations/harbor-registry.md | Harbor setup, image signing, vulnerability scanning |
| development/integrations/argocd.md | ArgoCD setup, application definitions, rollback procedures |
| development/pitfalls/cicd.md | CI/CD specific pitfalls (flaky tests, cache invalidation, runner issues) |
| domains/compliance/cicd-evidence.md | Evidence generated, storage location, audit access |

Updated Wiki Pages

| Page | Update |
|---|---|
| methodologies/anti-llm-pipeline/stages.md | Add CI job mapping per stage |
| domains/project-management/delivery-swimlanes.md | Add CI job references per AV |
| development/standards/testing.md | Add mutation testing, property-based testing standards |
| development/infrastructure/orchestrator.md | Add CI Bridge documentation |
| development/procedures/deploy-code.md | Replace manual kubectl with ArgoCD |

Phase 9: Agent Identity Updates (Week 7)

| Agent | Update |
|---|---|
| Koen | Add CI stage ownership (lint, types, deps, SAST), policy-as-code reference |
| Marije / Judith | Add E2E tools (Playwright + Stagehand), CI integration test stage ownership |
| Marta / Iwona | Add merge gate CI job, SOC 2 evidence generation, release readiness scoring |
| Victoria | Add DAST stage ownership, Kyverno policy authoring, threat model to CI rule derivation |
| Ashley | Add adversarial CI stage, fuzzing harness ownership, load testing |
| Jaap | Add SSOT CI stage, policy-as-code verification |
| Jasper | Add reconciliation CI stage |
| Marco | Add conflict detection CI stage (AST diff tooling) |
| Alex / Tjitte | Add runner infrastructure ownership, Harbor management, ArgoCD operations |
| Arjan | Add GitOps workflow, environment provisioning |
| Pol | Add DAST pentest stage, Nuclei template authoring |
| Joshua | Add quarterly calibration CI metrics, pipeline pruning authority |

Agent-to-CI-Stage Mapping

The following table maps every CI pipeline stage to its owning agent(s) and indicates whether the stage is deterministic or agent-powered.

| CI Stage | Job(s) | Type | Owning Agent(s) |
|---|---|---|---|
| Lint | lint:gitleaks, lint:python, lint:typescript, lint:deadcode | Deterministic | Koen |
| Type Safety | types:python, types:typescript | Deterministic | Koen |
| Dependency Security | deps:python, deps:node, deps:license | Deterministic | Koen |
| IaC Security | iac:checkov, iac:kubesec | Deterministic | Koen |
| SAST | sast:bandit, sast:semgrep, sast:custom | Deterministic | Koen |
| TDD Gates | tdd:red-gate, tdd:green-gate, tdd:oracle-check | Deterministic | Koen |
| Unit Tests | test:unit:python, test:unit:typescript | Deterministic | Marije / Judith |
| Integration Tests | test:integration | Deterministic | Marije / Judith |
| E2E Tests | test:e2e | Deterministic + Agentic | Marije / Judith |
| Mutation Testing | mutation:typescript, mutation:python | Deterministic | Koen |
| Property-Based Testing | test:property:python, test:property:typescript | Deterministic | Ashley |
| Test Reconciliation | test:reconciliation | Agent-powered | Jasper |
| Adversarial Testing | test:adversarial:fuzz, test:adversarial:injection, test:adversarial:load | Hybrid | Ashley |
| SSOT Enforcement | verify:ssot | Agent-powered | Jaap |
| Merge Gate | merge:gate | Agent-powered | Marta / Iwona |
| DAST | dast:zap, dast:nuclei | Deterministic | Victoria / Pol |
| Container Build + Sign | build:image, sign:image, sbom:generate | Deterministic | Alex / Tjitte |
| Deployment | deploy:staging, deploy:production | Deterministic | Arjan / Alex |
| Post-Deploy Verification | verify:health, verify:smoke, verify:rollback | Deterministic | Marije / Judith |
| Policy Verification | verify:policy | Deterministic | Jaap |

Implementation Order (Critical Path)

Week 1: Infrastructure — DONE
  +-- Container registry active (kaniko builds) (1.1)
  +-- CI pipeline active with 3-tier model, 31 jobs (1.2)
  +-- ArgoCD deployed and wired (1.3)
  +-- Custom runner image built (2.1 prereq)

Week 2: Deterministic Pipeline — DONE
  +-- Lint + Type + SAST jobs (2.1, 2.2)
  +-- TDD gates (2.3)
  +-- Unit + Integration + E2E tests (2.4)

Week 3: Anti-LLM Gates — DONE
  +-- Mutation testing incl. Stryker incremental (3.1)
  +-- Property-based testing (3.2)

Week 4: Agent-Powered Stages — DONE
  +-- Reconciliation via Jasper (3.3)
  +-- Adversarial via Ashley (3.4)
  +-- SSOT via Jaap (3.5)
  +-- Merge gate via Marta — reads JUnit/SARIF artifacts (4.1)

Week 5: Deployment + Container Supply Chain — DONE
  +-- DAST via ZAP + Nuclei (FULL tier) (4.2)
  +-- Container build via kaniko + cosign sign + Syft SBOM (4.3)
  +-- ArgoCD deployment (4.4)
  +-- Post-deploy health verification, now blocking (4.5)

Remaining work:
  +-- Agent-CI Bridge service (1.4) — PENDING
  +-- Multi-project queue (5.1-5.4) — PENDING
  +-- Stagehand agentic E2E (6.1) — PENDING
  +-- Chaos testing with Litmus — PENDING
  +-- Policy-as-code OPA/Rego (7.3) — PENDING
  +-- Kyverno admission policies (7.4) — PENDING
  +-- Agent identity updates (Phase 9) — PENDING

Verification

Per-Phase Verification

  • Each phase has its own test suite (contract tests first, TDD approach)
  • verify-executor-safety.sh must pass after each phase
  • Full 139-test regression suite must pass

End-to-End Verification

  1. Push a commit -- webhook fires, GitLab Bridge converts, orchestrator routes
  2. Deterministic jobs run in parallel (lint, types, SAST)
  3. Agent-powered jobs dispatch and complete (Koen, Marije, Jasper)
  4. All results aggregate in merge gate (Marta score threshold: 70/100)
  5. Container image is built with kaniko, pushed to the GitLab Container Registry, and signed with cosign
  6. ArgoCD syncs to staging
  7. DAST runs against staging
  8. Production deploy (manual gate)
  9. Post-deploy health checks pass
  10. All compliance evidence archived

Capacity Test

  • Queue 10 MRs from different projects simultaneously
  • All complete within SLA (simple: <4h, standard: <8h)
  • Runner utilization >70% during peak
  • No resource starvation between projects

Key Files to Create or Update

| File | Purpose |
|---|---|
| config/gitlab-ci-template.yaml | Rewrite existing template with full pipeline |
| config/ci/runner-image/Dockerfile | Custom runner with all tools |
| config/ci/policies/*.rego | OPA security/compliance policies |
| config/semgrep-rules/*.yaml | Custom SAST rules for LLM anti-patterns |
| ge_orchestrator/ci_bridge.py | Agent results to GitLab pipeline status |
| ge_orchestrator/ci_dispatch.py | GitLab triggers to agent task dispatch |
| scripts/check-security-findings.sh | Severity gate enforcement (exists, needs update) |
| k8s/base/ci/ | All CI infrastructure manifests |
| 8 new wiki pages | Documentation (see Phase 8) |
| 12 agent identity updates | Tool references (see Phase 9) |