CI/CD Implementation Plan
GE has a fully designed 10-stage anti-LLM pipeline, 20 automated validations (swimlanes), 18 pipeline agents with identities, and a GitLab CI template (539 lines, 14 stages). The pipeline now operates in three tiers (FAST / STANDARD / FULL) with 31 total jobs, real kaniko container builds, cosign image signing, Syft CycloneDX SBOMs, and ArgoCD deployment. Agent decisions do not yet feed back to CI, because the Agent-CI Bridge service is not yet implemented. This plan covers the remaining work needed to make the pipeline fully operational.
Goal: End-to-end working CI/CD pipeline where every commit goes through code standards, compliance, and security checks, seamlessly integrated with the orchestrator, capable of handling multi-project queues at scale.
Architecture: The Agent-CI Bridge
Developer/Agent commits -> GitLab webhook -> GitLab Bridge (k8s) -> Redis Stream
                                                                         |
                                                                  ge-orchestrator
                                                                         |
                                                           +-------------+-------------+
                                                           |                           |
                                                   CI Pipeline Jobs               Agent Tasks
                                                    (deterministic)              (LLM-powered)
                                                           |                           |
                                                      GitLab API <---- Results ----> Redis Stream
                                                           |
                                                    Pipeline Status
                                                   (pass/fail/block)
Key design decision: Pattern B -- CI triggers agents, agents update CI. GitLab runs deterministic jobs (lint, type check, SAST). The orchestrator dispatches LLM-powered jobs (code review, adversarial testing). Both report back to GitLab pipeline status via API.
Current State (What Already Exists)
The following infrastructure is deployed and verified on k3s in the ge-gitlab namespace:
| Component | Status | Details |
|---|---|---|
| GitLab | Running | gitlab.ge.internal |
| GitLab Runners | Running | 3 runners, concurrent=10, k8s executor, privileged, DinD |
| GitLab Bridge | Running | 2 replicas, webhooks to Redis pub/sub, namespace ge-agents |
| Minio | Running | S3-compatible storage for CI artifacts/cache |
| KAS | Running | Kubernetes Agent Server for k8s integration |
| CI Template | Active | config/gitlab-ci-template.yaml — 3-tier model, 31 jobs, fully wired |
| Container Registry | Running | GitLab Container Registry (2 pods), kaniko builds active |
| ArgoCD | Running | API-triggered sync for staging and production namespaces |
| Container Signing | Active | cosign + Syft CycloneDX SBOM per build |
| Security Gates Config | Exists | config/security-gates-config.yaml (257 lines) |
| Security Findings Script | Exists | scripts/check-security-findings.sh (284 lines) |
| k8s securityContext | Applied | All 28 containers hardened |
| merge:gate | Active | Reads JUnit/SARIF artifacts — no hardcoded scores |
What is still missing: Agent-to-CI bridge (ci_bridge.py), policy-as-code (OPA/Rego), Kyverno admission policies, Stagehand agentic E2E, chaos testing (Litmus), multi-project queue management.
Tool Stack
All tools are open source and self-hosted. No SaaS dependencies.
CI/CD Core
| Tool | Purpose | License |
|---|---|---|
| GitLab CE | Git hosting, CI/CD orchestration, MR workflow | MIT |
| ArgoCD | GitOps continuous deployment | Apache 2.0 |
| Harbor | Container registry, vulnerability scanning | Apache 2.0 |
Deterministic Quality Tools
| Tool | Language | Purpose |
|---|---|---|
| ruff | Python | Linting (replaces flake8, pylint) |
| black | Python | Code formatting |
| isort | Python | Import sorting |
| mypy | Python | Type checking |
| pyright | Python | Strict type checking |
| ESLint | TypeScript | Linting with security plugins |
| TypeScript (tsc) | TypeScript | Type checking (--noEmit --strict) |
| knip | TypeScript | Dead code detection |
| vulture | Python | Dead code detection |
Security Tools
| Tool | Purpose |
|---|---|
| Gitleaks | Secret detection in commits |
| Semgrep | SAST with custom rule support |
| Bandit | Python-specific SAST |
| Trivy | Container and filesystem vulnerability scanning |
| Checkov | IaC security scanning (k8s manifests, Dockerfiles) |
| Kubesec | Kubernetes manifest security scoring |
| OWASP ZAP | DAST baseline and active scanning |
| Nuclei | Template-based vulnerability scanning |
| pip-audit / safety | Python dependency vulnerability scanning |
| npm audit | Node dependency vulnerability scanning |
| ScanCode | License compliance scanning |
| cosign | Container image signing |
| Syft | SBOM generation (CycloneDX format) |
| Kyverno | Kubernetes admission policies |
Testing Tools
| Tool | Purpose |
|---|---|
| pytest | Python unit/integration testing |
| Vitest | TypeScript unit testing |
| Playwright | E2E testing (multi-browser, sharding, traces) |
| Stagehand | Agentic browser automation for exploratory testing |
| mutmut | Python mutation testing |
| Stryker | TypeScript mutation testing |
| Hypothesis | Python property-based testing |
| fast-check | TypeScript property-based testing |
| k6 | Load testing |
| Litmus | Chaos engineering |
Policy and Compliance
| Tool | Purpose |
|---|---|
| OPA / conftest | Policy-as-code evaluation |
| Kyverno | Kubernetes admission control |
Phase 1: Infrastructure Foundation (Week 1) -- COMPLETE
1.1 Container Registry -- DONE
GitLab Container Registry is running (2 pods). Container images are built via kaniko, pushed to registry.ge.internal, signed by cosign, and have CycloneDX SBOMs attached via Syft.
1.2 Activate CI Pipeline -- DONE
.gitlab-ci.yml is active at the repo root. The 3-tier model (FAST / STANDARD / FULL) is live with 31 jobs. Nightly FULL run is scheduled at 02:00 CEST.
1.3 ArgoCD for GitOps -- DONE
ArgoCD is running and wired to the pipeline. Staging deploys are API-triggered on STANDARD tier runs. Production deploys require a manual gate.
1.4 Agent-CI Bridge Service -- PENDING
The bridge that translates between agent task completions and GitLab pipeline status is not yet implemented.
Files to create:
- ge_orchestrator/ci_bridge.py -- Listens to ge:ci:results Redis stream, updates GitLab pipeline/job status via API
- ge_orchestrator/ci_dispatch.py -- Converts GitLab pipeline triggers into agent tasks
Flow:
- GitLab webhook -> GitLab Bridge -> ge:work:incoming (existing)
- Orchestrator routes to agents (existing)
- Agent completes -> ge:ci:results stream (new)
- CI Bridge reads results -> GitLab API to update external pipeline status (new)
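The last step of this flow could be sketched as follows. This is a minimal illustration, not the actual ci_bridge.py: the result fields (project, sha, agent, verdict, summary) and the verdict-to-state mapping are assumptions. It only builds the request for GitLab's commit status endpoint (POST /projects/:id/statuses/:sha); reading the Redis stream and sending the HTTP call are left out.

```python
from dataclasses import dataclass
from urllib.parse import quote

# Assumed mapping from agent verdicts (pass/fail/block) to GitLab commit status states.
STATE_BY_VERDICT = {"pass": "success", "fail": "failed", "block": "failed"}


@dataclass(frozen=True)
class StatusUpdate:
    url: str
    payload: dict


def build_status_update(base_url: str, result: dict) -> StatusUpdate:
    """Translate one (hypothetical) ge:ci:results entry into a commit status request."""
    verdict = result["verdict"]
    if verdict not in STATE_BY_VERDICT:
        raise ValueError(f"unknown verdict: {verdict!r}")
    # GitLab accepts a numeric project ID or a URL-encoded project path.
    project = quote(str(result["project"]), safe="")
    url = f"{base_url.rstrip('/')}/api/v4/projects/{project}/statuses/{result['sha']}"
    payload = {
        "state": STATE_BY_VERDICT[verdict],
        "name": f"agent/{result['agent']}",  # hypothetical status naming scheme
        "description": result.get("summary", "")[:140],
    }
    return StatusUpdate(url=url, payload=payload)
```

Keeping the translation a pure function makes it testable without a GitLab instance; the bridge loop would then only need to read a stream entry, call this function, and POST the payload.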
Phase 2: Deterministic Pipeline (Weeks 2-3) -- COMPLETE
2.1 Koen's Quality Gate (Stage 4 -- Deterministic, No LLM)
All jobs are active. The jobs marked "now blocking" below previously had allow_failure: true and are now fully blocking:
# Lint
lint:gitleaks: # Secret detection — FAST tier
lint:python: # ruff + black + isort + mypy strict — FAST tier
lint:typescript: # ESLint (security plugins) + tsc --noEmit strict — FAST tier
lint:deadcode: # knip (TS), vulture (Python) — STANDARD tier, now blocking
# Type Safety
types:python: # pyright --strict — STANDARD tier, now blocking
types:typescript: # tsc --noEmit --strict — FAST tier
# Dependency Security
deps:python: # pip-audit + safety — FAST tier
deps:node: # npm audit + Trivy filesystem scan — STANDARD tier
deps:license: # ScanCode license compliance — STANDARD tier
# IaC Security
iac:checkov: # Checkov on k8s manifests + Dockerfiles — STANDARD tier, now blocking
iac:kubesec: # Kubesec scoring on all YAML — STANDARD tier, now blocking
Custom runner image (ge-ci-runner:latest) is built and in use. All tools are pre-installed.
2.2 SAST Pipeline
sast:bandit: # Python SAST (severity HIGH + HIGH confidence = CRITICAL)
sast:semgrep: # TS/JS SAST (severity ERROR + HIGH confidence = CRITICAL)
sast:custom: # Custom Semgrep rules for LLM anti-patterns
Custom Semgrep rules target LLM-specific anti-patterns: eval(), hardcoded secrets, SQL string concatenation.
Blocking rules:
- CRITICAL -> Block merge (exit 1)
- HIGH -> Warn only for SAST, block for DAST pre-production
- Results in SARIF format, stored as CI artifact (ISO 27001 A.8.28 evidence)
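As a rough sketch of how these blocking rules translate into a job exit code (this is not the logic of scripts/check-security-findings.sh; the stage labels "sast" and "dast-preprod" and the function name are assumptions made for illustration):

```python
def gate_exit_code(severities: list[str], stage: str) -> int:
    """Return 1 to block or 0 to allow, following the blocking rules above.

    `stage` is a hypothetical label: "sast" or "dast-preprod".
    """
    found = {s.upper() for s in severities}
    if "CRITICAL" in found:
        return 1  # CRITICAL always blocks the merge
    if "HIGH" in found and stage == "dast-preprod":
        return 1  # HIGH blocks only for DAST pre-production
    return 0  # HIGH in SAST is warn-only; lower severities pass


if __name__ == "__main__":
    print(gate_exit_code(["HIGH", "LOW"], "sast"))
    print(gate_exit_code(["HIGH"], "dast-preprod"))
```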
2.3 TDD Gates
tdd:red-gate: # Run TDD tests against stub -- ALL must FAIL
tdd:green-gate: # Run TDD tests against implementation -- ALL must PASS
tdd:oracle-check: # Verify test files don't import from src/ (oracle independence)
These are the most novel stages in the pipeline. To our knowledge, no other CI system checks oracle independence.
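One way an oracle independence check for Python test files might work is to parse each test file and flag imports rooted at src. The sketch below uses the standard library ast module; the function name and the choice to ignore relative imports are assumptions, and the real tdd:oracle-check job may be implemented differently.

```python
import ast


def imports_from_src(source: str, forbidden_root: str = "src") -> list[str]:
    """Return the modules under `forbidden_root` imported by the given test source."""
    offenders: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # Absolute "from x import y"; relative imports are ignored in this sketch.
            names = [node.module]
        else:
            continue
        offenders += [
            n for n in names
            if n == forbidden_root or n.startswith(forbidden_root + ".")
        ]
    return offenders


if __name__ == "__main__":
    sample = "import pytest\nfrom src.billing import total\n"
    print(imports_from_src(sample))
```

A CI job could run this over every file in the test directory and exit non-zero if any file returns a non-empty list.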
2.4 Unit and Integration Tests
test:unit:python: # pytest with coverage (threshold: 85%)
test:unit:typescript: # vitest with coverage (threshold: 85%)
test:integration: # Real Postgres 15 + Redis 6381 services
test:e2e: # Playwright — 4 parallel workers in CI (was 1), now blocking
Parallel execution strategy:
- Unit tests: shard by test file across N runners (k8s autoscaling)
- Integration tests: one pod with sidecar containers (Postgres, Redis)
- E2E tests: 4 workers in CI (upgraded from 1), shard by test suite for further parallelism
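Sharding unit tests by file can be illustrated with a stable hash, so each runner independently derives the same assignment without coordination. This is a sketch only: the function names are hypothetical, and in practice the test runners' own sharding features would likely be used instead.

```python
import zlib


def shard_for(test_file: str, total_shards: int) -> int:
    """Map a test file path to a stable shard index in [0, total_shards)."""
    if total_shards < 1:
        raise ValueError("total_shards must be >= 1")
    # crc32 is deterministic across processes, unlike the built-in hash() for str.
    return zlib.crc32(test_file.encode("utf-8")) % total_shards


def files_for_shard(test_files: list[str], shard: int, total_shards: int) -> list[str]:
    """Return the subset of test files that the given shard should run."""
    return [f for f in sorted(test_files) if shard_for(f, total_shards) == shard]


if __name__ == "__main__":
    files = [f"tests/test_{name}.py" for name in ("auth", "billing", "queue", "bridge")]
    for shard in range(2):
        print(shard, files_for_shard(files, shard, 2))
```

Hash-based assignment keeps shards stable when files are added, but it does not balance by test duration; a timing-aware split would need historical run data.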
Phase 3: Anti-LLM Quality Gates (Weeks 3-4) -- COMPLETE
3.1 Mutation Testing
mutation:typescript: # Stryker incremental — FULL tier, now blocking (allow_failure removed)
mutation:python: # mutmut incremental — FULL tier
Thresholds: 80% mutation score on new code, 60% on existing code. Stryker runs in incremental mode — only changed files are mutated on feature branches. Full suite runs nightly at 02:00 CEST as part of the FULL tier. mutation:typescript previously had allow_failure: true; this has been removed.
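The two thresholds can be expressed as a small check. This sketch assumes the mutation score is killed mutants divided by total mutants, and that a file with no mutants passes; neither detail is specified in this plan, and the function name is hypothetical.

```python
NEW_CODE_THRESHOLD_PCT = 80       # from the plan: 80% on new code
EXISTING_CODE_THRESHOLD_PCT = 60  # from the plan: 60% on existing code


def mutation_gate(killed: int, total: int, is_new_code: bool) -> bool:
    """Return True if the mutation score meets the threshold for this code."""
    if total == 0:
        return True  # assumption: nothing to mutate counts as a pass
    threshold = NEW_CODE_THRESHOLD_PCT if is_new_code else EXISTING_CODE_THRESHOLD_PCT
    # Integer comparison avoids floating-point edge cases at the boundary.
    return killed * 100 >= threshold * total


if __name__ == "__main__":
    print(mutation_gate(killed=7, total=10, is_new_code=True))
    print(mutation_gate(killed=7, total=10, is_new_code=False))
```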
3.2 Property-Based Testing
test:property:python: # Hypothesis (max_examples=1000 in CI)
test:property:typescript: # fast-check (numRuns=1000 in CI)
These catch edge cases LLMs systematically miss: off-by-one errors, unicode handling, empty inputs, negative numbers, overflow, timezone issues.
3.3 Test Reconciliation (Jasper)
Agent-powered job: the orchestrator dispatches to Jasper, who runs reconciliation. Results are intended to flow back to CI via the Agent-CI Bridge, which is still pending (see 1.4).
3.4 Adversarial Testing (Ashley)
test:adversarial:fuzz: # Hypothesis/fast-check property tests
test:adversarial:injection: # OWASP ZAP active scan (staging only)
test:adversarial:load: # k6 load test (staging only)
Agent-powered: Ashley reviews fuzzing results and generates attack scenarios. Deterministic tools run the actual attacks.
3.5 SSOT Enforcement (Jaap)
Checks: OpenAPI spec drift, file allocation law, config hardcode scan, naming conventions, constitution compliance.
Phase 4: Merge Gate and Deployment (Weeks 4-5) -- MOSTLY COMPLETE
4.1 Merge Gate (Marta/Iwona) -- DONE
merge:gate now reads actual JUnit XML and SARIF artifacts from prior stages instead of hardcoding scores. Marta aggregates results, computes a score (threshold: 70/100 to pass), generates SOC 2 CC8 evidence, checks PR size (max 1000 lines), and detects test weakening.
Output: JSON artifact with per-stage pass/fail, overall score, compliance evidence.
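The shape of such a report might look like the sketch below. The scoring formula is not specified in this plan, so equal weighting per stage is an assumption made purely for illustration, as are the function and field names; only the 70/100 threshold and the 1000-line PR limit come from the text above.

```python
import json

PASS_THRESHOLD = 70   # from the plan: 70/100 to pass
MAX_PR_LINES = 1000   # from the plan: max 1000 lines per PR


def merge_gate_report(stage_results: dict[str, bool], changed_lines: int) -> dict:
    """Aggregate per-stage results into a merge gate report (illustrative only)."""
    # Assumption: every stage carries equal weight in the overall score.
    passed_stages = sum(stage_results.values())
    score = round(100 * passed_stages / len(stage_results)) if stage_results else 0
    pr_size_ok = changed_lines <= MAX_PR_LINES
    return {
        "stages": {name: "pass" if ok else "fail" for name, ok in stage_results.items()},
        "score": score,
        "pr_size_ok": pr_size_ok,
        "passed": score >= PASS_THRESHOLD and pr_size_ok,
    }


if __name__ == "__main__":
    report = merge_gate_report({"lint": True, "sast": True, "unit": True, "e2e": False}, 400)
    print(json.dumps(report, indent=2))
```

The real gate also generates compliance evidence and detects test weakening, which this sketch does not attempt.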
4.2 DAST (Pre-Production) -- ACTIVE (FULL tier)
dast:zap: # OWASP ZAP baseline scan against staging — FULL tier
dast:nuclei: # Nuclei template scan against staging — FULL tier
Runs in the FULL tier (nightly + manual trigger) after staging deploy, before production promotion.
4.3 Container Build and Sign -- DONE
build:image: # kaniko build -> push to GitLab Container Registry — STANDARD tier
sign:image: # cosign sign -> SLSA Level 3 attestation — STANDARD tier
sbom:generate: # Syft -> CycloneDX SBOM -> attach to image — STANDARD tier
Kyverno admission policy: Planned — only images with valid cosign signature and SBOM attestation will be permitted to deploy to production (see Phase 7).
4.4 Deployment (ArgoCD) -- DONE
deploy:staging: # ArgoCD API-triggered sync to staging namespace (auto on STANDARD)
deploy:production: # ArgoCD sync to production namespace (manual gate on main)
Progressive delivery: Staging auto-deploys on STANDARD tier runs. Production requires manual approval after DAST passes.
4.5 Post-Deploy Verification -- DONE
verify:health: # HTTP health checks, SSL verification, error rate < 0.1% — now blocking
verify:smoke: # Playwright smoke tests against production
verify:rollback: # Verify rollback procedure is documented and tested
verify:health previously had allow_failure: true; this has been removed.
Phase 5: Multi-Project Queue Management (Week 5)
5.1 Queue Architecture
Project A commit -+
Project B commit -+-> ge:ci:queue (Redis Stream) -> CI Scheduler -> Runner Pods
Project C commit -+                                      |
                                               +---------+---------+
                                               |                   |
                                      Deterministic Jobs      Agent Jobs
                                      (parallel runners)    (orchestrator)
5.2 Priority Queue
| Priority | Trigger | Runner Allocation |
|---|---|---|
| P0 (hotfix) | hotfix/* branch | Preempt other jobs, dedicated runner |
| P1 (production) | main branch | 50% runner capacity |
| P2 (staging) | develop branch | 30% runner capacity |
| P3 (feature) | feat/* branches | 20% runner capacity, queue if busy |
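The trigger column of this table maps naturally onto branch-name patterns. The following is a sketch of how a scheduler might classify an incoming commit; the function name and the fallback to P3 for branches that match no rule are assumptions.

```python
from fnmatch import fnmatchcase

# Branch patterns and priorities taken from the table above, checked in order.
PRIORITY_RULES = [
    ("hotfix/*", "P0"),
    ("main", "P1"),
    ("develop", "P2"),
    ("feat/*", "P3"),
]


def priority_for(branch: str, default: str = "P3") -> str:
    """Return the queue priority for a branch name.

    Falling back to P3 for unmatched branches is an assumption of this sketch.
    """
    for pattern, priority in PRIORITY_RULES:
        if fnmatchcase(branch, pattern):
            return priority
    return default


if __name__ == "__main__":
    for name in ("hotfix/login-crash", "main", "develop", "feat/queue", "chore/deps"):
        print(name, priority_for(name))
```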
5.3 Resource Maximization
- Runner HPA: Scale runner pods 2-8 based on queue depth
- Test sharding: Split test suites across N pods (Playwright sharding, pytest-xdist, vitest threads)
- Dependency caching: PVC-backed cache for npm, pip, docker layers
- Parallel stages: Lint, types, deps, SAST all run in parallel (no dependencies between them)
- Per-project isolation: Each project gets its own k8s namespace for integration tests
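The runner scaling rule can be illustrated as a clamp on queue depth. This is only a sketch of the intended behaviour, not an HPA configuration: the jobs_per_runner default of 10 is an assumption that mirrors the runners' concurrent=10 setting, and the function name is hypothetical.

```python
import math

MIN_RUNNERS, MAX_RUNNERS = 2, 8  # from the plan: scale runner pods 2-8


def desired_runners(queue_depth: int, jobs_per_runner: int = 10) -> int:
    """Return the runner pod count for a given queue depth, clamped to 2-8."""
    if jobs_per_runner < 1:
        raise ValueError("jobs_per_runner must be >= 1")
    needed = math.ceil(max(queue_depth, 0) / jobs_per_runner)
    return max(MIN_RUNNERS, min(MAX_RUNNERS, needed))


if __name__ == "__main__":
    for depth in (0, 35, 500):
        print(depth, desired_runners(depth))
```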
5.4 Client/Project Isolation
- Each client project gets: own GitLab group, own k8s namespace, own test database
- Runner pods use nodeAffinity to schedule on fort-knox-dev
- Resource quotas per namespace prevent one project starving others
- Network policies isolate test namespaces from each other
Phase 6: E2E Testing Strategy (Weeks 5-6)
6.1 Tool Decision
Playwright remains primary for structured E2E testing: multi-browser support (Chromium, Firefox, WebKit), built-in test sharding, trace viewer for debugging, Test Agents feature (Planner/Generator/Healer), and MCP integration for AI-driven test creation.
Supplementary tools:
- Stagehand (Browserbase) -- natural language browser automation for testing UIs agents have not seen before (e.g., third-party integrations)
- Playwright MCP -- connects Playwright to Claude Code for intelligent test generation and self-healing
6.2 Test Pyramid
| Level | Tool | Target Count | Speed | Runs When |
|---|---|---|---|---|
| Unit | Vitest / pytest | 1000+ | <30s | Every commit |
| Integration | Vitest + real services | 200+ | <2min | Every MR |
| Component | Playwright component | 100+ | <1min | Every MR |
| E2E | Playwright full | 50+ | <5min | Staging deploy |
| E2E (agentic) | Stagehand | 10+ | <3min | Pre-production |
| Visual regression | Playwright screenshots | All pages | <2min | Every MR |
| Load | k6 | Key endpoints | <5min | Staging deploy |
| Chaos | Litmus | Core services | <10min | Weekly |
Phase 7: Compliance Integration (Weeks 6-7)
7.1 ISO 27001 Evidence Generation
Every CI run automatically produces the following compliance evidence:
| Control | Evidence | Format | Storage |
|---|---|---|---|
| A.8.25 (Secure SDLC) | Pipeline execution log | JSON | GitLab CI artifacts |
| A.8.28 (Secure coding) | SAST scan results | SARIF | GitLab CI artifacts + evidence repo |
| A.8.29 (Security testing) | DAST scan results | SARIF | GitLab CI artifacts + evidence repo |
| A.8.30 (Outsourced dev) | SBOM + license scan | CycloneDX + JSON | Harbor + evidence repo |
| A.8.9 (Configuration mgmt) | IaC scan results | SARIF | GitLab CI artifacts |
7.2 SOC 2 Type II Evidence
| Control | Evidence | Generated By |
|---|---|---|
| CC6.1 (Logical access) | MR approval logs, branch protection | GitLab audit log |
| CC7.2 (Incident detection) | Security scan findings | SAST/DAST SARIF |
| CC8.1 (Change management) | Pipeline results, MR history | Marta's merge gate report |
7.3 Policy-as-Code
Files to create:
- config/ci/policies/security.rego -- OPA policies for security gates
- config/ci/policies/compliance.rego -- OPA policies for compliance checks
- config/ci/policies/deployment.rego -- OPA policies for deployment gates
7.4 Kyverno Admission Policies
Files to create:
- k8s/base/ci/kyverno/require-image-signature.yaml
- k8s/base/ci/kyverno/require-sbom-attestation.yaml
- k8s/base/ci/kyverno/enforce-resource-limits.yaml
- k8s/base/ci/kyverno/enforce-nonroot.yaml
Phase 8: Wiki Brain Updates (Week 7)
New Wiki Pages
| Page | Content |
|---|---|
| development/infrastructure/cicd-pipeline.md | Complete pipeline architecture, stage descriptions, tool inventory |
| development/infrastructure/cicd-queue-management.md | Multi-project queue, priority system, resource allocation |
| development/procedures/cicd-troubleshooting.md | Common failures, debugging, gate overrides |
| development/contracts/cicd-stages.md | Contract per stage: inputs, outputs, thresholds, blocking rules |
| development/integrations/harbor-registry.md | Harbor setup, image signing, vulnerability scanning |
| development/integrations/argocd.md | ArgoCD setup, application definitions, rollback procedures |
| development/pitfalls/cicd.md | CI/CD specific pitfalls (flaky tests, cache invalidation, runner issues) |
| domains/compliance/cicd-evidence.md | Evidence generated, storage location, audit access |
Updated Wiki Pages
| Page | Update |
|---|---|
| methodologies/anti-llm-pipeline/stages.md | Add CI job mapping per stage |
| domains/project-management/delivery-swimlanes.md | Add CI job references per AV |
| development/standards/testing.md | Add mutation testing, property-based testing standards |
| development/infrastructure/orchestrator.md | Add CI Bridge documentation |
| development/procedures/deploy-code.md | Replace manual kubectl with ArgoCD |
Phase 9: Agent Identity Updates (Week 7)
| Agent | Update |
|---|---|
| Koen | Add CI stage ownership (lint, types, deps, SAST), policy-as-code reference |
| Marije / Judith | Add E2E tools (Playwright + Stagehand), CI integration test stage ownership |
| Marta / Iwona | Add merge gate CI job, SOC 2 evidence generation, release readiness scoring |
| Victoria | Add DAST stage ownership, Kyverno policy authoring, threat model to CI rule derivation |
| Ashley | Add adversarial CI stage, fuzzing harness ownership, load testing |
| Jaap | Add SSOT CI stage, policy-as-code verification |
| Jasper | Add reconciliation CI stage |
| Marco | Add conflict detection CI stage (AST diff tooling) |
| Alex / Tjitte | Add runner infrastructure ownership, Harbor management, ArgoCD operations |
| Arjan | Add GitOps workflow, environment provisioning |
| Pol | Add DAST pentest stage, Nuclei template authoring |
| Joshua | Add quarterly calibration CI metrics, pipeline pruning authority |
Agent-to-CI-Stage Mapping
The following table maps every CI pipeline stage to its owning agent(s) and indicates whether the stage is deterministic or agent-powered.
| CI Stage | Job(s) | Type | Owning Agent(s) |
|---|---|---|---|
| Lint | lint:gitleaks, lint:python, lint:typescript, lint:deadcode | Deterministic | Koen |
| Type Safety | types:python, types:typescript | Deterministic | Koen |
| Dependency Security | deps:python, deps:node, deps:license | Deterministic | Koen |
| IaC Security | iac:checkov, iac:kubesec | Deterministic | Koen |
| SAST | sast:bandit, sast:semgrep, sast:custom | Deterministic | Koen |
| TDD Gates | tdd:red-gate, tdd:green-gate, tdd:oracle-check | Deterministic | Koen |
| Unit Tests | test:unit:python, test:unit:typescript | Deterministic | Marije / Judith |
| Integration Tests | test:integration | Deterministic | Marije / Judith |
| E2E Tests | test:e2e | Deterministic + Agentic | Marije / Judith |
| Mutation Testing | mutation:typescript, mutation:python | Deterministic | Koen |
| Property-Based Testing | test:property:python, test:property:typescript | Deterministic | Ashley |
| Test Reconciliation | test:reconciliation | Agent-powered | Jasper |
| Adversarial Testing | test:adversarial:fuzz, test:adversarial:injection, test:adversarial:load | Hybrid | Ashley |
| SSOT Enforcement | verify:ssot | Agent-powered | Jaap |
| Merge Gate | merge:gate | Agent-powered | Marta / Iwona |
| DAST | dast:zap, dast:nuclei | Deterministic | Victoria / Pol |
| Container Build + Sign | build:image, sign:image, sbom:generate | Deterministic | Alex / Tjitte |
| Deployment | deploy:staging, deploy:production | Deterministic | Arjan / Alex |
| Post-Deploy Verification | verify:health, verify:smoke, verify:rollback | Deterministic | Marije / Judith |
| Policy Verification | verify:policy | Deterministic | Jaap |
Implementation Order (Critical Path)
Week 1: Infrastructure — DONE
+-- Container registry active (kaniko builds) (1.1)
+-- CI pipeline active with 3-tier model, 31 jobs (1.2)
+-- ArgoCD deployed and wired (1.3)
+-- Custom runner image built (2.1 prereq)
Week 2: Deterministic Pipeline — DONE
+-- Lint + Type + SAST jobs (2.1, 2.2)
+-- TDD gates (2.3)
+-- Unit + Integration + E2E tests (2.4)
Week 3: Anti-LLM Gates — DONE
+-- Mutation testing incl. Stryker incremental (3.1)
+-- Property-based testing (3.2)
Week 4: Agent-Powered Stages — DONE
+-- Reconciliation via Jasper (3.3)
+-- Adversarial via Ashley (3.4)
+-- SSOT via Jaap (3.5)
+-- Merge gate via Marta — reads JUnit/SARIF artifacts (4.1)
Week 5: Deployment + Container Supply Chain — DONE
+-- DAST via ZAP + Nuclei (FULL tier) (4.2)
+-- Container build via kaniko + cosign sign + Syft SBOM (4.3)
+-- ArgoCD deployment (4.4)
+-- Post-deploy health verification, now blocking (4.5)
Remaining work:
+-- Agent-CI Bridge service (1.4) — PENDING
+-- Multi-project queue (5.1-5.4) — PENDING
+-- Stagehand agentic E2E (6.1) — PENDING
+-- Chaos testing with Litmus — PENDING
+-- Policy-as-code OPA/Rego (7.3) — PENDING
+-- Kyverno admission policies (7.4) — PENDING
+-- Agent identity updates (Phase 9) — PENDING
Verification
Per-Phase Verification
- Each phase has its own test suite (contract tests first, TDD approach)
- verify-executor-safety.sh must pass after each phase
- Full 139-test regression suite must pass
End-to-End Verification
- Push a commit -- webhook fires, GitLab Bridge converts, orchestrator routes
- Deterministic jobs run in parallel (lint, types, SAST)
- Agent-powered jobs dispatch and complete (Koen, Marije, Jasper)
- All results aggregate in merge gate (Marta score threshold: 70/100)
- Container builds, signs, pushes to Harbor
- ArgoCD syncs to staging
- DAST runs against staging
- Production deploy (manual gate)
- Post-deploy health checks pass
- All compliance evidence archived
Capacity Test
- Queue 10 MRs from different projects simultaneously
- All complete within SLA (simple: <4h, standard: <8h)
- Runner utilization >70% during peak
- No resource starvation between projects
Key Files to Create
| File | Purpose |
|---|---|
| config/gitlab-ci-template.yaml | Rewrite existing template with full pipeline |
| config/ci/runner-image/Dockerfile | Custom runner with all tools |
| config/ci/policies/*.rego | OPA security/compliance policies |
| config/semgrep-rules/*.yaml | Custom SAST rules for LLM anti-patterns |
| ge_orchestrator/ci_bridge.py | Agent results to GitLab pipeline status |
| ge_orchestrator/ci_dispatch.py | GitLab triggers to agent task dispatch |
| scripts/check-security-findings.sh | Severity gate enforcement (exists, needs update) |
| k8s/base/ci/ | All CI infrastructure manifests |
| 8 new wiki pages | Documentation (see Phase 8) |
| 12 agent identity updates | Tool references (see Phase 9) |