CI/CD Implementation Plan

GE has a fully designed 10-stage anti-LLM pipeline, 20 automated validations (swimlanes), 18 pipeline agents with identities, and a GitLab CI template (539 lines, 14 stages). The pipeline now operates in three tiers (FAST / STANDARD / FULL) with 31 total jobs, real kaniko container builds, cosign image signing, Syft CycloneDX SBOMs, and ArgoCD deployment. Agent decisions do not yet feed back into CI, because the Agent-CI Bridge service is not implemented (see 1.4). This plan covers the remaining work needed to make the pipeline fully operational.

Goal: End-to-end working CI/CD pipeline where every commit goes through code standards, compliance, and security checks, seamlessly integrated with the orchestrator, capable of handling multi-project queues at scale.

Architecture: The Agent-CI Bridge

Developer/Agent commits -> GitLab webhook -> GitLab Bridge (k8s) -> Redis Stream
                                                                      |
                                                              ge-orchestrator
                                                                      |
                                                        +-------------+-------------+
                                                        |                           |
                                                 CI Pipeline Jobs            Agent Tasks
                                                 (deterministic)           (LLM-powered)
                                                        |                           |
                                                 GitLab API <---- Results ----> Redis Stream
                                                        |
                                                 Pipeline Status
                                                 (pass/fail/block)

Key design decision: Pattern B -- CI triggers agents, agents update CI. GitLab runs deterministic jobs (lint, type check, SAST). The orchestrator dispatches LLM-powered jobs (code review, adversarial testing). Both report back to GitLab pipeline status via API.

Current State (What Already Exists)

The following infrastructure is deployed and verified on k3s in the ge-gitlab namespace:

Component	Status	Details
GitLab	Running	`gitlab.ge.internal`
GitLab Runners	Running	3 runners, concurrent=10, k8s executor, privileged, DinD
GitLab Bridge	Running	2 replicas, webhooks to Redis pub/sub, namespace `ge-agents`
Minio	Running	S3-compatible storage for CI artifacts/cache
KAS	Running	Kubernetes Agent Server for k8s integration
CI Template	Active	`config/gitlab-ci-template.yaml` — 3-tier model, 31 jobs, fully wired
Container Registry	Running	GitLab Container Registry (2 pods), kaniko builds active
ArgoCD	Running	API-triggered sync for staging and production namespaces
Container Signing	Active	cosign + Syft CycloneDX SBOM per build
Security Gates Config	Exists	`config/security-gates-config.yaml` (257 lines)
Security Findings Script	Exists	`scripts/check-security-findings.sh` (284 lines)
k8s securityContext	Applied	All 28 containers hardened
merge:gate	Active	Reads JUnit/SARIF artifacts — no hardcoded scores

What is still missing: Agent-to-CI bridge (ci_bridge.py), policy-as-code (OPA/Rego), Kyverno admission policies, Stagehand agentic E2E, chaos testing (Litmus), multi-project queue management.

Tool Stack

All tools are open source and self-hosted. No SaaS dependencies.

CI/CD Core

Tool	Purpose	License
GitLab CE	Git hosting, CI/CD orchestration, MR workflow	MIT
ArgoCD	GitOps continuous deployment	Apache 2.0
Harbor	Container registry, vulnerability scanning	Apache 2.0

Deterministic Quality Tools

Tool	Language	Purpose
ruff	Python	Linting (replaces flake8, pylint)
black	Python	Code formatting
isort	Python	Import sorting
mypy	Python	Type checking
pyright	Python	Strict type checking
ESLint	TypeScript	Linting with security plugins
TypeScript (`tsc`)	TypeScript	Type checking (`--noEmit --strict`)
knip	TypeScript	Dead code detection
vulture	Python	Dead code detection

Security Tools

Tool	Purpose
Gitleaks	Secret detection in commits
Semgrep	SAST with custom rule support
Bandit	Python-specific SAST
Trivy	Container and filesystem vulnerability scanning
Checkov	IaC security scanning (k8s manifests, Dockerfiles)
Kubesec	Kubernetes manifest security scoring
OWASP ZAP	DAST baseline and active scanning
Nuclei	Template-based vulnerability scanning
pip-audit / safety	Python dependency vulnerability scanning
npm audit	Node dependency vulnerability scanning
ScanCode	License compliance scanning
cosign	Container image signing
Syft	SBOM generation (CycloneDX format)
Kyverno	Kubernetes admission policies

Testing Tools

Tool	Purpose
pytest	Python unit/integration testing
Vitest	TypeScript unit testing
Playwright	E2E testing (multi-browser, sharding, traces)
Stagehand	Agentic browser automation for exploratory testing
mutmut	Python mutation testing
Stryker	TypeScript mutation testing
Hypothesis	Python property-based testing
fast-check	TypeScript property-based testing
k6	Load testing
Litmus	Chaos engineering

Policy and Compliance

Tool	Purpose
OPA / conftest	Policy-as-code evaluation
Kyverno	Kubernetes admission control

Phase 1: Infrastructure Foundation (Week 1) -- MOSTLY COMPLETE

1.1 Container Registry -- DONE

GitLab Container Registry is running (2 pods). Container images are built via kaniko, pushed to registry.ge.internal, signed by cosign, and have CycloneDX SBOMs attached via Syft.

1.2 Activate CI Pipeline -- DONE

.gitlab-ci.yml is active at the repo root. The 3-tier model (FAST / STANDARD / FULL) is live with 31 jobs. Nightly FULL run is scheduled at 02:00 CEST.

1.3 ArgoCD for GitOps -- DONE

ArgoCD is running and wired to the pipeline. Staging deploys are API-triggered on STANDARD tier runs. Production deploy requires manual gate.

1.4 Agent-CI Bridge Service -- PENDING

The bridge that translates between agent task completions and GitLab pipeline status is not yet implemented.

Files to create:

ge_orchestrator/ci_bridge.py — Listens to ge:ci:results Redis stream, updates GitLab pipeline/job status via API
ge_orchestrator/ci_dispatch.py — Converts GitLab pipeline triggers into agent tasks

Flow:

GitLab webhook -> GitLab Bridge -> ge:work:incoming (existing)
Orchestrator routes to agents (existing)
Agent completes -> ge:ci:results stream (new)
CI Bridge reads results -> GitLab API to update external pipeline status (new)
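
The last step of the flow above can be sketched as follows. This is a minimal illustration, not the planned `ci_bridge.py`: the `AgentResult` fields and the verdict names are hypothetical, and the 255-character truncation is a defensive assumption. The endpoint is GitLab's commit status API (`POST /projects/:id/statuses/:sha`). The function only builds the request, so it can be tested without a network call.

```python
from dataclasses import dataclass
from urllib.parse import quote


@dataclass(frozen=True)
class AgentResult:
    """Hypothetical shape of one entry on the ge:ci:results stream."""
    project_path: str  # e.g. "ge/orchestrator"
    commit_sha: str
    job_name: str      # e.g. "test:reconciliation"
    verdict: str       # "pass", "fail", or "block" (assumed vocabulary)
    summary: str


# GitLab commit statuses include: pending, running, success, failed, canceled.
VERDICT_TO_STATE = {"pass": "success", "fail": "failed", "block": "failed"}


def build_status_request(base_url: str, result: AgentResult) -> tuple[str, dict[str, str]]:
    """Return the (URL, form fields) for a GitLab commit-status update."""
    if result.verdict not in VERDICT_TO_STATE:
        raise ValueError(f"unknown verdict: {result.verdict!r}")
    # The project path must be URL-encoded when used as the :id segment.
    project = quote(result.project_path, safe="")
    url = f"{base_url}/api/v4/projects/{project}/statuses/{result.commit_sha}"
    fields = {
        "state": VERDICT_TO_STATE[result.verdict],
        "name": result.job_name,
        "description": result.summary[:255],  # truncated defensively (assumption)
    }
    return url, fields
```

A bridge loop would read entries from the stream, call `build_status_request`, and send the result with an HTTP client and a project access token.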

Phase 2: Deterministic Pipeline (Weeks 2-3) -- COMPLETE

2.1 Koen's Quality Gate (Stage 4 -- Deterministic, No LLM)

All jobs are active. Jobs marked "now blocking" below previously had allow_failure: true and now fail the pipeline:

# Lint
lint:gitleaks:       # Secret detection — FAST tier
lint:python:         # ruff + black + isort + mypy strict — FAST tier
lint:typescript:     # ESLint (security plugins) + tsc --noEmit strict — FAST tier
lint:deadcode:       # knip (TS), vulture (Python) — STANDARD tier, now blocking

# Type Safety
types:python:        # pyright --strict — STANDARD tier, now blocking
types:typescript:    # tsc --noEmit --strict — FAST tier

# Dependency Security
deps:python:         # pip-audit + safety — FAST tier
deps:node:           # npm audit + Trivy filesystem scan — STANDARD tier
deps:license:        # ScanCode license compliance — STANDARD tier

# IaC Security
iac:checkov:         # Checkov on k8s manifests + Dockerfiles — STANDARD tier, now blocking
iac:kubesec:         # Kubesec scoring on all YAML — STANDARD tier, now blocking

Custom runner image (ge-ci-runner:latest) is built and in use. All tools are pre-installed.

2.2 SAST Pipeline

sast:bandit:         # Python SAST (severity HIGH + HIGH confidence = CRITICAL)
sast:semgrep:        # TS/JS SAST (severity ERROR + HIGH confidence = CRITICAL)
sast:custom:         # Custom Semgrep rules for LLM anti-patterns

Custom Semgrep rules target LLM-specific anti-patterns: eval(), hardcoded secrets, SQL string concatenation.

Blocking rules:

CRITICAL -> Block merge (exit 1)
HIGH -> Warn only for SAST, block for DAST pre-production
Results in SARIF format, stored as CI artifact (ISO 27001 A.8.28 evidence)
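
The blocking rules above could be expressed roughly as follows. This is a sketch, not the logic of `scripts/check-security-findings.sh`: the `(tool, severity, confidence)` tuple layout and the tool keys are assumptions made for illustration.

```python
# Top severity label per tool, as listed in the job comments above.
TOP_SEVERITY = {"bandit": "HIGH", "semgrep": "ERROR"}


def effective_severity(tool: str, severity: str, confidence: str) -> str:
    """Promote a finding to CRITICAL when top severity meets HIGH confidence."""
    if confidence == "HIGH" and severity == TOP_SEVERITY.get(tool):
        return "CRITICAL"
    return severity


def gate_exit_code(findings: list[tuple[str, str, str]], stage: str) -> int:
    """Return 1 to block, 0 to pass. stage is 'sast' or 'dast'."""
    levels = [effective_severity(*finding) for finding in findings]
    if "CRITICAL" in levels:
        return 1  # CRITICAL always blocks the merge
    if stage == "dast" and "HIGH" in levels:
        return 1  # HIGH blocks only for DAST pre-production
    return 0      # HIGH in SAST is a warning only
```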

2.3 TDD Gates

tdd:red-gate:        # Run TDD tests against stub -- ALL must FAIL
tdd:green-gate:      # Run TDD tests against implementation -- ALL must PASS
tdd:oracle-check:    # Verify test files don't import from src/ (oracle independence)

These are the most novel stages in the pipeline. To our knowledge, no other CI system checks oracle independence.
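
For Python test files, the oracle independence check could look roughly like this. It is a sketch that assumes "import from src/" means importing a module named `src` or one of its submodules; the actual `tdd:oracle-check` job may use different rules. Relative imports are ignored here.

```python
import ast


def src_imports(test_source: str, forbidden_root: str = "src") -> list[str]:
    """Return imported module names in test_source that live under forbidden_root."""
    hits: list[str] = []
    for node in ast.walk(ast.parse(test_source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]  # module is None for "from . import x"
        else:
            continue
        hits += [
            name for name in names
            if name == forbidden_root or name.startswith(forbidden_root + ".")
        ]
    return hits
```

A CI job would run this over every test file and exit non-zero if any file returns a non-empty list.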

2.4 Unit and Integration Tests

test:unit:python:      # pytest with coverage (threshold: 85%)
test:unit:typescript:  # vitest with coverage (threshold: 85%)
test:integration:      # Real Postgres 15 + Redis (port 6381) services
test:e2e:              # Playwright — 4 parallel workers in CI (was 1), now blocking

Parallel execution strategy:

Unit tests: shard by test file across N runners (k8s autoscaling)
Integration tests: one pod with sidecar containers (Postgres, Redis)
E2E tests: 4 workers in CI (upgraded from 1), shard by test suite for further parallelism
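
Sharding unit tests by file can be done with a stable hash, so every runner computes the same partition without coordination. The helper names below are hypothetical, and a CRC32 hash is just one reasonable choice.

```python
import zlib


def shard_of(test_file: str, total_shards: int) -> int:
    """Map a test file path to a shard index in [0, total_shards)."""
    # CRC32 is stable across processes, unlike Python's built-in hash().
    return zlib.crc32(test_file.encode("utf-8")) % total_shards


def files_for_shard(test_files: list[str], shard_index: int, total_shards: int) -> list[str]:
    """Return the files this shard should run, in a deterministic order."""
    return sorted(f for f in test_files if shard_of(f, total_shards) == shard_index)
```

When a job uses GitLab's `parallel` keyword, `CI_NODE_TOTAL` supplies the shard count and `CI_NODE_INDEX` is 1-based, so subtract one before passing it as `shard_index`.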

Phase 3: Anti-LLM Quality Gates (Weeks 3-4) -- COMPLETE

3.1 Mutation Testing

mutation:typescript:   # Stryker incremental — FULL tier, now blocking (allow_failure removed)
mutation:python:       # mutmut incremental — FULL tier

Thresholds: 80% mutation score on new code, 60% on existing code. Stryker runs in incremental mode — only changed files are mutated on feature branches. Full suite runs nightly at 02:00 CEST as part of the FULL tier. mutation:typescript previously had allow_failure: true; this has been removed.
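
The two thresholds can be checked with integer arithmetic, as sketched below. The score is simplified to killed mutants over total mutants, and treating a bucket with no mutants as passing is an assumption.

```python
NEW_CODE_THRESHOLD = 80       # percent, from the plan
EXISTING_CODE_THRESHOLD = 60  # percent, from the plan


def meets_threshold(killed: int, total: int, threshold_percent: int) -> bool:
    """True when killed/total reaches the threshold; empty buckets pass (assumption)."""
    if total == 0:
        return True
    # Integer comparison avoids floating-point rounding at the boundary.
    return killed * 100 >= threshold_percent * total


def mutation_gate(new_killed: int, new_total: int, old_killed: int, old_total: int) -> bool:
    """Apply the stricter threshold to new code and the looser one to existing code."""
    return (
        meets_threshold(new_killed, new_total, NEW_CODE_THRESHOLD)
        and meets_threshold(old_killed, old_total, EXISTING_CODE_THRESHOLD)
    )
```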

3.2 Property-Based Testing

test:property:python:       # Hypothesis (max_examples=1000 in CI)
test:property:typescript:   # fast-check (numRuns=1000 in CI)

These catch edge cases LLMs systematically miss: off-by-one errors, unicode handling, empty inputs, negative numbers, overflow, timezone issues.

3.3 Test Reconciliation (Jasper)

test:reconciliation:   # Compare TDD suite vs post-implementation suite

Agent-powered job: the orchestrator dispatches to Jasper, who runs the reconciliation. Results will flow back to CI via the Agent-CI Bridge once it is implemented (see 1.4).

3.4 Adversarial Testing (Ashley)

test:adversarial:fuzz:         # Hypothesis/fast-check property tests
test:adversarial:injection:    # OWASP ZAP active scan (staging only)
test:adversarial:load:         # k6 load test (staging only)

Agent-powered: Ashley reviews fuzzing results and generates attack scenarios. Deterministic tools run the actual attacks.

3.5 SSOT Enforcement (Jaap)

verify:ssot:   # Run verify_ssot.sh + additional checks

Checks: OpenAPI spec drift, file allocation law, config hardcode scan, naming conventions, constitution compliance.

Phase 4: Merge Gate and Deployment (Weeks 4-5) -- MOSTLY COMPLETE

4.1 Merge Gate (Marta/Iwona) -- DONE

merge:gate:   # Reads JUnit/SARIF artifacts, computes release readiness score

merge:gate now reads actual JUnit XML and SARIF artifacts from prior stages instead of hardcoding scores. Marta aggregates results, computes a score (threshold: 70/100 to pass), generates SOC 2 CC8 evidence, checks PR size (max 1000 lines), and detects test weakening.

Output: JSON artifact with per-stage pass/fail, overall score, compliance evidence.
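
A reduced version of the artifact-reading step might look like the sketch below. The 70/100 threshold and the 1000-line PR limit come from this plan. The scoring formula (test pass rate minus 10 points per SARIF error) is purely hypothetical and stands in for Marta's actual aggregation.

```python
import json
import xml.etree.ElementTree as ET

PASS_THRESHOLD = 70   # from the plan
MAX_PR_LINES = 1000   # from the plan


def junit_pass_rate(junit_xml: str) -> float:
    """Fraction of tests without failures or errors in a JUnit XML report."""
    root = ET.fromstring(junit_xml)
    suites = [root] if root.tag == "testsuite" else root.findall("testsuite")
    tests = sum(int(s.get("tests", 0)) for s in suites)
    bad = sum(int(s.get("failures", 0)) + int(s.get("errors", 0)) for s in suites)
    return 1.0 if tests == 0 else (tests - bad) / tests


def sarif_error_count(sarif_json: str) -> int:
    """Count SARIF results whose level is 'error'."""
    doc = json.loads(sarif_json)
    return sum(
        1
        for run in doc.get("runs", [])
        for result in run.get("results", [])
        if result.get("level") == "error"
    )


def merge_gate(junit_xml: str, sarif_json: str, pr_lines: int) -> dict:
    """Compute a readiness score and verdict (hypothetical weighting)."""
    raw = round(100 * junit_pass_rate(junit_xml)) - 10 * sarif_error_count(sarif_json)
    score = max(0, min(100, raw))
    passed = score >= PASS_THRESHOLD and pr_lines <= MAX_PR_LINES
    return {"score": score, "pr_lines": pr_lines, "passed": passed}
```

The returned dictionary can be serialized with `json.dumps` to produce the JSON artifact described above.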

4.2 DAST (Pre-Production) -- ACTIVE (FULL tier)

dast:zap:       # OWASP ZAP baseline scan against staging — FULL tier
dast:nuclei:    # Nuclei template scan against staging — FULL tier

Runs in the FULL tier (nightly + manual trigger) after staging deploy, before production promotion.

4.3 Container Build and Sign -- DONE

build:image:      # kaniko build -> push to GitLab Container Registry — STANDARD tier
sign:image:       # cosign sign -> SLSA Level 3 attestation — STANDARD tier
sbom:generate:    # Syft -> CycloneDX SBOM -> attach to image — STANDARD tier

Kyverno admission policy: Planned — only images with valid cosign signature and SBOM attestation will be permitted to deploy to production (see Phase 7).

4.4 Deployment (ArgoCD) -- DONE

deploy:staging:       # ArgoCD API-triggered sync to staging namespace (auto on STANDARD)
deploy:production:    # ArgoCD sync to production namespace (manual gate on main)

Progressive delivery: Staging auto-deploys on STANDARD tier runs. Production requires manual approval after DAST passes.

4.5 Post-Deploy Verification -- DONE

verify:health:      # HTTP health checks, SSL verification, error rate < 0.1% — now blocking
verify:smoke:       # Playwright smoke tests against production
verify:rollback:    # Verify rollback procedure is documented and tested

verify:health previously had allow_failure: true; this has been removed.

Phase 5: Multi-Project Queue Management (Week 5)

5.1 Queue Architecture

Project A commit -+
Project B commit -+-> ge:ci:queue (Redis Stream) -> CI Scheduler -> Runner Pods
Project C commit -+                                                     |
                                                              +---------+---------+
                                                              |                   |
                                                    Deterministic Jobs    Agent Jobs
                                                    (parallel runners)   (orchestrator)

5.2 Priority Queue

Priority	Trigger	Runner Allocation
P0 (hotfix)	`hotfix/*` branch	Preempt other jobs, dedicated runner
P1 (production)	`main` branch	50% runner capacity
P2 (staging)	`develop` branch	30% runner capacity
P3 (feature)	`feat/*` branches	20% runner capacity, queue if busy
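
The branch-to-priority mapping in the table can be captured with glob patterns. This is a sketch: the rule order and the fallback to P3 for unmatched branches are assumptions, since the table does not say how other branch names are handled.

```python
from fnmatch import fnmatchcase

# (branch pattern, priority) pairs from the table, checked in order.
PRIORITY_RULES = [
    ("hotfix/*", "P0"),
    ("main", "P1"),
    ("develop", "P2"),
    ("feat/*", "P3"),
]

# Share of runner capacity per priority; P0 preempts and has no fixed share.
RUNNER_SHARE = {"P1": 0.5, "P2": 0.3, "P3": 0.2}


def priority_for(branch: str, default: str = "P3") -> str:
    """Return the queue priority for a branch name."""
    for pattern, priority in PRIORITY_RULES:
        if fnmatchcase(branch, pattern):
            return priority
    return default  # unmatched branches fall back to P3 (assumption)
```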

5.3 Resource Maximization

Runner HPA: Scale runner pods 2-8 based on queue depth
Test sharding: Split test suites across N pods (Playwright sharding, pytest-xdist, vitest threads)
Dependency caching: PVC-backed cache for npm, pip, docker layers
Parallel stages: Lint, types, deps, SAST all run in parallel (no dependencies between them)
Per-project isolation: Each project gets its own k8s namespace for integration tests
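
The scaling rule for runner pods reduces to a clamped ceiling division. The sketch below assumes each pod handles 10 concurrent jobs, borrowed from the `concurrent=10` runner setting; the real HPA metric and target may differ.

```python
import math


def desired_runner_pods(
    queue_depth: int,
    jobs_per_pod: int = 10,  # assumption, taken from the concurrent=10 setting
    min_pods: int = 2,
    max_pods: int = 8,
) -> int:
    """Pods needed to drain the queue, clamped to the 2-8 range from the plan."""
    needed = math.ceil(queue_depth / jobs_per_pod)
    return max(min_pods, min(max_pods, needed))
```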

5.4 Client/Project Isolation

Each client project gets: own GitLab group, own k8s namespace, own test database
Runner pods use nodeAffinity to schedule on fort-knox-dev
Resource quotas per namespace prevent one project starving others
Network policies isolate test namespaces from each other

Phase 6: E2E Testing Strategy (Weeks 5-6)

6.1 Tool Decision

Playwright remains primary for structured E2E testing: multi-browser support (Chromium, Firefox, WebKit), built-in test sharding, trace viewer for debugging, Test Agents feature (Planner/Generator/Healer), and MCP integration for AI-driven test creation.

Supplementary tools:

Stagehand (Browserbase) -- natural language browser automation for testing UIs agents have not seen before (e.g., third-party integrations)
Playwright MCP -- connects Playwright to Claude Code for intelligent test generation and self-healing

6.2 Test Pyramid

Level	Tool	Target Count	Speed	Runs When
Unit	Vitest / pytest	1000+	<30s	Every commit
Integration	Vitest + real services	200+	<2min	Every MR
Component	Playwright component	100+	<1min	Every MR
E2E	Playwright full	50+	<5min	Staging deploy
E2E (agentic)	Stagehand	10+	<3min	Pre-production
Visual regression	Playwright screenshots	All pages	<2min	Every MR
Load	k6	Key endpoints	<5min	Staging deploy
Chaos	Litmus	Core services	<10min	Weekly

Phase 7: Compliance Integration (Weeks 6-7)

7.1 ISO 27001 Evidence Generation

Every CI run automatically produces the following compliance evidence:

Control	Evidence	Format	Storage
A.8.25 (Secure SDLC)	Pipeline execution log	JSON	GitLab CI artifacts
A.8.28 (Secure coding)	SAST scan results	SARIF	GitLab CI artifacts + evidence repo
A.8.29 (Security testing)	DAST scan results	SARIF	GitLab CI artifacts + evidence repo
A.8.30 (Outsourced dev)	SBOM + license scan	CycloneDX + JSON	Harbor + evidence repo
A.8.9 (Configuration mgmt)	IaC scan results	SARIF	GitLab CI artifacts
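
One way to index these artifacts is a small record per evidence item, keyed by control. The record layout, the `kind` keys, and the SHA-256 digest (added so an auditor can verify the artifact was not altered) are assumptions for illustration; the plan does not specify an index format.

```python
import hashlib

# Evidence kind -> (ISO 27001 control, artifact format), from the table above.
EVIDENCE_CONTROLS = {
    "pipeline-log": ("A.8.25", "JSON"),
    "sast": ("A.8.28", "SARIF"),
    "dast": ("A.8.29", "SARIF"),
    "sbom": ("A.8.30", "CycloneDX + JSON"),
    "iac": ("A.8.9", "SARIF"),
}


def evidence_record(kind: str, content: bytes, pipeline_id: int) -> dict:
    """Build an index entry tying an artifact to its control and pipeline run."""
    control, artifact_format = EVIDENCE_CONTROLS[kind]
    return {
        "control": control,
        "format": artifact_format,
        "pipeline_id": pipeline_id,
        "sha256": hashlib.sha256(content).hexdigest(),
    }
```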

7.2 SOC 2 Type II Evidence

Control	Evidence	Generated By
CC6.1 (Logical access)	MR approval logs, branch protection	GitLab audit log
CC7.2 (Incident detection)	Security scan findings	SAST/DAST SARIF
CC8.1 (Change management)	Pipeline results, MR history	Marta's merge gate report

7.3 Policy-as-Code

Files to create:

config/ci/policies/security.rego -- OPA policies for security gates
config/ci/policies/compliance.rego -- OPA policies for compliance checks
config/ci/policies/deployment.rego -- OPA policies for deployment gates

verify:policy:   # conftest verify all policies pass

7.4 Kyverno Admission Policies

Files to create:

k8s/base/ci/kyverno/require-image-signature.yaml
k8s/base/ci/kyverno/require-sbom-attestation.yaml
k8s/base/ci/kyverno/enforce-resource-limits.yaml
k8s/base/ci/kyverno/enforce-nonroot.yaml

Phase 8: Wiki Brain Updates (Week 7)

New Wiki Pages

Page	Content
`development/infrastructure/cicd-pipeline.md`	Complete pipeline architecture, stage descriptions, tool inventory
`development/infrastructure/cicd-queue-management.md`	Multi-project queue, priority system, resource allocation
`development/procedures/cicd-troubleshooting.md`	Common failures, debugging, gate overrides
`development/contracts/cicd-stages.md`	Contract per stage: inputs, outputs, thresholds, blocking rules
`development/integrations/harbor-registry.md`	Harbor setup, image signing, vulnerability scanning
`development/integrations/argocd.md`	ArgoCD setup, application definitions, rollback procedures
`development/pitfalls/cicd.md`	CI/CD specific pitfalls (flaky tests, cache invalidation, runner issues)
`domains/compliance/cicd-evidence.md`	Evidence generated, storage location, audit access

Updated Wiki Pages

Page	Update
`methodologies/anti-llm-pipeline/stages.md`	Add CI job mapping per stage
`domains/project-management/delivery-swimlanes.md`	Add CI job references per AV
`development/standards/testing.md`	Add mutation testing, property-based testing standards
`development/infrastructure/orchestrator.md`	Add CI Bridge documentation
`development/procedures/deploy-code.md`	Replace manual kubectl with ArgoCD

Phase 9: Agent Identity Updates (Week 7)

Agent	Update
Koen	Add CI stage ownership (lint, types, deps, SAST), policy-as-code reference
Marije / Judith	Add E2E tools (Playwright + Stagehand), CI integration test stage ownership
Marta / Iwona	Add merge gate CI job, SOC 2 evidence generation, release readiness scoring
Victoria	Add DAST stage ownership, Kyverno policy authoring, threat model to CI rule derivation
Ashley	Add adversarial CI stage, fuzzing harness ownership, load testing
Jaap	Add SSOT CI stage, policy-as-code verification
Jasper	Add reconciliation CI stage
Marco	Add conflict detection CI stage (AST diff tooling)
Alex / Tjitte	Add runner infrastructure ownership, Harbor management, ArgoCD operations
Arjan	Add GitOps workflow, environment provisioning
Pol	Add DAST pentest stage, Nuclei template authoring
Joshua	Add quarterly calibration CI metrics, pipeline pruning authority

Agent-to-CI-Stage Mapping

The following table maps every CI pipeline stage to its owning agent(s) and indicates whether the stage is deterministic or agent-powered.

CI Stage	Job(s)	Type	Owning Agent(s)
Lint	`lint:gitleaks`, `lint:python`, `lint:typescript`, `lint:deadcode`	Deterministic	Koen
Type Safety	`types:python`, `types:typescript`	Deterministic	Koen
Dependency Security	`deps:python`, `deps:node`, `deps:license`	Deterministic	Koen
IaC Security	`iac:checkov`, `iac:kubesec`	Deterministic	Koen
SAST	`sast:bandit`, `sast:semgrep`, `sast:custom`	Deterministic	Koen
TDD Gates	`tdd:red-gate`, `tdd:green-gate`, `tdd:oracle-check`	Deterministic	Koen
Unit Tests	`test:unit:python`, `test:unit:typescript`	Deterministic	Marije / Judith
Integration Tests	`test:integration`	Deterministic	Marije / Judith
E2E Tests	`test:e2e`	Deterministic + Agentic	Marije / Judith
Mutation Testing	`mutation:typescript`, `mutation:python`	Deterministic	Koen
Property-Based Testing	`test:property:python`, `test:property:typescript`	Deterministic	Ashley
Test Reconciliation	`test:reconciliation`	Agent-powered	Jasper
Adversarial Testing	`test:adversarial:fuzz`, `test:adversarial:injection`, `test:adversarial:load`	Hybrid	Ashley
SSOT Enforcement	`verify:ssot`	Agent-powered	Jaap
Merge Gate	`merge:gate`	Agent-powered	Marta / Iwona
DAST	`dast:zap`, `dast:nuclei`	Deterministic	Victoria / Pol
Container Build + Sign	`build:image`, `sign:image`, `sbom:generate`	Deterministic	Alex / Tjitte
Deployment	`deploy:staging`, `deploy:production`	Deterministic	Arjan / Alex
Post-Deploy Verification	`verify:health`, `verify:smoke`, `verify:rollback`	Deterministic	Marije / Judith
Policy Verification	`verify:policy`	Deterministic	Jaap
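
For dispatch purposes, the table reduces to a lookup that decides whether a job goes to a GitLab runner, the orchestrator, or both. The sketch below is not the planned `ci_dispatch.py`; the target names `"runner"` and `"orchestrator"` are hypothetical labels.

```python
# Agent-powered jobs and their owners, from the table above.
AGENT_OWNED = {
    "test:reconciliation": "Jasper",
    "verify:ssot": "Jaap",
    "merge:gate": "Marta / Iwona",
}

# Hybrid jobs: deterministic tools run the attack, an agent reviews the results.
HYBRID_PREFIXES = {"test:adversarial:": "Ashley"}


def dispatch_targets(job_name: str) -> list[str]:
    """Return where a CI job should be sent."""
    if job_name in AGENT_OWNED:
        return ["orchestrator"]
    if any(job_name.startswith(prefix) for prefix in HYBRID_PREFIXES):
        return ["runner", "orchestrator"]
    return ["runner"]  # every other job in the table is deterministic
```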

Implementation Order (Critical Path)

Week 1: Infrastructure — DONE
  +-- Container registry active (kaniko builds) (1.1)
  +-- CI pipeline active with 3-tier model, 31 jobs (1.2)
  +-- ArgoCD deployed and wired (1.3)
  +-- Custom runner image built (2.1 prereq)

Week 2: Deterministic Pipeline — DONE
  +-- Lint + Type + SAST jobs (2.1, 2.2)
  +-- TDD gates (2.3)
  +-- Unit + Integration + E2E tests (2.4)

Week 3: Anti-LLM Gates — DONE
  +-- Mutation testing incl. Stryker incremental (3.1)
  +-- Property-based testing (3.2)

Week 4: Agent-Powered Stages — DONE
  +-- Reconciliation via Jasper (3.3)
  +-- Adversarial via Ashley (3.4)
  +-- SSOT via Jaap (3.5)
  +-- Merge gate via Marta — reads JUnit/SARIF artifacts (4.1)

Week 5: Deployment + Container Supply Chain — DONE
  +-- DAST via ZAP + Nuclei (FULL tier) (4.2)
  +-- Container build via kaniko + cosign sign + Syft SBOM (4.3)
  +-- ArgoCD deployment (4.4)
  +-- Post-deploy health verification, now blocking (4.5)

Remaining work:
  +-- Agent-CI Bridge service (1.4) — PENDING
  +-- Multi-project queue (5.1-5.4) — PENDING
  +-- Stagehand agentic E2E (6.1) — PENDING
  +-- Chaos testing with Litmus — PENDING
  +-- Policy-as-code OPA/Rego (7.3) — PENDING
  +-- Kyverno admission policies (7.4) — PENDING
  +-- Agent identity updates (Phase 9) — PENDING

Verification

Per-Phase Verification

Each phase has its own test suite (contract tests first, TDD approach)
verify-executor-safety.sh must pass after each phase
Full 139-test regression suite must pass

End-to-End Verification

Push a commit -- webhook fires, GitLab Bridge converts, orchestrator routes
Deterministic jobs run in parallel (lint, types, SAST)
Agent-powered jobs dispatch and complete (Jasper, Ashley, Jaap)
All results aggregate in merge gate (Marta score threshold: 70/100)
Container builds, signs, pushes to the GitLab Container Registry
ArgoCD syncs to staging
DAST runs against staging
Production deploy (manual gate)
Post-deploy health checks pass
All compliance evidence archived

Capacity Test

Queue 10 MRs from different projects simultaneously
All complete within SLA (simple: <4h, standard: <8h)
Runner utilization >70% during peak
No resource starvation between projects
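
The pass criteria above can be evaluated mechanically once a capacity run finishes. In this sketch, the input shapes (per-MR duration in hours, runner hours busy versus available) are assumptions about what the test harness would record.

```python
# SLA limits in hours per MR class, from the criteria above.
SLA_HOURS = {"simple": 4, "standard": 8}
MIN_UTILIZATION = 0.70


def capacity_report(
    durations: list[tuple[str, float]],  # (MR class, hours to complete)
    busy_runner_hours: float,
    available_runner_hours: float,
) -> dict:
    """Summarize SLA breaches and runner utilization for one capacity run."""
    breaches = [(kind, hours) for kind, hours in durations if hours >= SLA_HOURS[kind]]
    utilization = busy_runner_hours / available_runner_hours
    return {
        "sla_breaches": breaches,
        "utilization": utilization,
        "passed": not breaches and utilization > MIN_UTILIZATION,
    }
```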

Key Files to Create

File	Purpose
`config/gitlab-ci-template.yaml`	Rewrite existing template with full pipeline
`config/ci/runner-image/Dockerfile`	Custom runner with all tools
`config/ci/policies/*.rego`	OPA security/compliance policies
`config/semgrep-rules/*.yaml`	Custom SAST rules for LLM anti-patterns
`ge_orchestrator/ci_bridge.py`	Agent results to GitLab pipeline status
`ge_orchestrator/ci_dispatch.py`	GitLab triggers to agent task dispatch
`scripts/check-security-findings.sh`	Severity gate enforcement (exists, needs update)
`k8s/base/ci/`	All CI infrastructure manifests
8 new wiki pages	Documentation (see Phase 8)
12 agent identity updates	Tool references (see Phase 9)