DOMAIN:INFRASTRUCTURE:THOUGHT_LEADERS
OWNER: arjan (infrastructure), gerco (k3s/sysadmin), thijmen (k8s Zone 2), rutger (k8s Zone 3)
UPDATED: 2026-03-24
SCOPE: key people, organizations, and resources for infrastructure, Kubernetes, and IaC
LEADERS:KUBERNETES
KELSEY_HIGHTOWER
ROLE: former Google Cloud Developer Advocate, Kubernetes evangelist (retired 2023)
GITHUB: github.com/kelseyhightower
WHY_RELEVANT: shaped how the industry thinks about Kubernetes. His "Kubernetes The Hard Way" tutorial is the gold standard for understanding k8s internals: not just using kubectl, but understanding what happens underneath.
KEY_CONTRIBUTIONS:
- "Kubernetes The Hard Way": step-by-step manual cluster setup (no automation). Essential for understanding k8s architecture.
- "Tetris": conference-talk analogy that explains how the scheduler packs workloads onto nodes
- Keynotes at KubeCon and Google Cloud Next that shaped the k8s adoption narrative
- Advocated for simplicity: "Kubernetes is a platform for building platforms"
LEARN_FROM: his GitHub repos and conference talks; focus on WHY decisions are made, not just HOW
QUOTE: "No one wakes up wanting to deploy Kubernetes. They want to ship their product."
RELEVANCE_TO_GE: GE agents should understand k8s internals (not just kubectl copy-paste); Kelsey's material is the starting point
BRENDAN_BURNS
ROLE: co-founder of Kubernetes, Corporate VP at Microsoft Azure
BOOK: "Kubernetes Up & Running" (O'Reilly, co-authored with Joe Beda and Kelsey Hightower)
WHY_RELEVANT: co-created Kubernetes with Joe Beda and Craig McLuckie at Google. His book is the canonical reference.
KEY_CONTRIBUTIONS:
- Co-created Kubernetes (open-sourced from Google's Borg/Omega experience)
- "Kubernetes Up & Running": the definitive book (3rd edition, 2022)
- "Designing Distributed Systems": patterns for container-based systems
- Led Kubernetes from Google internal project to CNCF donation
LEARN_FROM: "Kubernetes Up & Running" for operational patterns, "Designing Distributed Systems" for architecture
RELEVANCE_TO_GE: his patterns for multi-node systems apply directly to GE's three-zone architecture
JOE_BEDA
ROLE: co-founder of Kubernetes, co-founder of Heptio (acquired by VMware)
WHY_RELEVANT: co-created k8s and later founded Heptio to make k8s enterprise-ready. His focus on operational excellence aligns with GE's production-first mindset.
LEARN_FROM: Heptio's approach to making k8s enterprise-grade, the same challenge GE faces
TIM_HOCKIN
ROLE: Principal Software Engineer at Google, Kubernetes SIG lead
GITHUB: github.com/thockin
WHY_RELEVANT: deep k8s networking expert. His presentations on k8s networking internals are the best resource for understanding Services, DNS, CNI, and NetworkPolicies.
KEY_CONTRIBUTIONS:
- K8s networking architecture (Services, kube-proxy, DNS)
- K8s SIG-Network leadership
- Extensive technical talks on k8s internals
LEARN_FROM: his KubeCon talks on k8s networking; essential for stef and anyone debugging network issues
LEADERS:K3S_AND_RANCHER
DARREN_SHEPHERD
ROLE: co-founder of Rancher Labs, creator of k3s
GITHUB: github.com/ibuildthecloud
WHY_RELEVANT: created k3s, the lightweight k8s distribution GE uses in Zone 1. Understanding his design decisions helps explain k3s limitations and strengths.
KEY_CONTRIBUTIONS:
- Created k3s: Kubernetes for edge/IoT/resource-constrained environments
- Co-founded Rancher Labs (acquired by SUSE)
- RancherOS: minimal Linux for containers
PHILOSOPHY: "Kubernetes should be simple enough to run on a Raspberry Pi"
RELEVANCE_TO_GE: k3s on Minisforum 790 Pro is GE's Zone 1, so Darren's design decisions directly affect GE's dev environment
RANCHER_DOCS
URL: docs.k3s.io
WHY_RELEVANT: official k3s documentation; primary reference for Zone 1 operations
KEY_SECTIONS:
- Installation and configuration
- Networking (Flannel CNI, Traefik ingress)
- Storage (local-path provisioner)
- Upgrades and maintenance
- Known limitations vs full k8s
RULE: when something behaves differently in k3s vs full k8s, check k3s docs first
LEADERS:HASHICORP_TERRAFORM
MITCHELL_HASHIMOTO
ROLE: co-founder of HashiCorp (stepped down as CTO in 2021, left the company in 2023)
GITHUB: github.com/mitchellh
WHY_RELEVANT: created Terraform (and Vagrant, Consul, Vault). His philosophy of "infrastructure as code" is the foundation of arjan's entire workflow.
KEY_CONTRIBUTIONS:
- Created Terraform: IaC industry standard
- Created HashiCorp Vault: GE's secrets management
- "Tao of HashiCorp": guiding principles for infrastructure tools
PHILOSOPHY: "Workflows, not technologies. Simple, modular, composable."
LEARN_FROM: "Tao of HashiCorp", the principles that shaped Terraform's design
ARMON_DADGAR
ROLE: co-founder of HashiCorp, CTO
WHY_RELEVANT: co-designed Terraform's architecture and state management. His talks on Terraform best practices are canonical.
LEARN_FROM: HashiConf keynotes on infrastructure lifecycle management
HASHICORP_LEARN
URL: developer.hashicorp.com/terraform
WHY_RELEVANT: official Terraform tutorials and documentation; primary reference for arjan
KEY_SECTIONS:
- Terraform language reference (HCL)
- Provider documentation (UpCloud, TransIP, BunnyCDN)
- State management patterns
- Module development best practices
- Workspace strategies for multi-environment
TERRAFORM_BEST_PRACTICES
REPO: github.com/ozbillwang/terraform-best-practices
WHY_RELEVANT: community-curated Terraform patterns, module structure, naming conventions, state management
ALSO: github.com/antonbabenko (Anton Babenko's Terraform modules and talks are an industry reference)
LEADERS:UPCLOUD
UPCLOUD_DOCUMENTATION
URL: upcloud.com/docs
WHY_RELEVANT: GE's cloud provider for Zone 2 and Zone 3; primary reference for all UpCloud resources
KEY_SECTIONS:
- Managed Kubernetes documentation
- Managed Database (PostgreSQL) documentation
- API reference (used by Terraform provider)
- Network and firewall documentation
- Object Storage (S3-compatible)
TERRAFORM_PROVIDER_DOCS: registry.terraform.io/providers/UpCloudLtd/upcloud/latest/docs
UPCLOUD_DIFFERENTIATORS
WHY_GE_CHOSE_UPCLOUD:
- EU-based company (Helsinki, Finland), which aligns with data sovereignty requirements
- European data centers: Frankfurt, Amsterdam, Helsinki, Warsaw
- Managed Kubernetes with full API access
- Managed PostgreSQL with automated backups
- Competitive pricing for SME workloads
- No US CLOUD Act jurisdiction concerns
RULE: if evaluating alternatives, EU data sovereignty is NON-NEGOTIABLE
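EXAMPLE: a minimal Python sketch of how the sovereignty rule could be checked before anything is provisioned. The zone IDs are assumptions mapped from the four cities listed above, and `non_eu_resources` is a hypothetical helper, not part of GE's tooling or the UpCloud API. Confirm the real zone identifiers against the UpCloud documentation before relying on it.

```python
# Assumed zone IDs for Frankfurt, Amsterdam, Helsinki, and Warsaw.
# These are illustrative; verify them against the UpCloud API.
EU_ZONES = {"de-fra1", "nl-ams1", "fi-hel1", "fi-hel2", "pl-waw1"}


def non_eu_resources(resources: dict[str, str]) -> list[str]:
    """Return the names of resources placed outside the EU allow-list, sorted."""
    return sorted(name for name, zone in resources.items() if zone not in EU_ZONES)


if __name__ == "__main__":
    # Hypothetical plan: resource name -> zone it would be created in.
    plan = {"zone2-db": "nl-ams1", "zone3-cache": "us-nyc1"}
    offenders = non_eu_resources(plan)
    if offenders:
        raise SystemExit(f"outside EU zones: {', '.join(offenders)}")
```

An allow-list is used rather than a deny-list so that any zone added later fails the check until someone approves it.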
LEADERS:CNCF_LANDSCAPE
CLOUD_NATIVE_COMPUTING_FOUNDATION
URL: landscape.cncf.io
WHY_RELEVANT: CNCF hosts much of the ecosystem GE operates in, including Kubernetes, Prometheus, Helm, and cert-manager. The landscape also lists tools GE uses that are not CNCF-hosted projects, such as Traefik, Loki, and Grafana.
KEY_CNCF_PROJECTS_USED_BY_GE:
| Project | Status | GE Usage |
|---|---|---|
| Kubernetes | Graduated | Core orchestration (k3s + UpCloud MKE) |
| Prometheus | Graduated | Metrics collection and alerting |
| Helm | Graduated | Package management for k8s |
| cert-manager | Graduated | TLS certificate automation |
| Traefik | Not CNCF-hosted (Traefik Labs) | Ingress controller |
| Loki | Not CNCF-hosted (Grafana Labs) | Log aggregation |
| Grafana | Not CNCF-hosted (Grafana Labs) | Visualization and dashboards |
CNCF_TRAIL_MAP
URL: github.com/cncf/landscape#trail-map
PURPOSE: recommended adoption path for cloud-native technologies
GE_STATUS: containerization (done) -> CI/CD (done) -> orchestration (done) -> observability (done) -> service mesh (deferred) -> security (ongoing)
LEADERS:CONTAINERS_AND_RUNTIME
DOCKER
URL: docs.docker.com
WHY_RELEVANT: GE uses Docker for image building (Zone 1). k3s uses the containerd runtime.
KEY_REFERENCE: Dockerfile best practices: multi-stage builds, layer caching, security scanning
RULE: images should be minimal (Alpine-based or distroless when possible)
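EXAMPLE: a minimal Python sketch of how the minimal-image rule could be linted. It reads the last FROM instruction of a Dockerfile (the final stage of a multi-stage build) and checks whether the base image name looks Alpine-based, distroless, or scratch. The helper names, the marker list, and the sample Dockerfile are illustrative assumptions. The check is a name heuristic only and does not resolve a final stage that refers to an earlier stage alias.

```python
MINIMAL_MARKERS = ("alpine", "distroless", "scratch")

# Illustrative multi-stage Dockerfile; image names are examples only.
EXAMPLE = """\
FROM golang:1.22 AS build
WORKDIR /src
COPY . .
RUN go build -o /out/app .

FROM gcr.io/distroless/static-debian12
COPY --from=build /out/app /app
ENTRYPOINT ["/app"]
"""


def final_base_image(dockerfile_text: str) -> str:
    """Return the image named by the last FROM instruction, or '' if none."""
    base = ""
    for line in dockerfile_text.splitlines():
        parts = line.strip().split()
        if parts and parts[0].upper() == "FROM":
            # Skip optional flags such as --platform=linux/amd64.
            args = [p for p in parts[1:] if not p.startswith("--")]
            if args:
                base = args[0]
    return base


def is_minimal(dockerfile_text: str) -> bool:
    """True if the final stage's base image name contains a minimal marker."""
    base = final_base_image(dockerfile_text).lower()
    return any(marker in base for marker in MINIMAL_MARKERS)
```

Only the final stage matters for the shipped image, so a large builder image in an earlier stage does not fail the check.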
OCI (Open Container Initiative)
URL: opencontainers.org
WHY_RELEVANT: OCI defines the image and runtime specs that Docker, containerd, and k3s all implement
LEARN_FROM: understanding OCI explains why images are portable across Zone 1 (k3s) and Zone 2/3 (UpCloud MKE)
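EXAMPLE: a minimal Python sketch of the portability point. An image is described by a JSON manifest whose `mediaType` names a shared format, so any runtime that understands that format can run it. The media type strings below come from the OCI image spec and the Docker v2 registry format. The `describe_manifest` helper is hypothetical and only classifies a manifest that has already been fetched.

```python
import json

# Manifest formats commonly accepted by OCI-compatible runtimes and registries.
PORTABLE_MEDIA_TYPES = {
    "application/vnd.oci.image.manifest.v1+json": "OCI image manifest",
    "application/vnd.oci.image.index.v1+json": "OCI image index (multi-platform)",
    "application/vnd.docker.distribution.manifest.v2+json": "Docker v2 manifest",
    "application/vnd.docker.distribution.manifest.list.v2+json": "Docker v2 manifest list",
}


def describe_manifest(manifest_json: str) -> str:
    """Classify a manifest document by its mediaType field."""
    manifest = json.loads(manifest_json)
    media_type = manifest.get("mediaType", "")
    return PORTABLE_MEDIA_TYPES.get(
        media_type, f"unrecognized media type: {media_type!r}"
    )
```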
RESOURCES:BOOKS
| Book | Author | Relevance |
|---|---|---|
| Kubernetes Up & Running (3rd ed) | Burns, Beda, Hightower, Evenson | Canonical k8s reference |
| Kubernetes Patterns (2nd ed) | Ibryam, Huss | Design patterns for k8s workloads |
| Terraform: Up & Running (3rd ed) | Brikman | Terraform best practices, module design |
| Designing Distributed Systems | Brendan Burns | Container-based system patterns |
| The Phoenix Project | Kim, Behr, Spafford | DevOps culture and principles |
| Site Reliability Engineering | Google SRE team | Production operations, SLOs, incident response |
| Infrastructure as Code (2nd ed) | Kief Morris | IaC patterns beyond just Terraform |
| Cloud Native Infrastructure | Garrison, Nova | Infrastructure management in cloud-native world |
RESOURCES:CONFERENCES
| Conference | Focus | Why |
|---|---|---|
| KubeCon + CloudNativeCon | Kubernetes, CNCF ecosystem | Primary industry event, bleeding-edge k8s |
| HashiConf | Terraform, Vault, Consul | Best practices for IaC and secrets |
| FOSDEM | Open source, infrastructure | European focus, free, community-driven |
| DevOpsDays Amsterdam | DevOps culture, tools | Local to GE (Netherlands), networking |
| SREcon | Site reliability, production ops | USENIX-run; SRE practices across the industry, incident management |
RESOURCES:NEWSLETTERS_AND_BLOGS
| Source | Focus | Frequency |
|---|---|---|
| KubeWeekly (CNCF) | Kubernetes news and tutorials | Weekly |
| DevOps Weekly | Tools, practices, culture | Weekly |
| docs.k3s.io/blog | k3s updates and guides | On release |
| upcloud.com/blog | UpCloud product updates | Weekly |
| learnk8s.io/blog | Practical k8s tutorials | Monthly |
| thenewstack.io | Cloud native ecosystem | Daily |
| hashicorp.com/blog | Terraform, Vault updates | Weekly |
RESOURCES:REPOSITORIES
| Repository | Why | Stars |
|---|---|---|
| kelseyhightower/kubernetes-the-hard-way | Deep k8s understanding | 40k+ |
| k3s-io/k3s | GE's Zone 1 runtime — read issues for gotchas | 28k+ |
| UpCloudLtd/terraform-provider-upcloud | GE's cloud Terraform provider | ~100 |
| hashicorp/terraform | IaC tool — core reference | 42k+ |
| traefik/traefik | GE's ingress controller | 51k+ |
| cert-manager/cert-manager | TLS automation | 12k+ |
| prometheus/prometheus | Metrics and alerting | 55k+ |
| grafana/loki | Log aggregation | 24k+ |
| borgbackup/borg | Backup tool (dedup + encryption) | 11k+ |
RESOURCES:CERTIFICATIONS
| Certification | Provider | Relevance |
|---|---|---|
| CKA (Certified Kubernetes Administrator) | CNCF | Validates k8s operations skills |
| CKAD (Certified Kubernetes Application Developer) | CNCF | Validates k8s workload design |
| CKS (Certified Kubernetes Security Specialist) | CNCF | Validates k8s security hardening |
| HashiCorp Certified: Terraform Associate | HashiCorp | Validates Terraform proficiency |
NOTE: while agents do not take certifications, the exam objectives serve as excellent skill checklists for what infrastructure agents should know
LEADERS:AGENTIC_NOTE
FOR_AGENTS_WORKING_ON_INFRASTRUCTURE:
- When debugging k8s, check k3s docs first (Zone 1 may behave differently from full k8s)
- When writing Terraform, check UpCloud provider docs for resource-specific options
- When designing backup strategies, reference Google SRE book chapters on data integrity
- When troubleshooting networking, reference Tim Hockin's k8s networking talks
- PREFER official docs over blog posts; infrastructure blog posts go stale fast
- PREFER k3s-specific solutions over generic k8s solutions for Zone 1
- NEVER implement a pattern from a blog post without checking which version it targets; k8s APIs change frequently
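EXAMPLE: a minimal Python sketch of the version check in the last rule. It compares a manifest's `apiVersion` and `kind` against a small table of beta APIs that stopped being served in specific Kubernetes releases. The table is deliberately partial and the helper names are hypothetical; the Kubernetes deprecation guide remains the authoritative source.

```python
from typing import Optional

# Minor version in which each API stopped being served. Partial list only.
REMOVED_IN = {
    ("apps/v1beta2", "Deployment"): (1, 16),
    ("extensions/v1beta1", "Ingress"): (1, 22),
    ("networking.k8s.io/v1beta1", "Ingress"): (1, 22),
    ("batch/v1beta1", "CronJob"): (1, 25),
    ("policy/v1beta1", "PodSecurityPolicy"): (1, 25),
    ("policy/v1beta1", "PodDisruptionBudget"): (1, 25),
}


def parse_minor(version: str) -> tuple[int, int]:
    """Turn 'v1.29.4+k3s1' or '1.29' into (1, 29)."""
    core = version.lstrip("v").split("+")[0]
    major, minor = core.split(".")[:2]
    return int(major), int(minor)


def removal_warning(api_version: str, kind: str, cluster_version: str) -> Optional[str]:
    """Return a warning if the API is no longer served on this cluster version."""
    removed = REMOVED_IN.get((api_version, kind))
    if removed is not None and parse_minor(cluster_version) >= removed:
        return f"{kind} {api_version} was removed in Kubernetes {removed[0]}.{removed[1]}"
    return None
```

For example, `removal_warning("extensions/v1beta1", "Ingress", "v1.29.4+k3s1")` returns a warning string, while the same call with `"networking.k8s.io/v1"` returns `None`. The version string format with a `+k3s1` suffix is an assumption about how the Zone 1 cluster reports its version.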