DOMAIN:INNOVATION — AGENTIC_ENGINEERING¶
OWNER: joshua
UPDATED: 2026-03-24
PURPOSE: track the state of agentic AI engineering, multi-agent frameworks, AI development tools, and GE's competitive positioning
SCOPE: landscape as of 2025-2026, updated as the field evolves
RELEVANCE: GE IS an agentic engineering company — this is not peripheral research, this is understanding our own field
AGENTIC:LANDSCAPE_OVERVIEW¶
DEFINITION: agentic AI engineering = building systems where LLM-powered agents autonomously plan, execute, and iterate on tasks with tool access and memory
TIMELINE: field emerged 2023 (AutoGPT hype), matured 2024 (frameworks stabilized), production-grade 2025-2026 (real companies shipping)
CURRENT_STATE: the field has bifurcated — generalist coding assistants (Cursor, Copilot) vs specialized multi-agent systems (GE, Factory, Cognition)
KEY_TREND: the market is realizing that single-agent systems hit capability ceilings — multi-agent coordination is where value compounds
AGENTIC:MULTI_AGENT_FRAMEWORKS¶
LANGGRAPH (LangChain)¶
CREATOR: LangChain (Harrison Chase)
WHAT: graph-based framework for building stateful multi-agent workflows
ARCHITECTURE: directed graphs with nodes (agents/tools) and edges (routing logic), built on LangChain primitives
STRENGTHS: excellent state management, human-in-the-loop patterns, persistence layer, streaming support, strong community
WEAKNESSES: tight coupling to LangChain ecosystem, Python-centric (JS port exists but lags), complexity overhead for simple workflows, verbose API
LLM_COMPATIBILITY: high — well-represented in training data, LangChain is the most-discussed framework
GE_RELEVANCE: conceptually similar to GE's DAG-based work package system, but GE uses Redis Streams + custom orchestrator instead of LangGraph's graph runtime
VERDICT: assess — worth understanding patterns, but GE's architecture is more specialized and production-hardened for our use case
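The node-and-edge pattern described above can be sketched without the framework itself. A minimal plain-Python illustration (this is not the LangGraph API; the node names, state keys, and `route` function are all hypothetical):

```python
# Minimal sketch of a graph-based agent workflow: nodes transform a shared
# state dict, and a routing function (the "edges") picks the next node.
# Real LangGraph adds persistence, streaming, and human-in-the-loop hooks.

def plan(state):
    state["steps"] = ["draft", "review"]
    return state

def draft(state):
    state["output"] = "draft of " + state["task"]
    return state

def review(state):
    state["approved"] = "draft" in state["output"]
    return state

NODES = {"plan": plan, "draft": draft, "review": review}

def route(current, state):
    # Conditional edge: plan -> draft -> review -> stop (None ends the run).
    return {"plan": "draft", "draft": "review", "review": None}[current]

def run(state, entry="plan"):
    node = entry
    while node is not None:
        state = NODES[node](state)
        node = route(node, state)
    return state

result = run({"task": "landing page"})
```

The same shape maps onto GE's DAG-based work packages: nodes are agents, edges are dispatch rules, and the state dict is the artifact being passed along.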
CREWAI¶
CREATOR: CrewAI (Joao Moura)
WHAT: role-based multi-agent framework — define agents with roles, goals, and backstories, then assign them to tasks
ARCHITECTURE: crews of agents with defined roles, sequential or parallel task execution, built-in delegation
STRENGTHS: intuitive role metaphor, easy to prototype, good documentation, growing ecosystem
WEAKNESSES: limited state management, no built-in persistence, role definitions are superficial compared to GE's identity system, no native DAG enforcement
LLM_COMPATIBILITY: moderate — newer framework, LLMs know basics but hallucinate advanced patterns
GE_RELEVANCE: CrewAI's role metaphor is similar to GE's agent identity system, but GE's is far deeper (tiered identity: CORE + ROLE + REFERENCE, 59 agents with unique personalities and domain expertise)
VERDICT: hold — GE's system is more sophisticated, CrewAI is useful for rapid prototyping but not for production multi-agent orchestration
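The role metaphor can be captured in a few lines of plain Python (this is the pattern, not the CrewAI API; the roles, goals, and `perform` stand-in are invented for illustration):

```python
# Sketch of role-based orchestration: each agent carries role metadata that
# a real framework would inject into its prompt; tasks execute sequentially,
# each seeing the previous task's output as context.
from dataclasses import dataclass

@dataclass
class RoleAgent:
    role: str
    goal: str

    def perform(self, task, context):
        # Stand-in for an LLM call primed with the role and goal.
        return f"[{self.role}] {task} (context: {context})"

def run_crew(assignments, initial_context=""):
    context = initial_context
    outputs = []
    for agent, task in assignments:  # sequential execution, as in a crew
        context = agent.perform(task, context)
        outputs.append(context)
    return outputs

researcher = RoleAgent("researcher", "gather requirements")
writer = RoleAgent("writer", "draft the spec")
log = run_crew([(researcher, "list features"), (writer, "write spec")])
```

The contrast with GE is visible even in the sketch: the role here is a single string, where GE's tiered identity (CORE + ROLE + REFERENCE) carries full domain expertise per agent.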
AUTOGEN (Microsoft)¶
CREATOR: Microsoft Research
WHAT: conversational multi-agent framework — agents communicate through message passing
ARCHITECTURE: agents as conversable entities, group chats, nested conversations, code execution in Docker
STRENGTHS: Microsoft backing, strong research foundation, flexible conversation patterns, good for research
WEAKNESSES: complex API surface, heavy configuration, conversation-based routing limits structured workflows, Microsoft-ecosystem bias
LLM_COMPATIBILITY: moderate — training data includes research papers and docs, but API has changed significantly across versions
GE_RELEVANCE: AutoGen's conversation model is interesting but fundamentally different from GE's stream-based inbox/outbox pattern — GE favors structured task dispatch over open-ended conversation
VERDICT: hold — architectural mismatch with GE's design philosophy
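The conversation-based routing that distinguishes AutoGen from structured dispatch can be sketched as follows (pattern only, not the AutoGen API; the agents, reply functions, and termination rule are invented):

```python
# Sketch of conversational multi-agent routing: agents take turns appending
# to a shared message history until one of them signals termination. Note
# there is no DAG; control flow emerges from the conversation itself.

class Agent:
    def __init__(self, name, reply_fn):
        self.name = name
        self.reply_fn = reply_fn  # stand-in for an LLM call

    def reply(self, history):
        return self.reply_fn(history)

def coder(history):
    return "PATCH: fix null check"

def reviewer(history):
    # Terminate once the coder has produced a patch.
    last = history[-1]["content"] if history else ""
    return "APPROVED" if last.startswith("PATCH:") else "please revise"

def group_chat(agents, task, max_turns=6):
    history = [{"sender": "user", "content": task}]
    for turn in range(max_turns):
        agent = agents[turn % len(agents)]  # round-robin speaker selection
        msg = agent.reply(history)
        history.append({"sender": agent.name, "content": msg})
        if msg == "APPROVED":
            break
    return history

log = group_chat([Agent("coder", coder), Agent("reviewer", reviewer)],
                 "fix the crash in login")
```

This is exactly the mismatch noted above: termination and routing depend on message content, whereas GE's inbox/outbox pattern dispatches structured tasks with explicit completion criteria.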
SEMANTIC_KERNEL (Microsoft)¶
CREATOR: Microsoft
WHAT: SDK for integrating AI into applications — plugins, planners, memory, connectors
ARCHITECTURE: kernel with plugins (native functions + LLM functions), planners for orchestration, memory stores
STRENGTHS: enterprise-grade, C#/.NET first-class support, good Azure integration, plugin ecosystem
WEAKNESSES: .NET-centric (Python support exists but secondary), enterprise complexity, tightly coupled to Azure/OpenAI
LLM_COMPATIBILITY: moderate — well-documented but C#-focused training data
GE_RELEVANCE: low — GE's stack is TypeScript/Python, not .NET — architectural patterns are interesting but implementation is not portable
VERDICT: hold — wrong ecosystem for GE
DSPY (Stanford)¶
CREATOR: Stanford NLP (Omar Khattab)
WHAT: framework for programming (not prompting) LLMs — declarative signatures, automatic prompt optimization
ARCHITECTURE: modules with typed signatures, teleprompters for optimization, assertions for validation
STRENGTHS: moves beyond prompt engineering to systematic optimization, reproducible, composable
WEAKNESSES: steep learning curve, Python-only, requires labeled data for optimization, less intuitive than direct prompting
LLM_COMPATIBILITY: limited — niche framework, LLMs need explicit guidance
GE_RELEVANCE: DSPy's optimization approach could improve GE's agent system prompts — worth assessing for prompt quality improvement
VERDICT: assess — the optimization methodology is interesting, even if the framework itself isn't directly adoptable
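The core idea ("programming, not prompting") can be sketched without DSPy itself. Below, the "prompt" is reduced to a trivial keyword rule and the "teleprompter" to a scoring loop over candidates; everything here is an invented stand-in, not the DSPy API:

```python
# Sketch of declarative prompt optimization: instead of hand-tuning one
# prompt, declare the module's input/output contract and search candidate
# prompts against labeled examples, keeping the best scorer.

def make_classifier(template):
    # Stand-in for an LLM call; a real system would render the template
    # into a prompt and parse the model's answer.
    def classify(text):
        return "bug" if template["keyword"] in text.lower() else "feature"
    return classify

def optimize(candidates, trainset):
    # Teleprompter-style loop: score each candidate on the trainset.
    def score(template):
        clf = make_classifier(template)
        return sum(clf(x) == y for x, y in trainset)
    return max(candidates, key=score)

trainset = [("crash on login", "bug"), ("add dark mode", "feature")]
candidates = [{"keyword": "crash"}, {"keyword": "error"}]
best = optimize(candidates, trainset)
```

For GE, the relevant takeaway is the loop, not the framework: agent system prompts could be scored against labeled session outcomes and improved systematically.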
SWARM (OpenAI)¶
CREATOR: OpenAI (experimental)
WHAT: lightweight multi-agent orchestration — handoffs between agents, function calling, minimal abstraction
ARCHITECTURE: agents with instructions and functions, handoffs as return values, client manages conversation
STRENGTHS: extremely simple, lightweight, easy to understand, OpenAI-native
WEAKNESSES: experimental/educational (OpenAI says "not production-ready"), no persistence, no state management, minimal features
LLM_COMPATIBILITY: limited — very new, minimal training data
GE_RELEVANCE: Swarm's simplicity is instructive but GE needs production-grade orchestration (persistence, DAG, recovery, health monitoring) — Swarm is a toy by comparison
VERDICT: hold — educational value only
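Swarm's one genuinely instructive idea, handoff-as-return-value, fits in a few lines (inspired by the pattern, not the Swarm API; the agents and routing rule are invented):

```python
# Sketch of the handoff pattern: an agent's reply can be another agent,
# which transfers control; the loop follows handoffs until an agent
# returns a plain string (the final answer).

def triage(task):
    # Routing by task content; returning a function is a handoff.
    return billing if "invoice" in task else support

def billing(task):
    return "billing handled: " + task

def support(task):
    return "support handled: " + task

def run(agent, task, max_hops=5):
    for _ in range(max_hops):
        result = agent(task)
        if callable(result):   # handoff to another agent
            agent = result
        else:                  # final answer, stop the loop
            return result
    raise RuntimeError("handoff loop exceeded max_hops")

answer = run(triage, "invoice is wrong")
```

Everything GE needs beyond this (persistence, DAG enforcement, recovery, health monitoring) is precisely what the sketch, like Swarm, omits.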
AGENTIC:MCP_PROTOCOL¶
WHAT: Model Context Protocol — open standard by Anthropic for connecting LLMs to external tools and data sources
ARCHITECTURE: client-server protocol where MCP servers expose tools/resources/prompts, and MCP clients (LLM applications) discover and invoke them
WHY_IT_MATTERS: standardizes tool integration — instead of every framework having custom tool definitions, MCP provides a universal interface
CURRENT_STATE: rapidly growing ecosystem, servers for GitHub, Slack, databases, file systems, web browsers, and hundreds more
ADOPTION: Claude Desktop, Claude Code, Cursor, Windsurf, Cline, Continue — major AI tools are adopting MCP
MCP_FOR_GE¶
OPPORTUNITY_1: GE's agents use Claude Code which supports MCP — agents could leverage MCP servers for standardized tool access
OPPORTUNITY_2: GE could expose its own services as MCP servers (wiki brain, discussion API, task dispatch) — making GE's infrastructure available to any MCP client
OPPORTUNITY_3: client project delivery could include MCP server setup — standardized API layer for client applications
RISK_1: MCP is still maturing — breaking changes possible
RISK_2: security model needs careful evaluation — MCP servers have broad access, must align with GE's ISO 27001 requirements
RISK_3: performance overhead of protocol layer vs direct API calls
GE_VERDICT: assess — monitor MCP ecosystem maturity, evaluate for GE integration when protocol stabilizes
RELEVANT_AGENTS: alexander (integration/Stitch), urszula/maxim (backend), joshua (evaluation)
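Concretely, MCP rides on JSON-RPC 2.0; the two core tool operations look like this on the wire. The method names (`tools/list`, `tools/call`) follow the published spec; the tool name and arguments are a hypothetical GE example:

```python
# Shape of the two core MCP tool messages as JSON-RPC 2.0 requests.
import json

# Discover what tools a server exposes.
list_request = {
    "jsonrpc": "2.0", "id": 1,
    "method": "tools/list",
}

# Invoke one tool by name with structured arguments.
call_request = {
    "jsonrpc": "2.0", "id": 2,
    "method": "tools/call",
    "params": {
        "name": "search_wiki",  # hypothetical GE wiki-brain tool
        "arguments": {"query": "deployment checklist"},
    },
}

wire = json.dumps(call_request)  # what actually crosses the transport
```

This is why OPPORTUNITY_2 is cheap to prototype: exposing the wiki brain as an MCP server means implementing these two methods over stdio or HTTP, nothing more exotic.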
AGENTIC:AI_CODE_GENERATION_TOOLS¶
CLAUDE_CODE (Anthropic)¶
WHAT: CLI-based AI coding agent — understands full codebase, executes commands, edits files, runs tests
GE_USAGE: primary execution engine for all GE agents — agent_runner.py wraps Claude Code with PTY capture
STRENGTHS: deep codebase understanding, tool use (bash, file edit, search), agentic loop, excellent code quality
WEAKNESSES: CLI-only (no IDE integration as primary mode), cost at scale, context window limits for very large codebases
COMPETITIVE_POSITION: best-in-class for agentic workflows, GE's entire execution model is built on it
GE_DEPENDENCY: critical — Claude Code is GE's execution substrate
CURSOR¶
WHAT: AI-native IDE (VS Code fork) with inline code generation, chat, and agentic features (Composer)
STRENGTHS: excellent DX, fast inline completions, good multi-file editing, growing agent mode
WEAKNESSES: IDE-bound (not headless), subscription pricing, closed-source, limited automation API
GE_RELEVANCE: not suitable for GE's headless agent execution model, but relevant for human developers using GE-built applications
COMPETITIVE_POSITION: dominant in AI-assisted coding for individual developers
WINDSURF (Codeium)¶
WHAT: AI IDE with "Cascade" agentic mode — multi-step coding with tool use
STRENGTHS: good agentic capabilities, competitive pricing, improving rapidly
WEAKNESSES: smaller community than Cursor, less polished, still catching up on features
GE_RELEVANCE: same limitation as Cursor — IDE-bound, not headless — monitor for innovations in agentic patterns
COMPETITIVE_POSITION: strong challenger to Cursor
GITHUB_COPILOT¶
WHAT: AI pair programmer — inline completions, chat, workspace mode for multi-file tasks
STRENGTHS: massive adoption (largest install base), GitHub integration, enterprise features, free tier
WEAKNESSES: completions-focused (weaker at agentic workflows), OpenAI-dependent, workspace mode still maturing
GE_RELEVANCE: Copilot Workspace's multi-file editing approach is worth monitoring — could influence how GE agents structure edits
COMPETITIVE_POSITION: market leader by install base, but lagging in agentic capabilities
DEVIN (Cognition)¶
WHAT: autonomous AI software engineer — full environment (browser, terminal, editor), plans and executes tasks
STRENGTHS: truly autonomous (handles entire tasks end-to-end), impressive demos, strong funding
WEAKNESSES: expensive, slow (hours for tasks), quality inconsistent, black-box execution, limited customization
GE_RELEVANCE: direct competitor conceptually, but fundamentally different — Devin is a single generalist agent, GE is 59 specialized agents with a self-learning brain
COMPETITIVE_POSITION: high visibility but mixed real-world results — the "impressive demo, disappointing reality" problem
GE_ADVANTAGE_OVER_DEVIN: specialization (59 domain experts vs 1 generalist), self-learning (wiki brain evolves), quality pipeline (TDD-first, adversarial testing), human governance (discussions, constitution)
FACTORY (Factory AI)¶
WHAT: AI development platform — code generation, review, testing, deployment in a unified pipeline
STRENGTHS: enterprise focus, pipeline approach (not just code generation), strong team
WEAKNESSES: less public information, enterprise-only positioning, limited public benchmarks
GE_RELEVANCE: closest competitor in philosophy (pipeline approach, multiple capabilities) — worth monitoring closely
COMPETITIVE_POSITION: enterprise-focused, less accessible than GE's SME target
CODEGEN¶
WHAT: AI coding agent platform — codebase understanding, automated PRs, issue resolution
STRENGTHS: good codebase understanding, GitHub integration, automated workflows
WEAKNESSES: limited to code tasks, no design/media pipeline, smaller scope than GE
GE_RELEVANCE: competitor in automated code generation, but GE offers full agency (design, content, testing, deployment)
AGENTIC:AI_DESIGN_TOOLS¶
GOOGLE_STITCH¶
WHAT: design-to-code tool — generates production-ready React components from design input
GE_USAGE: integrated into GE's pipeline via Alexander (Stitch integration agent)
STRENGTHS: high-quality React output, respects design systems, production-ready code
WEAKNESSES: Google ecosystem dependency, limited to React, evolving rapidly
GE_STATUS: trial — actively used in pipeline, see INTEGRATION_DOC
INTEGRATION_DOC: wiki/docs/development/integrations/google-stitch.md
V0 (Vercel)¶
WHAT: AI UI generation — natural language to React components with shadcn/ui and Tailwind
STRENGTHS: excellent shadcn/ui integration (same team), fast iteration, good quality, free tier
WEAKNESSES: Vercel lock-in for deployment, limited to frontend components, no full-page generation
GE_RELEVANCE: high — GE uses shadcn/ui and Tailwind, v0's output is directly compatible with GE's frontend stack
GE_STATUS: assess — evaluate for agent-assisted UI prototyping
BOLT (StackBlitz)¶
WHAT: AI full-stack app builder — generates complete applications from prompts
STRENGTHS: full-stack generation, live preview, rapid prototyping, WebContainer technology
WEAKNESSES: generated code quality varies, limited customization, not suitable for enterprise patterns
GE_RELEVANCE: competitor for simple applications, but GE targets custom enterprise-grade SaaS — different market segment
COMPETITIVE_POSITION: threat at the low end (simple apps), not competitive for GE's target complexity
LOVABLE¶
WHAT: AI app builder — generates full-stack applications with database, auth, and deployment
STRENGTHS: impressive full-stack generation, Supabase integration, fast time-to-prototype
WEAKNESSES: limited architectural control, generated code may not meet enterprise standards, Supabase dependency
GE_RELEVANCE: same as Bolt — competitor for simple applications, not for GE's enterprise-grade target
COMPETITIVE_POSITION: strong for prototypes and MVPs, not competitive for production enterprise SaaS
AGENTIC:GE_COMPETITIVE_ANALYSIS¶
WHAT_MAKES_GE_UNIQUE¶
DIFFERENTIATOR_1: SELF_LEARNING_BRAIN
- GE's wiki brain captures learnings from every agent session, synthesizes patterns, and injects them at boot
- no competitor has this — they all start from scratch or static configuration
- the brain compounds: every session makes every future session better
- implementation: session_summarizer.py (per-session) + knowledge_synthesizer.py (cross-session) + wiki (persistent)
DIFFERENTIATOR_2: 59_SPECIALIZED_AGENTS
- GE simulates a full human agency: account managers, designers, developers, testers, reviewers, deployers
- each agent has deep domain expertise encoded in tiered identity (CORE + ROLE + REFERENCE)
- competitors use 1 generalist agent or 3-5 loosely defined agents
- specialization means higher quality per task — a testing agent knows testing patterns a generalist does not
DIFFERENTIATOR_3: ANTI_LLM_PIPELINE
- GE's quality pipeline is designed to catch LLM failure modes: Anna(spec) → Antje(TDD) → Devs → Koen(linting) → Marije(test) → Jasper(reconcile) → Marco(conflict) → Ashley(adversarial) → Jaap(SSOT) → Marta(merge)
- TDD-first means agents write tests before code — catches hallucination
- adversarial testing (Ashley) specifically targets LLM-generated code weaknesses
- SSOT reconciliation (Jaap) ensures final output matches specification
- no competitor has this level of systematic quality assurance
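The gate structure of the pipeline can be sketched as a sequence of checks, each of which must pass before the artifact advances. The stage names below come from the pipeline above; the gate logic itself is purely illustrative, not GE's actual implementation:

```python
# Sketch of a staged quality pipeline: each gate either passes the artifact
# on or rejects it with the responsible agent and a reason.

def tdd_gate(artifact):
    return (bool(artifact.get("tests")), "no tests written before code")

def lint_gate(artifact):
    return (artifact.get("lint_errors", 0) == 0, "lint errors present")

def ssot_gate(artifact):
    return (artifact.get("matches_spec", False), "output diverges from spec")

# Three representative stages; the real pipeline has ten.
PIPELINE = [("Antje", tdd_gate), ("Koen", lint_gate), ("Jaap", ssot_gate)]

def run_pipeline(artifact):
    for agent, gate in PIPELINE:
        ok, reason = gate(artifact)
        if not ok:
            return {"status": "rejected", "by": agent, "reason": reason}
    return {"status": "approved"}

good = run_pipeline({"tests": ["test_login"], "lint_errors": 0,
                     "matches_spec": True})
bad = run_pipeline({"tests": ["test_login"], "lint_errors": 2})
```

The point of the structure is that each gate targets a distinct LLM failure mode, so a defect must slip past every specialized check rather than one generalist review.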
DIFFERENTIATOR_4: MULTI_PROVIDER_STRATEGY
- GE uses Claude, OpenAI, and Gemini — best model for each agent type
- no single-vendor dependency — if one provider has an outage or price increase, GE can shift load
- agents are provider-agnostic at the task level — same interface regardless of underlying model
DIFFERENTIATOR_5: HUMAN_GOVERNANCE
- constitution with 10 principles, discussion consensus for decisions, Dirk-Jan approval for major changes
- agents have autonomy within guardrails, not unchecked autonomy
- competitors either give full autonomy (risky) or require constant human oversight (slow)
- GE's model: trust but verify, with the brain learning from verification outcomes
DIFFERENTIATOR_6: ENTERPRISE_AT_SME_PRICE
- mission: make enterprise-grade custom SaaS accessible to SME business owners at intern price point
- hyperscalable architecture: built for 1 user, auto-scales to 100k without migration
- ISO 27001, SOC 2 Type II compliance — enterprise security at SME cost
- competitors either target enterprise (expensive) or SME (low quality) — GE does both
COMPETITOR_COMPARISON_MATRIX¶
| Dimension | GE | Devin | Factory | Cursor | Copilot | Bolt/Lovable |
|---|---|---|---|---|---|---|
| Agent Count | 59 | 1 | ~5 | 1 | 1 | 1 |
| Self-Learning | yes (brain) | no | limited | no | no | no |
| Quality Pipeline | 10-stage | basic | some | none | none | none |
| Multi-Provider | yes (3) | no (1) | unknown | yes (2) | no (1) | varies |
| Specialization | deep | generalist | moderate | generalist | generalist | generalist |
| Enterprise Ready | yes (ISO/SOC) | limited | yes | no | yes | no |
| Target Market | SME | enterprise | enterprise | developers | developers | individuals |
| Price Model | per-project | per-task | enterprise | subscription | subscription | freemium |
| Full Agency | yes | code only | code focus | code only | code only | app gen |
| Design Pipeline | yes | no | no | no | no | basic |
| Content Pipeline | yes | no | no | no | no | no |
| Media Pipeline | yes (video/image) | no | no | no | no | no |
AGENTIC:TRENDS_TO_WATCH¶
TREND_1: AGENT_SPECIALIZATION
- market moving from "one agent does everything" to specialized agents for specific tasks
- validates GE's architecture — we were early to this pattern
- watch: how competitors implement specialization (will they reach 59 agents? unlikely without GE's brain)
TREND_2: TOOL_USE_STANDARDIZATION
- MCP, function calling, tool schemas becoming standardized
- reduces integration cost for new tools
- watch: MCP adoption rate, competing standards
TREND_3: MEMORY_AND_LEARNING
- every framework is adding memory/persistence — the market recognizes this matters
- GE's three-layer knowledge system (session → pattern → wiki) is ahead
- watch: LangGraph persistence, CrewAI memory, MemGPT/Letta approaches
TREND_4: EVALUATION_AND_BENCHMARKING
- SWE-bench, HumanEval, MBPP becoming standard — but they test individual completion, not multi-agent workflows
- no good benchmarks for multi-agent software development yet
- opportunity: GE could define the benchmark (if strategically valuable)
TREND_5: COST_OPTIMIZATION
- token costs are the bottleneck for agentic systems at scale
- GE's cost_gate.py and token burn prevention rules are ahead of the curve
- watch: model pricing trends, distillation techniques, local model viability for routine tasks
TREND_6: REGULATION
- EU AI Act implications for agentic systems
- transparency requirements align with GE's constitution (principle 10)
- watch: how regulation affects agentic development tools, compliance requirements for AI-generated code