DOMAIN:INNOVATION — AGENTIC_ENGINEERING

OWNER: joshua
UPDATED: 2026-03-24
PURPOSE: track the state of agentic AI engineering, multi-agent frameworks, AI development tools, and GE's competitive positioning
SCOPE: landscape as of 2025-2026, updated as the field evolves
RELEVANCE: GE IS an agentic engineering company — this is not peripheral research, this is understanding our own field


AGENTIC:LANDSCAPE_OVERVIEW

DEFINITION: agentic AI engineering = building systems where LLM-powered agents autonomously plan, execute, and iterate on tasks with tool access and memory
TIMELINE: field emerged 2023 (AutoGPT hype), matured 2024 (frameworks stabilized), production-grade 2025-2026 (real companies shipping)
CURRENT_STATE: the field has bifurcated — generalist coding assistants (Cursor, Copilot) vs specialized multi-agent systems (GE, Factory, Cognition)
KEY_TREND: the market is realizing that single-agent systems hit capability ceilings — multi-agent coordination is where value compounds


AGENTIC:MULTI_AGENT_FRAMEWORKS

LANGGRAPH (LangChain)

CREATOR: LangChain (Harrison Chase)
WHAT: graph-based framework for building stateful multi-agent workflows
ARCHITECTURE: directed graphs with nodes (agents/tools) and edges (routing logic), built on LangChain primitives
STRENGTHS: excellent state management, human-in-the-loop patterns, persistence layer, streaming support, strong community
WEAKNESSES: tight coupling to LangChain ecosystem, Python-centric (JS port exists but lags), complexity overhead for simple workflows, verbose API
LLM_COMPATIBILITY: high — well-represented in training data, LangChain is the most-discussed framework
GE_RELEVANCE: conceptually similar to GE's DAG-based work package system, but GE uses Redis Streams + custom orchestrator instead of LangGraph's graph runtime
VERDICT: assess — worth understanding patterns, but GE's architecture is more specialized and production-hardened for our use case
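The node/edge model above can be sketched without LangGraph itself. A minimal sketch, assuming nothing about LangGraph's actual API: nodes transform a shared state dict, and each node's routing function picks the next node (or ends the run) — the same shape as a DAG-based work package runner.

```python
# Minimal sketch of a graph-based agent workflow. Illustrative only:
# this is NOT LangGraph's API, just the pattern it implements.

class Graph:
    def __init__(self):
        self.nodes = {}   # name -> callable(state) -> state
        self.routes = {}  # name -> callable(state) -> next node name or None

    def add_node(self, name, fn, route):
        self.nodes[name] = fn
        self.routes[name] = route

    def run(self, entry, state):
        current = entry
        while current is not None:
            state = self.nodes[current](state)      # node transforms state
            current = self.routes[current](state)   # edge picks next node
        return state

g = Graph()
g.add_node("plan",
           lambda s: {**s, "plan": f"steps for {s['task']}"},
           route=lambda s: "execute")
g.add_node("execute",
           lambda s: {**s, "result": "done"},
           route=lambda s: None)  # terminal node

final = g.run("plan", {"task": "add login page"})
```

Real graph runtimes add persistence, streaming, and human-in-the-loop interrupts on top of exactly this loop.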

CREWAI

CREATOR: CrewAI (Joao Moura)
WHAT: role-based multi-agent framework — define agents with roles, goals, and backstories, then assign them to tasks
ARCHITECTURE: crews of agents with defined roles, sequential or parallel task execution, built-in delegation
STRENGTHS: intuitive role metaphor, easy to prototype, good documentation, growing ecosystem
WEAKNESSES: limited state management, no built-in persistence, role definitions are superficial compared to GE's identity system, no native DAG enforcement
LLM_COMPATIBILITY: moderate — newer framework, LLMs know basics but hallucinate advanced patterns
GE_RELEVANCE: CrewAI's role metaphor is similar to GE's agent identity system, but GE's is far deeper (tiered identity: CORE + ROLE + REFERENCE, 59 agents with unique personalities and domain expertise)
VERDICT: hold — GE's system is more sophisticated, CrewAI is useful for rapid prototyping but not for production multi-agent orchestration
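The role metaphor is simple enough to sketch. A minimal illustration of the pattern, not CrewAI's actual API: agents carry a role and goal, and a crew assigns tasks sequentially.

```python
# Sketch of the role-based multi-agent pattern. Illustrative only:
# not CrewAI's API, and the roles/tasks are hypothetical examples.
from dataclasses import dataclass

@dataclass
class Agent:
    role: str
    goal: str

    def perform(self, task: str) -> str:
        # A real agent would call an LLM here; we return a trace string.
        return f"[{self.role}] {task}"

@dataclass
class Crew:
    agents: list

    def run(self, tasks: list) -> list:
        # Pair agents with tasks and execute sequentially.
        return [agent.perform(task)
                for agent, task in zip(self.agents, tasks)]

crew = Crew(agents=[Agent("researcher", "gather facts"),
                    Agent("writer", "draft copy")])
outputs = crew.run(["find competitor pricing", "summarize findings"])
```

The "superficial roles" weakness above is visible here: a role is just a string prefix, whereas a tiered identity system encodes actual domain behavior.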

AUTOGEN (Microsoft)

CREATOR: Microsoft Research
WHAT: conversational multi-agent framework — agents communicate through message passing
ARCHITECTURE: agents as conversable entities, group chats, nested conversations, code execution in Docker
STRENGTHS: Microsoft backing, strong research foundation, flexible conversation patterns, good for research
WEAKNESSES: complex API surface, heavy configuration, conversation-based routing limits structured workflows, Microsoft-ecosystem bias
LLM_COMPATIBILITY: moderate — training data includes research papers and docs, but API has changed significantly across versions
GE_RELEVANCE: AutoGen's conversation model is interesting but fundamentally different from GE's stream-based inbox/outbox pattern — GE favors structured task dispatch over open-ended conversation
VERDICT: hold — architectural mismatch with GE's design philosophy

SEMANTIC_KERNEL (Microsoft)

CREATOR: Microsoft
WHAT: SDK for integrating AI into applications — plugins, planners, memory, connectors
ARCHITECTURE: kernel with plugins (native functions + LLM functions), planners for orchestration, memory stores
STRENGTHS: enterprise-grade, C#/.NET first-class support, good Azure integration, plugin ecosystem
WEAKNESSES: .NET-centric (Python support exists but secondary), enterprise complexity, tightly coupled to Azure/OpenAI
LLM_COMPATIBILITY: moderate — well-documented but C#-focused training data
GE_RELEVANCE: low — GE's stack is TypeScript/Python, not .NET — architectural patterns are interesting but implementation is not portable
VERDICT: hold — wrong ecosystem for GE

DSPY (Stanford)

CREATOR: Stanford NLP (Omar Khattab)
WHAT: framework for programming (not prompting) LLMs — declarative signatures, automatic prompt optimization
ARCHITECTURE: modules with typed signatures, teleprompters for optimization, assertions for validation
STRENGTHS: moves beyond prompt engineering to systematic optimization, reproducible, composable
WEAKNESSES: steep learning curve, Python-only, requires labeled data for optimization, less intuitive than direct prompting
LLM_COMPATIBILITY: limited — niche framework, LLMs need explicit guidance
GE_RELEVANCE: DSPy's optimization approach could improve GE's agent system prompts — worth assessing for prompt quality improvement
VERDICT: assess — the optimization methodology is interesting, even if the framework itself isn't directly adoptable
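The "programming, not prompting" idea can be shown in miniature. A hedged sketch, not DSPy's API: a declarative signature names input and output fields, and the module validates the model's output against it instead of trusting free-form text.

```python
# Sketch of the declarative-signature idea behind DSPy. Illustrative only:
# not DSPy's API; fake_llm stands in for a real model call.

SIGNATURE = {"inputs": ["question"], "outputs": ["answer", "confidence"]}

def fake_llm(prompt: str) -> dict:
    # Stand-in for a model call that returns structured fields.
    return {"answer": "42", "confidence": "high"}

def run_module(signature: dict, **inputs) -> dict:
    missing = [f for f in signature["inputs"] if f not in inputs]
    if missing:
        raise ValueError(f"missing inputs: {missing}")
    result = fake_llm(str(inputs))
    # Assertion step: output must contain every declared field.
    for f in signature["outputs"]:
        if f not in result:
            raise ValueError(f"output missing field: {f}")
    return result

out = run_module(SIGNATURE, question="meaning of life?")
```

The optimization layer DSPy adds on top of this (teleprompters tuning the prompt against labeled examples) is the part worth assessing for agent system prompts.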

SWARM (OpenAI)

CREATOR: OpenAI (experimental)
WHAT: lightweight multi-agent orchestration — handoffs between agents, function calling, minimal abstraction
ARCHITECTURE: agents with instructions and functions, handoffs as return values, client manages conversation
STRENGTHS: extremely simple, lightweight, easy to understand, OpenAI-native
WEAKNESSES: experimental/educational (OpenAI says "not production-ready"), no persistence, no state management, minimal features
LLM_COMPATIBILITY: limited — very new, minimal training data
GE_RELEVANCE: Swarm's simplicity is instructive but GE needs production-grade orchestration (persistence, DAG, recovery, health monitoring) — Swarm is a toy by comparison
VERDICT: hold — educational value only
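The "handoffs as return values" idea is the instructive part. A minimal sketch of the pattern, not the Swarm API: an agent step returns either a final answer or the next agent to hand off to.

```python
# Sketch of Swarm-style handoffs. Illustrative only: not the Swarm API,
# and the triage/billing/support agents are hypothetical examples.

def billing(task):
    return f"billing handled: {task}"

def support(task):
    return f"support handled: {task}"

def triage(task):
    # Route by keyword; a real system would let the model decide.
    return billing if "invoice" in task else support

def run(agent, task, max_hops=5):
    for _ in range(max_hops):
        result = agent(task)
        if callable(result):   # handoff: result is the next agent
            agent = result
        else:
            return result
    raise RuntimeError("handoff loop exceeded max_hops")

answer = run(triage, "invoice dispute for order 1234")
```

Everything a production orchestrator needs (persistence, DAG enforcement, recovery) lives outside this loop, which is why the pattern alone is educational rather than adoptable.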


AGENTIC:MCP_PROTOCOL

WHAT: Model Context Protocol — open standard by Anthropic for connecting LLMs to external tools and data sources
ARCHITECTURE: client-server protocol where MCP servers expose tools/resources/prompts, and MCP clients (LLM applications) discover and invoke them
WHY_IT_MATTERS: standardizes tool integration — instead of every framework having custom tool definitions, MCP provides a universal interface
CURRENT_STATE: rapidly growing ecosystem, servers for GitHub, Slack, databases, file systems, web browsers, and hundreds more
ADOPTION: Claude Desktop, Claude Code, Cursor, Windsurf, Cline, Continue — major AI tools are adopting MCP
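On the wire, MCP is JSON-RPC 2.0: clients discover tools with `tools/list` and invoke them with `tools/call`. A sketch of those two request shapes, where the tool name `github.search_issues` and its arguments are hypothetical examples, not a real server's schema:

```python
# Sketch of MCP request shapes (JSON-RPC 2.0). The method names follow the
# public protocol; the tool name and arguments are hypothetical examples.
import json

list_tools = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}

call_tool = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {
        "name": "github.search_issues",            # hypothetical tool
        "arguments": {"query": "label:bug state:open"},
    },
}

wire = json.dumps(call_tool)  # serialized for the client-server transport
```

The value of the standard is that any client that can emit these shapes can use any server's tools, with no per-framework adapters.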

MCP_FOR_GE

OPPORTUNITY_1: GE's agents use Claude Code which supports MCP — agents could leverage MCP servers for standardized tool access
OPPORTUNITY_2: GE could expose its own services as MCP servers (wiki brain, discussion API, task dispatch) — making GE's infrastructure available to any MCP client
OPPORTUNITY_3: client project delivery could include MCP server setup — standardized API layer for client applications
RISK_1: MCP is still maturing — breaking changes possible
RISK_2: security model needs careful evaluation — MCP servers have broad access, must align with GE's ISO 27001 requirements
RISK_3: performance overhead of protocol layer vs direct API calls

GE_VERDICT: assess — monitor MCP ecosystem maturity, evaluate for GE integration when protocol stabilizes
RELEVANT_AGENTS: alexander (integration/Stitch), urszula/maxim (backend), joshua (evaluation)


AGENTIC:AI_CODE_GENERATION_TOOLS

CLAUDE_CODE (Anthropic)

WHAT: CLI-based AI coding agent — understands full codebase, executes commands, edits files, runs tests
GE_USAGE: primary execution engine for all GE agents — agent_runner.py wraps Claude Code with PTY capture
STRENGTHS: deep codebase understanding, tool use (bash, file edit, search), agentic loop, excellent code quality
WEAKNESSES: CLI-only (no IDE integration as primary mode), cost at scale, context window limits for very large codebases
COMPETITIVE_POSITION: best-in-class for agentic workflows, GE's entire execution model is built on it
GE_DEPENDENCY: critical — Claude Code is GE's execution substrate

CURSOR

WHAT: AI-native IDE (VS Code fork) with inline code generation, chat, and agentic features (Composer)
STRENGTHS: excellent DX, fast inline completions, good multi-file editing, growing agent mode
WEAKNESSES: IDE-bound (not headless), subscription pricing, closed-source, limited automation API
GE_RELEVANCE: not suitable for GE's headless agent execution model, but relevant for human developers using GE-built applications
COMPETITIVE_POSITION: dominant in AI-assisted coding for individual developers

WINDSURF (Codeium)

WHAT: AI IDE with "Cascade" agentic mode — multi-step coding with tool use
STRENGTHS: good agentic capabilities, competitive pricing, improving rapidly
WEAKNESSES: smaller community than Cursor, less polished, still catching up on features
GE_RELEVANCE: same limitation as Cursor — IDE-bound, not headless — monitor for innovations in agentic patterns
COMPETITIVE_POSITION: strong challenger to Cursor

GITHUB_COPILOT

WHAT: AI pair programmer — inline completions, chat, workspace mode for multi-file tasks
STRENGTHS: massive adoption (largest install base), GitHub integration, enterprise features, free tier
WEAKNESSES: completions-focused (weaker at agentic workflows), OpenAI-dependent, workspace mode still maturing
GE_RELEVANCE: Copilot Workspace's multi-file editing approach is worth monitoring — could influence how GE agents structure edits
COMPETITIVE_POSITION: market leader by install base, but lagging in agentic capabilities

DEVIN (Cognition)

WHAT: autonomous AI software engineer — full environment (browser, terminal, editor), plans and executes tasks
STRENGTHS: truly autonomous (handles entire tasks end-to-end), impressive demos, strong funding
WEAKNESSES: expensive, slow (hours for tasks), quality inconsistent, black-box execution, limited customization
GE_RELEVANCE: direct competitor conceptually, but fundamentally different — Devin is a single generalist agent, GE is 59 specialized agents with a self-learning brain
COMPETITIVE_POSITION: high visibility but mixed real-world results — the "impressive demo, disappointing reality" problem
GE_ADVANTAGE_OVER_DEVIN: specialization (59 domain experts vs 1 generalist), self-learning (wiki brain evolves), quality pipeline (TDD-first, adversarial testing), human governance (discussions, constitution)

FACTORY (Factory AI)

WHAT: AI development platform — code generation, review, testing, deployment in a unified pipeline
STRENGTHS: enterprise focus, pipeline approach (not just code generation), strong team
WEAKNESSES: less public information, enterprise-only positioning, limited public benchmarks
GE_RELEVANCE: closest competitor in philosophy (pipeline approach, multiple capabilities) — worth monitoring closely
COMPETITIVE_POSITION: enterprise-focused, less accessible than GE's SME target

CODEGEN

WHAT: AI coding agent platform — codebase understanding, automated PRs, issue resolution
STRENGTHS: good codebase understanding, GitHub integration, automated workflows
WEAKNESSES: limited to code tasks, no design/media pipeline, smaller scope than GE
GE_RELEVANCE: competitor in automated code generation, but GE offers full agency (design, content, testing, deployment)


AGENTIC:AI_DESIGN_TOOLS

GOOGLE_STITCH

WHAT: design-to-code tool — generates production-ready React components from design input
GE_USAGE: integrated into GE's pipeline via Alexander (Stitch integration agent)
STRENGTHS: high-quality React output, respects design systems, production-ready code
WEAKNESSES: Google ecosystem dependency, limited to React, evolving rapidly
GE_STATUS: trial — actively used in pipeline, see ge-ops/wiki/docs/development/integrations/google-stitch.md
INTEGRATION_DOC: wiki/docs/development/integrations/google-stitch.md

V0 (Vercel)

WHAT: AI UI generation — natural language to React components with shadcn/ui and Tailwind
STRENGTHS: excellent shadcn/ui integration (same team), fast iteration, good quality, free tier
WEAKNESSES: Vercel lock-in for deployment, limited to frontend components, no full-page generation
GE_RELEVANCE: high — GE uses shadcn/ui and Tailwind, v0's output is directly compatible with GE's frontend stack
GE_STATUS: assess — evaluate for agent-assisted UI prototyping

BOLT (StackBlitz)

WHAT: AI full-stack app builder — generates complete applications from prompts
STRENGTHS: full-stack generation, live preview, rapid prototyping, WebContainer technology
WEAKNESSES: generated code quality varies, limited customization, not suitable for enterprise patterns
GE_RELEVANCE: competitor for simple applications, but GE targets custom enterprise-grade SaaS — different market segment
COMPETITIVE_POSITION: threat at the low end (simple apps), not competitive for GE's target complexity

LOVABLE

WHAT: AI app builder — generates full-stack applications with database, auth, and deployment
STRENGTHS: impressive full-stack generation, Supabase integration, fast time-to-prototype
WEAKNESSES: limited architectural control, generated code may not meet enterprise standards, Supabase dependency
GE_RELEVANCE: same as Bolt — competitor for simple applications, not for GE's enterprise-grade target
COMPETITIVE_POSITION: strong for prototypes and MVPs, not competitive for production enterprise SaaS


AGENTIC:GE_COMPETITIVE_ANALYSIS

WHAT_MAKES_GE_UNIQUE

DIFFERENTIATOR_1: SELF_LEARNING_BRAIN
- GE's wiki brain captures learnings from every agent session, synthesizes patterns, and injects them at boot
- no competitor has this — they all start from scratch or static configuration
- the brain compounds: every session makes every future session better
- implementation: session_summarizer.py (per-session) + knowledge_synthesizer.py (cross-session) + wiki (persistent)

DIFFERENTIATOR_2: 59_SPECIALIZED_AGENTS
- GE simulates a full human agency: account managers, designers, developers, testers, reviewers, deployers
- each agent has deep domain expertise encoded in tiered identity (CORE + ROLE + REFERENCE)
- competitors use 1 generalist agent or 3-5 loosely-defined agents
- specialization means higher quality per task — a testing agent knows testing patterns a generalist does not

DIFFERENTIATOR_3: ANTI_LLM_PIPELINE
- GE's quality pipeline is designed to catch LLM failure modes: Anna(spec) → Antje(TDD) → Devs → Koen(linting) → Marije(test) → Jasper(reconcile) → Marco(conflict) → Ashley(adversarial) → Jaap(SSOT) → Marta(merge)
- TDD-first means agents write tests before code — catches hallucination
- adversarial testing (Ashley) specifically targets LLM-generated code weaknesses
- SSOT reconciliation (Jaap) ensures final output matches specification
- no competitor has this level of systematic quality assurance
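The stage chain above behaves like a sequence of gates: any failing stage halts the run and reports where it failed. A minimal sketch of that control flow, with stage names abstracted from the GE pipeline and the check functions as hypothetical placeholders:

```python
# Sketch of a sequential quality-gate pipeline. The stage list mirrors the
# GE chain above in abstract form; the check functions are hypothetical.

STAGES = ["spec", "tdd", "dev", "lint", "test",
          "reconcile", "conflict", "adversarial", "ssot", "merge"]

def run_pipeline(artifact, checks):
    history = []
    for stage in STAGES:
        # Stages without an explicit check pass by default.
        ok = checks.get(stage, lambda a: True)(artifact)
        history.append((stage, ok))
        if not ok:
            return {"passed": False, "failed_at": stage, "history": history}
    return {"passed": True, "failed_at": None, "history": history}

# Example: a lint gate that rejects tabs in the artifact.
result = run_pipeline("def f():\n    return 1\n",
                      {"lint": lambda a: "\t" not in a})
```

The point of the structure is that a failure at stage N never reaches stage N+1, so hallucinated code caught by TDD or adversarial testing cannot leak into the merge.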

DIFFERENTIATOR_4: MULTI_PROVIDER_STRATEGY
- GE uses Claude, OpenAI, and Gemini — best model for each agent type
- no single-vendor dependency — if one provider has an outage or price increase, GE can shift load
- agents are provider-agnostic at the task level — same interface regardless of underlying model

DIFFERENTIATOR_5: HUMAN_GOVERNANCE
- constitution with 10 principles, discussion consensus for decisions, Dirk-Jan approval for major changes
- agents have autonomy within guardrails, not unchecked autonomy
- competitors either give full autonomy (risky) or require constant human oversight (slow)
- GE's model: trust but verify, with the brain learning from verification outcomes

DIFFERENTIATOR_6: ENTERPRISE_AT_SME_PRICE
- mission: make enterprise-grade custom SaaS accessible to SME business owners at intern price point
- hyperscalable architecture: built for 1 user, auto-scales to 100k without migration
- ISO 27001, SOC 2 Type II compliance — enterprise security at SME cost
- competitors either target enterprise (expensive) or SME (low quality) — GE does both

COMPETITOR_COMPARISON_MATRIX

Dimension        | GE                | Devin      | Factory    | Cursor       | Copilot      | Bolt/Lovable
Agent Count      | 59                | 1          | ~5         | 1            | 1            | 1
Self-Learning    | yes (brain)       | no         | limited    | no           | no           | no
Quality Pipeline | 10-stage          | basic      | some       | none         | none         | none
Multi-Provider   | yes (3)           | no (1)     | unknown    | yes (2)      | no (1)       | varies
Specialization   | deep              | generalist | moderate   | generalist   | generalist   | generalist
Enterprise Ready | yes (ISO/SOC)     | limited    | yes        | no           | yes          | no
Target Market    | SME               | enterprise | enterprise | developers   | developers   | individuals
Price Model      | per-project       | per-task   | enterprise | subscription | subscription | freemium
Full Agency      | yes               | code only  | code focus | code only    | code only    | app gen
Design Pipeline  | yes               | no         | no         | no           | no           | basic
Content Pipeline | yes               | no         | no         | no           | no           | no
Media Pipeline   | yes (video/image) | no         | no         | no           | no           | no

AGENTIC:TRENDS_TO_WATCH

TREND_1: AGENT_SPECIALIZATION
- market moving from "one agent does everything" to specialized agents for specific tasks
- validates GE's architecture — we were early to this pattern
- watch: how competitors implement specialization (will they reach 59 agents? unlikely without GE's brain)

TREND_2: TOOL_USE_STANDARDIZATION
- MCP, function calling, tool schemas becoming standardized
- reduces integration cost for new tools
- watch: MCP adoption rate, competing standards

TREND_3: MEMORY_AND_LEARNING
- every framework is adding memory/persistence — the market recognizes this matters
- GE's three-layer knowledge system (session → pattern → wiki) is ahead
- watch: LangGraph persistence, CrewAI memory, MemGPT/Letta approaches

TREND_4: EVALUATION_AND_BENCHMARKING
- SWE-bench, HumanEval, MBPP becoming standard — but they test individual completion, not multi-agent workflows
- no good benchmarks for multi-agent software development yet
- opportunity: GE could define the benchmark (if strategically valuable)

TREND_5: COST_OPTIMIZATION
- token costs are the bottleneck for agentic systems at scale
- GE's cost_gate.py and token burn prevention rules are ahead of the curve
- watch: model pricing trends, distillation techniques, local model viability for routine tasks

TREND_6: REGULATION
- EU AI Act implications for agentic systems
- transparency requirements align with GE's constitution (principle 10)
- watch: how regulation affects agentic development tools, compliance requirements for AI-generated code