2026-03-27 — End-to-End Readiness Roadmap (Verbatim from Dirk-Jan)

Context: Our goal now is to be able to test our new system end-to-end. We need to identify what's still needed before we can do a full cycle.

A couple of things I know that still need to happen (btw, save this verbatim so we can reference it time and time again during the next couple of sessions). In random order what we -at least- still need to do:

  • Train Dima and Aimee on client intake/communications
  • Dive into the release cycle: agents are done developing, how do we put this on a staging environment
  • How do we auto setup UpCloud+Bunny with the correct configuration
  • Isolate projects/clients and track token usage (in terms of hosting/traffic etc)

Next to that: we need to create a couple of scenarios that gradually increase the pressure on our platform. First dogfood test will be a simple dummy portfolio site or something.

Things we want to track/trace:

  • How does the intake go? (measured from several/all angles)
  • How does the refinement go? (we still need to develop this complete workflow — enabling Aimee to initiate contact with the customer and talk to him/her to gather necessary info) we also have to evaluate if she's asking the right questions, does she bore the client with 300 questions or does she only ask for 2 while 6 were more appropriate
  • We need to evaluate the handover between Dima and Aimee and between Aimee and Alexander/Faye/Sytske
  • We need to evaluate if and how clients get assigned to teams
  • We need to load Alexander with the needed tools and do a few dry runs to see how he goes from design instructions to actual designs
  • We need to track & trace how agents make use of the discussion protocol, how they capture learnings and see whether we agree with their learnings or not (missed a few or created duplicate learnings or loaded irrelevant info into the wiki)
  • We need to see if voting works as intended and check whether the votes actually make sense and are implemented throughout the team
  • We have to evaluate the development team vs test team workflows. Do they actually kill the LLM drift/standard issues. Or not.
  • Then we need to evaluate (or completely create) a flow for post development (the lane structure for GitLab CI/CD testing is non existent in this new release I think) need to set that up before we do the first dry run. The old system had Lane A - E. Which actually produced valuable intel for the development team but it never ran autonomously, always via our session monitoring the output.

And if we have all that fixed we still need to implement token tracking, invoicing, customer portal etc etc.

But let's first focus on a simple portfolio demo, then up the game with a mobile app, then make a little more complex system, maybe a blog or a simple webshop. Just for dummy purposes, building up learnings and analyzing the system. Eventually we will want to develop our first commercial project: peppolmadeeasy.eu — this project already lives on fort-knox-dev somewhere but we will completely redo it.


Progressive Test Scenarios

  1. Portfolio site (simple, static-ish) — first dry run
  2. Mobile app — increases complexity
  3. Blog or simple webshop — more moving parts
  4. peppolmadeeasy.eu — first real commercial project (full redo)