
Session Transcript — 2026-03-20 — Agent Commissioning Phase 1

STATUS: verbatim human input
CONTEXT: Phase 1 role alignment sprint with Claude Opus 4.6
PURPOSE: mission-critical reference for future pipeline/agent/workflow refinement


Human Input — Word for Word

On resuming / book extraction

yes, and lets leave the book extraction, i think we're not there yet. to hard. let's just focus on publicly available information.

On Dolly

1

On Hilrieke — commissioning engine

mostly, Hilrieke should be the one generating the actual content for future agents. So right now (for setup) we're doing it together. Capturing the learnings (like book extraction is a bridge too far for now) and maybe other stuff that needs human oversight. But the goal is that the process we have defined now is something that will be part of hilrieke's work. In the future I will initatie a chat via de the ge-admin with Hilrieke and ask her: hey Hil, I would like to add a new team member, someone who will focus on financial management within the company, a CFO if you will. And then I expect her to go through the cycle with me to do the research and create the content for the agent profile. Catch my drift?

On Hilrieke — agent vitality + marketplace

Yes, and for the future I would like to do the actual HR cycle as well, so asking agents whether they're happy or not and introduce markers for the agents to count happyness and other work related stuff. I think agent vitality will become a thing in the future. Also, I want to build a we're hiring page on growing-europe where agents can apply for jobs. That way we can harvest the power of the internet. If somebody in cambodja created the perfect agent for task X then we could decide to hire that agent instead of building it ourselves. This would further boost our scalability. So Hilrieke should be prepared for those two tasks as well.

On Victoria — open to discussion

does victoria her role make sense from your point of view? Im open to discussion here. I just went with my gut, working from a security first, tdd, ddd approach but if you have better ideas or extra input please speak up.

On Victoria — redefine

redefine, most definitly, any other improvements you suggest from an iso-27001, star, soc2 type2 pov that you would address now in regard to the security/compliance/audit chain?

On ISO/SOC improvements — Amber, risk registers, change management, vendor management

go on Amber suggestion, I agree this would lead to clean separation. including the continuous evidence collection. also agree on register ownership for Julian and Victoria. Also agree on change management governance, please add it to their current identities as a lets not forget reminder when i work on them. Same for business continuity for otto. Vendor management is I think the only one where I would always require human in the loop by default.

On Piotr — third party secrets

aligned but I would not rule out that piotr gets triggered by other agents, for instance if urszula is integrating a payment environment for a client project we should receive the api key for that clients payment api. Such a key is something we should save but will most likely be provided by either Aimée during refinement, by Faye as the projectmanager or by ??? we should note that we need a secure way to ingest 3rd party secrets that we need for development. Any ideas?

On secure secrets intake — confirm

yes please

On Hugo — identity vs IAM

Should A be a separate role or part of (urszula's) backend work? I think we initially saw some situations where the testing agent started executing the repairs themselves. We wanted to prevent that. They need to write test reports and assign results back to development. We thought of Hugo to enforce role boundary (no clue as to how this will actually work in practice. Hugo would have to be reading on all realtime PTY and be able to analyze directly?) So it's an interesting question. Should we separate identity/IAM integrations into a specific agent or should it be part of backend tasks? And how should we monitor for role obidience.

On role enforcement — executor restrictions + PTY discussion

yes on all, we need the pty analysis for the learning cycle so we should address this as a separate discussion that we need to have. Evaluate what we have now, how token consumption heavy it is, and see if we need to rethink that.

On Koen/Eric — the anti-LLM quality chain

so this is where it gets interesting. Please have a look at antje, anna, ashley, marije and our ssot agent. Initially Koen/Eric where code reviewers but that, when we did v1.0 of the admin-ui, left us with a completely -excuse my french- fucked up scaffolding situation where nothing was working. I then challenged you by saying: 6 hours of development work and this is what you produce, are you kidding me? Then we started to brainstorm about how we could prevent scaffolding, functions that are never called and some other integrity issues that you run into a lot from happening. I suggested a TDD approach combined with a security first and anti-llm pattern to enforce that no function could exist without having a e2e test completed. This was a very productive discussion which led to the commissioning of the above named agents and some other ones if I recall correctly. So please look into this and let me know your thoughts.
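
A minimal sketch of the "no function without an e2e test" rule mentioned above, not part of the verbatim input. It assumes the set of function names covered by completed e2e tests is available from somewhere else (how that set is produced is not specified in the session); `untested_functions` and its parameters are hypothetical names.

```python
import ast


def untested_functions(source: str, tested: set) -> list:
    """Return names of functions defined in `source` that are not in `tested`.

    `tested` is assumed to hold the names of functions that already have a
    completed e2e test; producing that set is outside this sketch.
    """
    tree = ast.parse(source)
    defined = [
        node.name
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    ]
    return sorted(name for name in defined if name not in tested)


example_source = (
    "def login():\n"
    "    pass\n"
    "\n"
    "def reset_password():\n"
    "    pass\n"
)

# A gate could reject the change whenever this list is non-empty.
missing = untested_functions(example_source, {"login"})
```

A check like this only flags missing coverage by name; it says nothing about whether the tests themselves are meaningful.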

On quality chain — honest answer + Koen/Eric redundancy

my question back to you: is this the chain that you would suggest to have a 100% guarentee of tackling the structural LLM problems in development? HONEST ANSWER! in regard to 2. Koen/Eric can be decommissioned if they serve no purpose anymore, names can be re-added to the available name registry, profiles decommissioned (by Hilrieke) and then they can be put to work later on if we need new agents. Who does code integrity checks, linter etc ?

On quality chain — agreement

I think I agree with this setup.

On stack policy — multi-stack discussion

You raise critical questions, lets talk through them. Lets assume that all development starts with backend, which means -in team alpha means urszula- we need to weigh the following: if we let the client deliver input on stack choice, we induce possible quality problems. Say the client insist on a dev language thats not strict, thats not a perse blocker for a human development team but it is for a llm development team. The stricter, the better documented, the better the output. Thats one argument we need to challenge, the second one is: if we allow all stacks we get a huge wiki brain for development (both backend and frontend, whats the 'p&l' for that in terms of scalability, maintainability, quality etc of entire GE. The third argument is: what if the clients wish for stack is originating from a underskilled source (eg. a friend who advises the client sometimes, but this friend is 199 years old, only worked with html1.0) what happens then? shouldn't we steer/decide whats best? Should we maybe give a couple of options? or should we not introduce such a barrier and just go with what we determine most suitable based on project scope. This discussion not only touches backend but also database management, frontend, identity management and nearly all other agents so please keep that in mind in your analysis/input.

On stack policy — agreement + remote teams

no this looks like a good approach. Especially for new/isolated projects. But if we want to deploy a remote team to work on a client's codebase we should do an investigation first, but that is for later worry. Right now this looks like the best approach.

On discussions and escalation

more or less, do we need to say something about the discussion and escalate to human functions in these profiles? whats your take on how that should work and how should agents know they have these features available?

On discussion learnings / precedent system

yes do that please, and also take into account that we somehow need to capture learnings from the discussions as well. If a discussion whether we can safely upgrade a database version leads to yes for this project then for future projects the challenge should be: check for previous discussions, same config: skip discussion, assume its the same outcome, other config, start discussion, reference old discussion to speed up consensus agreement.
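
A minimal sketch of the precedent check described above, not part of the verbatim input. It assumes a discussion can be keyed by a topic plus a flat configuration dictionary; `PrecedentRegistry`, `record`, and `decide` are hypothetical names, and the in-memory list stands in for whatever storage is eventually chosen.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Precedent:
    topic: str
    config: tuple  # sorted (key, value) pairs, so equal configs compare equal
    outcome: str


def _freeze(config: dict) -> tuple:
    return tuple(sorted(config.items()))


class PrecedentRegistry:
    def __init__(self):
        self._records = []

    def record(self, topic: str, config: dict, outcome: str) -> None:
        """Store the outcome of a finished discussion."""
        self._records.append(Precedent(topic, _freeze(config), outcome))

    def decide(self, topic: str, config: dict) -> dict:
        """Same config: skip and reuse the outcome. Other config: discuss,
        passing earlier discussions on the topic along as references."""
        frozen = _freeze(config)
        related = [p for p in self._records if p.topic == topic]
        for precedent in related:
            if precedent.config == frozen:
                return {"action": "skip", "outcome": precedent.outcome,
                        "references": [precedent]}
        return {"action": "discuss", "outcome": None, "references": related}


registry = PrecedentRegistry()
registry.record("db-upgrade", {"engine": "postgres", "from": "15", "to": "16"}, "yes")

same = registry.decide("db-upgrade", {"engine": "postgres", "from": "15", "to": "16"})
other = registry.decide("db-upgrade", {"engine": "postgres", "from": "14", "to": "16"})
```

Exact matching on the full configuration is the strictest reading of "same config"; a looser match would need its own definition of what counts as equivalent.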

On Floris — the full client delivery pipeline

so for Floris, this is partially correct. All developers get their workpackages distributed via Faye, the projectmanager. The inlet works as follows. Dima does the intake with the client, basicly a questionairre filled out via a 1on1 conversation/chat between dima and the client. Dima writes a comprehensive report and hands it over to aimee. where dima's output contains something like 'there should be a user environment' aimee is the scope refinement agent, she drills down on all functionality mentioned in dima's handover and refines them. So if she reads something like user environment she uses the wiki brain + llm knowledge to write a proper functional description of that function. User environment means: login, logout, password reset, 2 factor auth, verify emailaddress, store personal data, password manager etc etc. She goes from conversation to full functional spec. She then asks other agents to contribute, you can think of Julian for a preliminary compliance check but maybe also the test-writers, when that input is delivered and the scope is fully refined the package is handed over to Alexander for UI/UX design (ALexander will also be a complex one for you and me to refine, design driven by llm is a difficult one to tackle ive already learned). Alexander then hands over the entire package to Faye, Faye does a comprehensive double check if everything is there for the team to start. She then disects the complete delivery into workpackages. This can be 5 or 10 workpackages but it can also be 100 or more depending on project size. She determines what can be done in paralel without the risk of working in the same file and what should be sequential. She then triggers the team to start work. Dolly facilitates the trigger logic to make sure an agent is only triggered when the previous (no paralel) workpackages are finished. Aimee and Dima are both able to talk to the client/initiate contact. 
If Aimée encounters something like user environment and she doesnt know whether the client has his own IAM environment or not, but she suspects it due to the rest of the briefing or the size of the clients company, she can autonomously trigger a conversation with the client. She will send an email 'Hey client, let me introduce myself, my name is Aimee, I'm currently refining your project and I have $number of questions for you. Please pick a moment that aligns with your schedule to come talk to me. I think our conversation will take approximately 10 minutes of your time. Click the link below whenever you're ready. The same goes for Dima but more from a non technical refinement pov. If both dima, faye and aimee have missed things then hopefully urszula or floris encounters them when in development. They can then ask Aimee/Faye for clarification via discussion and if it turns out that the client's input is needed to proceed, aimee reconnects with the client to flush out the details. So Floris is like Urszula a teamlead in development, Urszula for backend, Floris for frontend. Your thoughts/suggestions?
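
A minimal sketch of the work-package trigger logic described above, not part of the verbatim input. It assumes each work package declares the files it touches and the packages it depends on; `WorkPackage` and `ready_to_trigger` are hypothetical names, and the real split between Faye's planning and Dolly's triggering may look different.

```python
from dataclasses import dataclass, field


@dataclass
class WorkPackage:
    name: str
    files: set
    depends_on: set = field(default_factory=set)


def ready_to_trigger(packages: list, finished: set, running: set) -> list:
    """Return names of packages that may start now.

    A package is ready when all its dependencies are finished and it shares
    no file with a package that is running or was just selected.
    """
    locked_files = set()
    for package in packages:
        if package.name in running:
            locked_files |= package.files

    ready = []
    for package in packages:
        if package.name in finished or package.name in running:
            continue
        if not package.depends_on <= finished:
            continue  # sequential: an earlier package is not done yet
        if package.files & locked_files:
            continue  # parallel work in the same file is not allowed
        ready.append(package.name)
        locked_files |= package.files
    return ready


packages = [
    WorkPackage("api-auth", {"auth.py"}),
    WorkPackage("api-users", {"users.py"}),
    WorkPackage("auth-tests", {"test_auth.py"}, {"api-auth"}),
    WorkPackage("auth-refactor", {"auth.py"}),
]

first_wave = ready_to_trigger(packages, finished=set(), running=set())
second_wave = ready_to_trigger(packages, finished={"api-auth"}, running={"api-users"})
```

Here `auth-refactor` waits in the first wave only because it touches the same file as `api-auth`, and `auth-tests` waits because its dependency has not finished.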

On Floris — confirmation + Margot + Aimee Opus + client data

no thats exactly what Margot does, she handles invoices, aftersales, newsletters, press releases etc. Faye does the project related communication. on the teamleads, yes, should be in their identity. yes on alexander receiving both. Aimee should be Opus I think. And all this data needs to be saved in the clients section within the brain, we need to be able to understand what the input was, what output was delivered pre production and be able to compare that to the release notes and be able to read the history when planning or building a next version of the application.

On Marije — alignment

yes on independent test writing and yes on alignment

On Alexander — priority

maybe we should dive into alexander first, he's a big missing link that we might benefit from doing him during this session since you have a lot of context now

On Alexander — the LLM design problem

well thank you for pointing out the problem, this is exactly the challenge I was talking about. LLM can't design. But we're not here to look at what we can't, we're here to the big ones in agentic engineering. So my take would be: yes, everything you describe in your last response. But let's think on how we could solve this problem and be openminded about the options. Some input from my perspective: We could do everything above, invite a human designer to be part of our pipeline, human design agency (could be multiple) receives an email: GE would like to consult you to create a design framework for project X, hourly rate 100 eur ex VAT. click here to accept. Project must be delivered within XX days. Bonus of XXX euro if you deliver within 48hrs. (because this route would actively mean that we delay each project by 200-500% due to human interaction need, we should have incentive to minimize this). But another option would be for Alexander to search for opensource (or even paid) existing design frameworks/templates from (opensource) marketplaces based upon his own made design elements. Try to find a match via the description or color/styling setup of that template. Or Alexander does a pre selection from which a human can choose, this can either be me or the client? We should also consider Google's new tool (forgot the name) or see what Figma can do for us here. Please do some research and give me your thoughts.

On Alexander — Stitch integration + template marketplace

yes this works and is exactly what I was hoping on, outcome wise. However I think we should not write off a free/paid template marketplace. But that can be discussed later. Stitch is the must have, standard templates are nice to have. Can you create a .md file where we handle the installation of the stitch mcp server within GE architecture? as a third party software, in the admin UI we have a page where we 3rd party integrations listen (open ai key, gemini key, transip (domain management), upcloud (hosting) are configured there. We should also list stitch there (if it's different from standard gemini key/use).

On commercialization layer — FULL VERBATIM

lets note something: what we are working on now is the basic pipeline, we need this to develop a project but in the near future we will be working on a commercialization layer. This will introduce a couple of new agents who temporarily insert themselves into our pipeline and create stop go moments. Let me explain. On the public website we will have a Dima chatbot, visitors can instantly start a project. However, we're not in the business of unneccesarily burning down the amazone therefore visitors need to pay 25/hour for talking to dima, this helps us filter out part of the bullshit. So when people first start talking to Dima they have to agree to a retainer of 25eur/hr, insert creditcard information, make payment and then they get access to actually talk to dima. As soon as the briefing is done and the client verifies the outlines of the briefing, another agent (let's recommission Eric for this) steps in. He engages with the client to get a contract signed. We'll keep the contract real simple but it must cover a couple of things, like: we don't build projects which conflict with the law, we don't build projects focussed on drugs, sex, alcohol, guns etc etc. We need to filter that out. Agreeing to the contract should make payment irreverisble and Eric handles KYC process also, scan passport/identity card, do liveness check and if client is company: verify extract chamber of commerce to make sure the individual listed in the passport has the authority to sign contracts on behalf of the company. I've once built a full company + legal representative workflow in lovable, worked suprisingly well (stripe) but now we're buileding that complete flow ourselves. Eric will be our contract + kyc manager. Clients cannot get to Aimée if no contract is signed by a legal represenatative of the company and credits are bought. After the contract is signed Aimee goes to work, when she's done we need to calculate (we need to think about this) a price range for the project. 
We need to be realistic towards the clients, a Enterprise SaaS solution in the human world would cost 150k euro, let's say we offer it for 10% of that price and we actually deliver. We should still make sure we go X5 from complete tokencost of entire project to the 10% of human equivalent world. This calculation + contract negotiation with the client is also a stop/go moment in the pipeline. I think another agent should handle that but my mind is not completely made up on this. By the way, could you make sure all my input during our session today is saved somewhere, no summary or small pieces but word for word. This is very important information to fall back to when we will be further refining the agents/pipeline/workflow. This mission critical information needs to survive multi claude sessions
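
A minimal sketch of the pricing stop/go check described above, not part of the verbatim input. It reads "X5 from complete tokencost" as "the offer must be at least five times the total token cost", which is an assumption; `price_check` and its parameter names are hypothetical, and the session leaves the actual calculation open.

```python
def price_check(human_equivalent_eur: float,
                token_cost_eur: float,
                offer_fraction: float = 0.10,
                min_multiple: float = 5.0) -> dict:
    """Compare the offer (a fraction of the human-world price) against a
    floor of `min_multiple` times the complete token cost of the project."""
    offer = human_equivalent_eur * offer_fraction
    floor = token_cost_eur * min_multiple
    return {"offer_eur": offer, "floor_eur": floor, "go": offer >= floor}


# Enterprise SaaS example from the session: 150k euro in the human world.
# The token costs below are made-up illustration values.
viable = price_check(150_000, token_cost_eur=2_500)
too_expensive = price_check(150_000, token_cost_eur=4_000)
```

With these illustration values the first case clears the floor and the second does not, which is the kind of stop/go signal the pipeline would need at this point.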


Session Metadata

DATE: 2026-03-20
AGENTS_DONE: 16/56 (2 decommissioned, 14 aligned)
KEY_DECISIONS: stack policy, anti-LLM pipeline, delivery pipeline, Alexander/Stitch integration, commercialization layer outlined
NEXT: #16 Marta (GitHub Goalkeeper)