There is no shortage of strong coding models right now.
Codex is good. Claude is good. In many cases, the model itself is no longer the main bottleneck.
What still feels unfinished, at least to me, is the layer around the model.
That is why I started building Kodama.
Kodama is not a new coding agent in the usual sense. It does not try to compete with Codex or Claude, and it definitely does not pretend to be smarter than the tools it wraps.
What it does instead is more practical: it gives me a structured way to go from a product requirements document (PRD) to a backlog of tasks, assign those tasks to different agents, and choose different execution profiles depending on the kind of work that needs to be done.
That was the gap I actually cared about.
The Problem I Wanted To Solve
Most agent tooling still starts from the same basic interaction model:
- open a chat
- describe the problem
- keep going until something useful happens
That can work surprisingly well for focused tasks.
But once the work gets larger, I keep running into the same friction:
- a PRD is not executable work
- one conversation is a poor substitute for a backlog
- not all tasks should go to the same agent
- not all tasks should be approached with the same working style
That last point matters more than many tools seem to assume.
In real software work, "implement this feature", "review this change", "design the architecture", and "investigate an incident" are not the same kind of task. At least in my experience, they should not necessarily be given to the same agent in the same voice with the same expectations.
That is where the default chat-first model started to feel thin to me.
Why A PRD Is Only The Beginning
One of the things I wanted from Kodama was a cleaner path from requirements to execution.
A PRD can be useful, but it is not a plan.
Someone or something still needs to answer questions like:
- what are the actual implementation tasks
- in what order should they happen
- which ones are architectural
- which ones are straightforward coding work
- which ones are review or QA tasks
That transformation step is easy to underestimate.
At least for me, it matters a lot, because this is where intent turns into manageable work.
Kodama has an explicit PRD task planning flow for exactly that reason. The idea is not just to "chat about the PRD", but to derive a backlog from it and import that backlog into a form the system can actually manage.
That already feels more useful to me than burying requirements in a long conversation and hoping the right next step will somehow emerge from context alone.
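To make that transformation step concrete, here is a minimal sketch in Python of what "derive a backlog from a PRD" could look like. The names and schema are hypothetical, not Kodama's actual ones; a real planner would use an LLM to do the decomposition, while this only shows the shape of it: requirements in, structured tasks out.

```python
from dataclasses import dataclass, field

# Hypothetical task record; Kodama's real schema may differ.
@dataclass
class Task:
    title: str
    kind: str                     # e.g. "architecture", "coding", "qa"
    depends_on: list = field(default_factory=list)

def plan_from_prd(prd_sections):
    """Turn named PRD sections into an ordered list of tasks.

    A real planner would use an LLM to decompose each section;
    this sketch just hard-codes a design -> implement -> review
    chain to show the transformation, not the intelligence.
    """
    backlog = []
    for section in prd_sections:
        design = Task(f"Design: {section}", kind="architecture")
        impl = Task(f"Implement: {section}", kind="coding",
                    depends_on=[design.title])
        review = Task(f"Review: {section}", kind="qa",
                      depends_on=[impl.title])
        backlog += [design, impl, review]
    return backlog

backlog = plan_from_prd(["User login", "Audit log"])
print(len(backlog))  # 6: design/implement/review per section
```

The point of the exercise is that each task carries its own kind and dependencies from the start, which is exactly what a long chat transcript does not give you.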
Why Backlog Matters More Than Chat History
I do not think chat history is a good long-term substitute for explicit task structure.
Conversations are useful for exploration, but they are poor at representing operational state.
For actual project work, I usually care about things like:
- what is pending
- what is in progress
- what is blocked
- what depends on what
- what belongs to which project
That is backlog territory, not chat territory.
This may sound almost boring compared to the usual agent hype, but I think this is exactly the kind of boring that matters. Once work spans multiple threads, a visible backlog becomes much more valuable than one more clever prompt.
That is one of the core ideas in Kodama: take the work out of the chat stream and turn it into something explicit.
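The operational state I listed above is easy to represent once it is explicit. Here is a small illustrative sketch (my own names, not Kodama's) of the difference between a chat transcript and a backlog: given statuses and dependencies, "ready" and "blocked" become computable rather than something you reconstruct by rereading a conversation.

```python
# Minimal backlog state: which tasks are ready vs blocked.
# Task names and fields are illustrative, not Kodama's schema.
tasks = {
    "design-auth":    {"status": "done",    "depends_on": []},
    "implement-auth": {"status": "pending", "depends_on": ["design-auth"]},
    "qa-auth":        {"status": "pending", "depends_on": ["implement-auth"]},
}

def ready(tasks):
    """Pending tasks whose dependencies are all done."""
    return [name for name, t in tasks.items()
            if t["status"] == "pending"
            and all(tasks[d]["status"] == "done" for d in t["depends_on"])]

def blocked(tasks):
    """Pending tasks still waiting on at least one dependency."""
    return [name for name, t in tasks.items()
            if t["status"] == "pending" and name not in ready(tasks)]

print(ready(tasks))    # ['implement-auth']
print(blocked(tasks))  # ['qa-auth']
```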
Why Task Assignment Matters
The second thing I wanted was per-task agent selection.
I do not assume one agent is always the right choice for every kind of work.
Sometimes I may prefer Codex. Sometimes I may prefer Claude. Sometimes I may want a failover path if one hits a limit.
That is not just about model quality in the abstract. It is about workflow fit.
Different tools have different strengths, different failure modes, and sometimes simply different practical behavior on a given task. I wanted a system where agent selection is part of the task definition, not an afterthought hidden in whichever tab I happen to have open.
To me, that makes the whole setup more deliberate.
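The failover path mentioned above can be sketched very simply. This is a hypothetical illustration, not Kodama's actual routing code: the task carries an ordered chain of agents, and a usage-limit failure falls through to the next one instead of stalling the task.

```python
# Per-task agent routing with a failover chain.
# Agent behavior here is faked; only the routing pattern is the point.
class AgentLimitError(Exception):
    """Raised when an agent hits a rate or usage limit."""

def run_with_failover(task, agents):
    """Try each agent in order; fall through on limits."""
    for agent in agents:
        try:
            return agent(task)
        except AgentLimitError:
            continue  # next agent in the chain
    raise RuntimeError(f"all agents failed for task: {task}")

def codex(task):
    raise AgentLimitError("usage limit hit")  # simulate a cap

def claude(task):
    return f"done by claude: {task}"

print(run_with_failover("implement feature X", [codex, claude]))
```

Because the chain is part of the task definition, the choice is recorded and repeatable instead of living in whichever tab happened to be open.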
Why Profiles And Personalities Matter
I also did not want every task to be approached with the same generic "helpful coding assistant" behavior.
Kodama supports task profiles such as:
- architect
- developer
- QA
- refactorer
- incident responder
- UX reviewer
That may sound like a small detail, but I think it changes the quality of the work more than many people expect.
An architectural task should push toward trade-offs, interfaces, and migration thinking.
A QA-oriented task should behave more like a reviewer looking for defects and edge cases.
A refactoring task should optimize for structural improvement without casually changing behavior.
An incident-style task should bias toward safe mitigation and reduced blast radius.
Those are different working modes. In my experience, it helps to make them explicit.
I would not claim this solves everything, but it does make the interaction more intentional. Instead of asking one agent to somehow infer the right mindset every time, Kodama lets me attach the desired posture to the task itself.
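One plausible way to attach that posture to a task is simply an explicit, profile-keyed system prompt. The prompt text below is my own illustration of the profiles listed above, not what Kodama actually sends:

```python
# Execution profiles as explicit postures attached to a task.
# Prompt wording is illustrative, not Kodama's real prompts.
PROFILES = {
    "architect":  "Focus on trade-offs, interfaces, and migration paths.",
    "qa":         "Act as a reviewer: hunt for defects and edge cases.",
    "refactorer": "Improve structure; do not change observable behavior.",
    "incident":   "Bias toward safe mitigation and a small blast radius.",
}

def build_prompt(task, profile):
    """Prefix the task with the working mode it should be done in."""
    posture = PROFILES[profile]
    return f"{posture}\n\nTask: {task}"

prompt = build_prompt("split the billing module", "refactorer")
print(prompt.splitlines()[0])  # the refactorer posture line
```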
What Kodama Actually Wraps
One thing I want to be careful about is not overstating what Kodama is.
It is not a foundation model. It is not a magical autonomous engineer. It is not trying to replace Codex or Claude.
It is a wrapper and orchestration layer around them.
More specifically, it is a self-hosted workspace that gives me:
- multi-project organization
- backlog-driven task execution
- per-task agent selection
- per-task execution profiles
- async execution through a web UI
- Telegram-based notifications and replies for agent questions
- structured protocol handling around long-running work
That is the level I was interested in building.
I was much less interested in inventing "yet another agent" and much more interested in building a better operational layer around agents that already exist.
Why Telegram Matters More Than It Sounds
One part that turned out to matter a lot in practice is the Telegram integration.
Longer-running agent work is rarely fully autonomous. At some point, an agent may need clarification, a decision, or a missing piece of information before it can continue.
Without a good feedback path, that creates a very familiar problem: the work is technically "running", but in reality it is waiting for me to come back to my desk, notice the question, and respond in the UI.
That is exactly the kind of friction I wanted to reduce.
With Telegram in the loop, I do not have to sit in front of the browser all the time. If an agent has a question, I can get notified, answer from my phone, and let the task continue. I can also check status without having to be on my laptop.
That may sound like a convenience feature, but to me it changes the usability of the whole system. It makes async work feel actually async instead of "async as long as I stay near the same machine".
I would not claim everyone needs that, but for my own workflow it makes a noticeable difference.
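The underlying pattern is an ask-and-wait protocol: the running task blocks on a question channel while the answer arrives out of band. In Kodama that channel is Telegram; the sketch below fakes it with stdlib queues and a thread, purely to show that it is the task that pauses, not the human who has to sit and watch.

```python
import queue
import threading

# Ask-and-wait sketch: the agent blocks until an answer arrives
# out of band (via Telegram in Kodama; via another thread here).
questions, answers = queue.Queue(), queue.Queue()

def agent():
    questions.put("Should the migration be reversible?")
    decision = answers.get()   # the task pauses here, not the human
    return f"continuing with: {decision}"

result = {}
worker = threading.Thread(target=lambda: result.setdefault("out", agent()))
worker.start()

print(questions.get())         # the notification side picks this up
answers.put("yes, add a down migration")
worker.join()
print(result["out"])
```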
Why This Feels More Useful To Me
At least for my own work, the interesting question is no longer:
"Can an LLM generate code?"
The more interesting question is:
"How do I make agent-based development feel closer to real project execution and less like a sequence of loosely connected chats?"
For me, the answer increasingly looks like this:
- start from requirements
- turn them into explicit tasks
- assign those tasks intentionally
- pick the right agent for the right work
- choose the right execution profile
- keep the state visible
That is the role I want Kodama to play.
I do not see that as a replacement for direct model usage. I see it as a better way to organize work around models that are already useful.
Maybe there are other ways to solve that problem. I would actually expect there to be. This is just the direction that made the most sense to me once I looked at where the real friction was.
What I Am Not Claiming
I am not claiming everyone needs a wrapper like this.
For many people, a direct Codex or Claude workflow may be exactly right.
I am also not claiming backlog-driven orchestration is the future of all AI development workflows.
What I am saying is simpler: for my own work, once requirements, multiple projects, and different task types are involved, I want more structure than a pure chat interface usually gives me.
That is the context in which Kodama makes sense to me.
Closing Thoughts
Kodama exists because I wanted a more practical layer between a PRD and actual agent execution.
Not a new model. Not another general-purpose chat app. And not a vague promise of "AI that does everything".
What I wanted was something more operational:
- turn requirements into backlog
- route work to different agents
- attach the right working style to the task
- keep execution visible and manageable
That is what I currently find interesting in this space.
Not just better models, but better systems around the models.
If that problem resonates with you, Kodama will probably make sense immediately.
If not, that is fine too. It is just one attempt to make agent-driven development feel a bit more grounded in real engineering work.
PLEASE NOTE: Kodama is currently a self-hosted, single-user tool intended for trusted networks or an authenticated edge.
Link to GitHub (your mileage may vary): Kodama GitHub