I worked on this problem with https://www.agentkanban.io - this is a human in the loop integration for VS code that stores the context in the kanban task so it lives with the task (can also be forked, split etc).
This is rapidly becoming the "todo list demo" of the LLM era... I say this as someone who also built one because I was tired of all the others! (https://github.com/beaugunderson/obliscence)
First, OP's tool is cool, and worth trying. Yours as well, though when I see a vibed tool with no commits I wonder if the author is still using it themselves or achieved perfection. :-)
Meanwhile, the other todo list take, one I've undertaken as well, is to cross sync all the Claude Codes across all its instances on all your machines.
There are multiple projects that claim to do this. None do it fully. (They particularly have blind spots to tools that embed a Claude Code, such as the Xcode 26.5 and Xcode 27 beta.)
So: roll one's own, and in doing that, realize that it has first class tools to make back referencing transcripts normal.
Given those tools, you don't really need an extra layer.
totally, I see that syncing workflow come up quite a bit as well... some of this stuff is going to get sherlocked into the harness proper (probably a net benefit) though of course people are still going to want the meta-version that works with all of the different harnesses they use... so much of being effective with this style of development is molding it to your specific workflows and sanding off the rough edges with context, then sharing those wins with the larger team (another layer where I see a lot of differing approaches, like skillshare for example)
semantic search has been pretty good, it usually finds what it's looking for!
a couple of times I was certain that there was a session that contained some word but in reality it was in my personal claude.ai web account, so needed to add the import functionality there.
my favorite piece is the `corrections` command which surfaces all my frustrations/corrections in the last week for example... and I can then figure out if missing context would improve those scenarios going forward
Nice, yea I typically spend about 1/3 of my sessions on finding ways to improve the agents' SDLC. Lots of random audits and things.
And yea on the import thing, there are quite a few instances when session records can live on other machines, like cloud agents, dev boxes, etc.
Do you have any interest in sharing some transcripts with team members? I'm trying to figure out the shape of this solution because often times people I work with want to see what I did or fork one of my sessions, but I also don't necessarily just want unlimited dumping because I'm sure I have personal details in there too.
I often tell Claude Code to look at previous sessions in ~/.claude and it’s happy to jq/grep its way through them with no special tool. But being more efficient is always good.
At first I thought the main improvement would be that the search would be faster, but rg is already pretty freakin fast when the fs cache is warm.
What really ended up being the big efficiency improvement is the token efficiency. When you structure all of the transcripts in a SQL table, the agent can retrieve exactly what is needed (such as "print me the lite transcript, without the intermediate messages").
Claude has been heavily RL'd on using jq/grep, even if the tool is more efficient, Claude using it incorrectly or reading a book of examples in order to understand how to use it correctly is going to end up underperforming.
This is likely true, but ultimately this tool is just SQL, which I believe Claude and others must be heavily RL'd on. We try to not do anything "special" and make it boringly SQL representation of past sessions.
Very interesting project! I guess it could be even better if you didn't have to ingest the session data into a database but just build an index on top. I have an idea how to do it
We considered this, but the main thing you gain from this tradeoff is some disk space and cleaner retention semantics from not having to duplicate all of the searchable text.
But you still have to do the parsing and ingestion work to build the index in the first place, so CPU time does not go away.
And you still have to store the indexes and enough metadata to map results back to the raw session files, which bounds the benefit of not duplicating the data.
The main downside is flexibility (you would lose the ability to do arbitrary SQL queries, semantic search on top of structured corpus, etc)
But I would love to see if I can be proven wrong on this!
I would love a service that I could upload these chats to (anonymously) so that those developing open models can have it as training data and not just the closed model companies. My understanding is that it’s very valuable, look what cursor have managed to train. Obviously some filtering so that only chats or projects you want to share get shared would need to be in order.
I love this idea. I would totally upload my sessions if doing so helped train the next generation of open weight models. I'm using Claude to work on my free software projects so there aren't many secrets there anyway.
We have a private beta for a secure cloud version of the service, although its more geared towards teams/enterprise who want to share their work internally, rather than donating to open model developers. But interesting idea! I'm not very knowledgeable about crypto things, but I believe this is what people have considered "microtransactions" to be useful for.
> Coding agents usually start from zero. They can inspect the current repo, but they often cannot recover the discussions, decisions, failed attempts, commands, and test results from earlier work.
Sure they can. Just ask them. Some (like Claude Code) even have built in tools for it that work a treat. It'll happily rebuild an entire edit history diff by diff.
This is true, maybe we could reword it to be less absolute.
The bigger point is that when they do go spelunking in the old session logs, it is extremely token inefficient, and you can often fill up an entire context window and force a compaction just by trying to put together a transcript or summary.
The goal here is less of doing something previously impossible, but doing it in a way that makes it so efficient and cheap that you can have agents do it very often, like before they start on every single task.
Building this made it obvious that there should be a standard format / specification for agent transcripts and logs (similar to ACP for runtime events). If you're interested in discussing this, please reach out!
Of course, it's impossible to know for sure what was LLM processed or not, but some of your posts (like this one) have been getting classified that way.
my post are not AI generated...I apologize if my tone/vernacular comes off as generated, but that's just my own voice. down voting my comment based on unfounded assumptions is upsetting and discouraging to a new member such as myself.
Or you could just use a great agent/harness like Shelley that already uses sqlite to store converians is searchable from the UI as well.
https://github.com/boldsoftware/shelley
I worked on this problem with https://www.agentkanban.io - this is a human in the loop integration for VS code that stores the context in the kanban task so it lives with the task (can also be forked, split etc).
I like the CLI interface, very comprehensive.
However, I'm puzzled by pi support: https://github.com/ctxrs/ctx/issues/40
Really nice idea - there is certainly gold in the history of agent actions. But how do you keep the ctx from trusting stale history?
Thoughts on pedophile Rust advocates?
https://wng.org/articles/the-high-cost-of-negligence-1617309...
This is rapidly becoming the "todo list demo" of the LLM era... I say this as someone who also built one because I was tired of all the others! (https://github.com/beaugunderson/obliscence)
First, OP's tool is cool, and worth trying. Yours as well, though when I see a vibed tool with no commits I wonder if the author is still using it themselves or achieved perfection. :-)
Meanwhile, the other todo list take, one I've undertaken as well, is to cross sync all the Claude Codes across all its instances on all your machines.
There are multiple projects that claim to do this. None do it fully. (They particularly have blind spots to tools that embed a Claude Code, such as the Xcode 26.5 and Xcode 27 beta.)
So: roll one's own, and in doing that, realize that it has first class tools to make back referencing transcripts normal.
Given those tools, you don't really need an extra layer.
totally, I see that syncing workflow come up quite a bit as well... some of this stuff is going to get sherlocked into the harness proper (probably a net benefit) though of course people are still going to want the meta-version that works with all of the different harnesses they use... so much of being effective with this style of development is molding it to your specific workflows and sanding off the rough edges with context, then sharing those wins with the larger team (another layer where I see a lot of differing approaches, like skillshare for example)
Lol fair enough! Great project btw. Interesting choice to trigger incremental refresh on SessionStart hook, that's nice.
How have you enjoyed the semantic search?
semantic search has been pretty good, it usually finds what it's looking for!
a couple of times I was certain that there was a session that contained some word but in reality it was in my personal claude.ai web account, so needed to add the import functionality there.
my favorite piece is the `corrections` command which surfaces all my frustrations/corrections in the last week for example... and I can then figure out if missing context would improve those scenarios going forward
Nice, yea I typically spend about 1/3 of my sessions on finding ways to improve the agents' SDLC. Lots of random audits and things.
And yea on the import thing, there are quite a few instances when session records can live on other machines, like cloud agents, dev boxes, etc.
Do you have any interest in sharing some transcripts with team members? I'm trying to figure out the shape of this solution because often times people I work with want to see what I did or fork one of my sessions, but I also don't necessarily just want unlimited dumping because I'm sure I have personal details in there too.
sometimes i'll share prompts but but never a whole transcript (have not had a reason to)
if i do want to share context i'll use something like "give me a prompt $coworker can share with their claude to continue this work"
I often tell Claude Code to look at previous sessions in ~/.claude and it’s happy to jq/grep its way through them with no special tool. But being more efficient is always good.
Yes this is how we started as well!
At first I thought the main improvement would be that the search would be faster, but rg is already pretty freakin fast when the fs cache is warm.
What really ended up being the big efficiency improvement is the token efficiency. When you structure all of the transcripts in a SQL table, the agent can retrieve exactly what is needed (such as "print me the lite transcript, without the intermediate messages").
Claude has been heavily RL'd on using jq/grep, even if the tool is more efficient, Claude using it incorrectly or reading a book of examples in order to understand how to use it correctly is going to end up underperforming.
This is likely true, but ultimately this tool is just SQL, which I believe Claude and others must be heavily RL'd on. We try to not do anything "special" and make it boringly SQL representation of past sessions.
Very interesting project! I guess it could be even better if you didn't have to ingest the session data into a database but just build an index on top. I have an idea how to do it
Thanks and this is a very interesting idea!
We considered this, but the main thing you gain from this tradeoff is some disk space and cleaner retention semantics from not having to duplicate all of the searchable text.
But you still have to do the parsing and ingestion work to build the index in the first place, so CPU time does not go away.
And you still have to store the indexes and enough metadata to map results back to the raw session files, which bounds the benefit of not duplicating the data.
The main downside is flexibility (you would lose the ability to do arbitrary SQL queries, semantic search on top of structured corpus, etc)
But I would love to see if I can be proven wrong on this!
I would love a service that I could upload these chats to (anonymously) so that those developing open models can have it as training data and not just the closed model companies. My understanding is that it’s very valuable, look what cursor have managed to train. Obviously some filtering so that only chats or projects you want to share get shared would need to be in order.
I love this idea. I would totally upload my sessions if doing so helped train the next generation of open weight models. I'm using Claude to work on my free software projects so there aren't many secrets there anyway.
We have a private beta for a secure cloud version of the service, although its more geared towards teams/enterprise who want to share their work internally, rather than donating to open model developers. But interesting idea! I'm not very knowledgeable about crypto things, but I believe this is what people have considered "microtransactions" to be useful for.
The Chinese would love this.
> Coding agents usually start from zero. They can inspect the current repo, but they often cannot recover the discussions, decisions, failed attempts, commands, and test results from earlier work.
Sure they can. Just ask them. Some (like Claude Code) even have built in tools for it that work a treat. It'll happily rebuild an entire edit history diff by diff.
This is true, maybe we could reword it to be less absolute.
The bigger point is that when they do go spelunking in the old session logs, it is extremely token inefficient, and you can often fill up an entire context window and force a compaction just by trying to put together a transcript or summary.
The goal here is less of doing something previously impossible, but doing it in a way that makes it so efficient and cheap that you can have agents do it very often, like before they start on every single task.
I've been running https://github.com/kenn-io/agentsview for this, works well.
Building this made it obvious that there should be a standard format / specification for agent transcripts and logs (similar to ACP for runtime events). If you're interested in discussing this, please reach out!
[flagged]
Can you please not post AI-generated or AI-edited comments to HN? It's not allowed here - see https://news.ycombinator.com/newsguidelines.html#generated and https://news.ycombinator.com/item?id=47340079.
Of course, it's impossible to know for sure what was LLM processed or not, but some of your posts (like this one) have been getting classified that way.
my post are not AI generated...I apologize if my tone/vernacular comes off as generated, but that's just my own voice. down voting my comment based on unfounded assumptions is upsetting and discouraging to a new member such as myself.