ax/interview.md at main · Necmttn/ax

So the idea is this: if you go back to the beginning of these AI agents, it all started with predicting your next edit in code. You would open a code editor or IDE and start typing something, and the agent predicted what you were going to write next. Your reaction to the prediction was the feedback mechanism: pressing Tab accepted it, while continuing to type and changing it to the correct behavior rejected it. That is what made these autocomplete agents very, very good at what they do: they had a beautiful harness that gave them a feedback loop to iterate on.

The next level was that the agent could do more than write a couple of lines. You describe what you need, it goes and plans, and eventually it executes. Once you see the result, you give feedback in natural language: this is bad, this is good, plus the test cases, etc. This is how we have been working so far, with some sort of orchestration: conducting an agent through its work by giving it feedback each time it finishes something. It learns from you and iterates.

So this is all good and fine, but if you look at the pattern, the scope slowly keeps increasing. The next one we have these days, from around six months ago, is the Ralph loop: you make a plan, it breaks the plan into tasks, it iterates, runs some checks, and if they pass, it continues. Following all these experiences, where things are going next is that you just give the idea and the agent goes and explores. But we are slowly losing the human in the loop who gives the harness a direction by telling it what went well and what went badly. So what we are missing is some sort of retro for agents.

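
The plan, tasks, and checks loop described above can be sketched in a few lines. This is a minimal illustration of that loop as it is described in this interview, not the actual Ralph loop script; `attempt` and `check` are hypothetical stand-ins for an agent call and a verification step such as running tests, and stopping at the first task that runs out of attempts is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class TaskResult:
    task: str
    attempts: int
    passed: bool


def run_task_loop(
    tasks: list[str],
    attempt: Callable[[str, int], str],  # hypothetical agent call: (task, attempt number) -> output
    check: Callable[[str, str], bool],   # hypothetical check: tests, type-check, lint...
    max_attempts: int = 3,
) -> list[TaskResult]:
    """Work through planned tasks in order; only move on once a task's checks pass."""
    results: list[TaskResult] = []
    for task in tasks:
        attempts, passed = 0, False
        while attempts < max_attempts and not passed:
            attempts += 1
            passed = check(task, attempt(task, attempts))
        results.append(TaskResult(task, attempts, passed))
        if not passed:
            break  # assumption: stop here and leave the failure for a human (or a retro)
    return results


# Toy agent that only gets it right on its second try.
def toy_agent(task: str, n: int) -> str:
    return "ok" if n >= 2 else "broken"


def toy_check(task: str, output: str) -> bool:
    return output == "ok"


results = run_task_loop(["parse config", "add CLI flag"], toy_agent, toy_check)
print([(r.task, r.attempts, r.passed) for r in results])
# [('parse config', 2, True), ('add CLI flag', 2, True)]
```

Nothing in this loop records why a task needed a second attempt; that missing signal is exactly the gap the retro idea targets.
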
What if, as we remove the human from the loop of generating code and generating products, there were a way for the AI to still capture that kind of feedback? What if we had a retro for every session an agent runs? After it finishes the work it was given, we ask the agent: can you do a quick interview? How was your session? What worked? What failed? Where did you struggle? It doesn't have to act as a problem solver; it just shares what went wrong. If there's a good signal, it shares it. If there's a bad signal, it shares it.

Is it interviewing itself or the user?

Interviewing itself, about how it worked. That's why it's called a retro. That could be a really good way of collecting information from the agent itself so it can self-improve. Over more sessions, it goes through these retros and fixes or patches some of the problems it ran into.

But the problem is that if you just run these self-improvement loops, they go loose and rogue. And if you don't have a way to track them, you don't know what you're tracking. You don't have any data showing whether changing X, Y, Z actually helped you or not. So the platform I built, ax, solves that. It ingests all your AI agents' workflows: the transcripts of their conversations, the tools they use, the skills they trigger. From those, it builds a relational graph database to find patterns: which skill causes what type of change, and whether it succeeds or fails. On top of skills, it also looks at sub-agents, hooks, and all these little rules. So it has visibility into what's going on. The second layer on top of that is experiments: you can set up your agent runs as experiments, and this is mostly hands-free.

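
To make the "which skill leads to what outcome" idea concrete, here is a minimal sketch that uses Python's built-in `sqlite3` as a stand-in for the relational graph database. The schema, table names, skill names, and sample rows are all hypothetical and made up for illustration; this is not ax's actual data model. Sessions act as nodes, skill uses act as edges to skills, and the agent's retro answers hang off each session.

```python
import sqlite3

# Hypothetical schema: not ax's real data model.
SCHEMA = """
CREATE TABLE sessions (
    id      TEXT PRIMARY KEY,
    project TEXT NOT NULL,
    outcome TEXT NOT NULL CHECK (outcome IN ('success', 'failure'))
);
CREATE TABLE skill_uses (              -- edge: session -> skill
    session_id TEXT NOT NULL REFERENCES sessions(id),
    skill      TEXT NOT NULL,
    PRIMARY KEY (session_id, skill)    -- count each skill once per session
);
CREATE TABLE retros (                  -- the agent's answers to its own retro
    session_id TEXT NOT NULL REFERENCES sessions(id),
    question   TEXT NOT NULL,
    answer     TEXT NOT NULL
);
"""

RETRO_QUESTIONS = ["How was your session?", "What worked?", "What failed?", "Where did you struggle?"]


def skill_success_rates(conn: sqlite3.Connection) -> list[tuple[str, int, float]]:
    """For each skill: how many sessions used it and what share of them succeeded."""
    return conn.execute(
        """
        SELECT u.skill,
               COUNT(*) AS sessions,
               AVG(CASE WHEN s.outcome = 'success' THEN 1.0 ELSE 0.0 END) AS success_rate
        FROM skill_uses AS u
        JOIN sessions AS s ON s.id = u.session_id
        GROUP BY u.skill
        ORDER BY u.skill
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO sessions VALUES (?, ?, ?)",
                 [("s1", "api", "success"), ("s2", "api", "failure"), ("s3", "web", "success")])
conn.executemany("INSERT INTO skill_uses VALUES (?, ?)",
                 [("s1", "tdd"), ("s1", "db-migrate"), ("s2", "db-migrate"), ("s3", "tdd")])
conn.execute("INSERT INTO retros VALUES (?, ?, ?)",
             ("s2", RETRO_QUESTIONS[2], "Ran the migration before the dev database was up."))

print(skill_success_rates(conn))
# [('db-migrate', 2, 0.5), ('tdd', 2, 1.0)]
```

A weekly retro could then, for example, pull the retro answers of failed sessions for whichever skills have the lowest success rate.
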
You just ask your agent, once a week, to do a retro: gather all the feedback from one week's worth of AI agent sessions and figure out what can be improved. The improvements can be like: you spent too much time on type checking with this library, but there is oxlint, which is much faster at linting; you could save an hour of compile time just by changing X, Y, Z. So it gives automatic suggestions based on the workflows it runs through.

Within the system, we are not only ingesting the agents' feedback, we are also ingesting every conversation. So when you, as a human, review what the agent did after three hours of work and say, "This session sucked, you built shit," that's also feedback. Your corrections are feedback to the agent. They are higher level, but they are feedback. What else is out there? For example, if the work you asked the AI for made it into the main branch of a Git repository, that's a signal as well. It tells you that whatever was done in that session actually worked out and made it to production. And if there is no follow-up PR fixing a bug related to it, that session was a relatively successful one. Those are also signals we are collecting.

I feel like for other workflows it might be more challenging. For example, if you are making a marketing strategy and posting videos, you'd need analytics on those videos ingested somehow.

Yeah. For starters, we are focusing on coding, because it's the most pragmatic thing to test and verify: the results are immediate. If the tests pass, it reaches production, and users accept it, that was good work. I believe this can expand to multiple types of agents, but we're starting here because this doesn't exist yet, and the easiest place to start is coding agents, because we know very well how they work.

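
The "merged to main, and no follow-up bug-fix PR" signal can be expressed as a small labeling function. This is a sketch under stated assumptions: the `Session` and `PullRequest` shapes, the 14-day window, and the title-based `looks_like_bugfix` heuristic are all hypothetical choices for illustration, not how ax actually decides.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Session:
    id: str
    files_changed: set[str]
    merged_to_main_at: Optional[datetime]  # None if the work never reached main


@dataclass
class PullRequest:
    title: str
    merged_at: datetime
    files: set[str]


def looks_like_bugfix(pr: PullRequest) -> bool:
    # Hypothetical heuristic: real data might use labels or linked issues instead.
    title = pr.title.lower()
    return title.startswith("fix") or "bug" in title


def label_session(session: Session, later_prs: list[PullRequest],
                  window: timedelta = timedelta(days=14)) -> str:
    """Turn Git history into a coarse success signal for one agent session."""
    if session.merged_to_main_at is None:
        return "not shipped"
    deadline = session.merged_to_main_at + window
    for pr in later_prs:
        follows = session.merged_to_main_at < pr.merged_at <= deadline
        # A bug-fix PR soon after, touching the same files, counts against the session.
        if follows and looks_like_bugfix(pr) and pr.files & session.files_changed:
            return "shipped, then fixed"
    return "relatively successful"


session = Session("s1", {"src/auth.py"}, datetime(2025, 1, 6))
fix = PullRequest("Fix token refresh bug", datetime(2025, 1, 9), {"src/auth.py"})
print(label_session(session, [fix]))  # shipped, then fixed
print(label_session(session, []))     # relatively successful
```

Human corrections from the conversation transcripts would be a separate, higher-level signal layered on top of this one.
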
And we are already predicting that coding as a job is going to disappear. The reason is that pretty much everything is already out there, ready for us to build a harness around it.

Do you have a working prototype that you've tested?

Yeah.

So you tested it on your own work? Did it actually give you any good advice?

This is the very first version of ax (agent experience), but you can already see interesting bits of data: what types of skills are executing, how long they take, how we can improve the timing, etc. The majority of the discovery actually happens through the agents looking at your data; it's not the human who's driving it. So when you ask the AI to figure out why X, Y, Z went wrong, it's able to say, "This went wrong because we didn't notice something was already running on the port," and so on. Those are small things, but if you identify them correctly, you can cut the noise and get that to the agent immediately.

How big is the memory of the AI agent going to be? And is it going to be cross-project or per-project? For example, if you are building several projects at the same time, and in one project you asked it to find a good linting library, it didn't find one, and you suggested using oxlint instead, will this knowledge transfer to another project?

We are ingesting all your sessions from Claude Code and Codex. So it's cross-project, but local to your computer. Everything stays on your machine.

Is it a database solution, similar to use context in a way?

It is very similar, but the data in question this time is the code files, and your Git commit messages and PR reviews also join into it.

Is it more or less clear what is going on?

Yeah. Kind of interesting.

So yeah, this is what we are building.
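
The "which skills run and how long they take" view, aggregated across projects the way a weekly retro would see it, can be sketched as a simple group-by over tool-call records. The `ToolCall` record, tool names, and durations are hypothetical sample data; a real pipeline would first have to parse such records out of the Claude Code and Codex session transcripts, which this sketch does not attempt.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ToolCall:
    at: datetime
    project: str
    tool: str        # hypothetical: a skill, sub-agent, hook, or shell command name
    seconds: float


def weekly_time_report(calls: list[ToolCall], now: datetime, days: int = 7):
    """Total time per tool over the last `days`, across all projects, largest first."""
    since = now - timedelta(days=days)
    totals: dict[str, float] = defaultdict(float)
    projects: dict[str, set[str]] = defaultdict(set)
    for call in calls:
        if since <= call.at <= now:
            totals[call.tool] += call.seconds
            projects[call.tool].add(call.project)
    return sorted(
        ((tool, total, sorted(projects[tool])) for tool, total in totals.items()),
        key=lambda row: row[1],
        reverse=True,
    )


now = datetime(2025, 3, 10)
calls = [
    ToolCall(datetime(2025, 3, 4), "api", "tsc --noEmit", 1800),
    ToolCall(datetime(2025, 3, 8), "web", "tsc --noEmit", 1500),
    ToolCall(datetime(2025, 3, 9), "web", "eslint", 240),
    ToolCall(datetime(2025, 2, 20), "api", "eslint", 9999),  # older than a week: ignored
]
report = weekly_time_report(calls, now)
print(report)
# [('tsc --noEmit', 3300.0, ['api', 'web']), ('eslint', 240.0, ['web'])]
```

Because the records are pooled before grouping, a slow step that shows up in several projects rises to the top of the report, which is the cross-project view described above.
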
