Sure, it looks neat, but why would you ever want this? What happened to closing PRs like this with a short and simple "This is unreadable. Split it into smaller self-contained commits, and write proper commit messages explaining what they do and why" comment?
Massive walls of code have always been rejected simply for being unreviewable. Why would you suddenly allow this for AI PRs - where you should be even more strict with your reviews?
I'm on the fence about this. Sometimes a new feature needs maybe 2k lines of code split over 10-20 or so files. Sure, you could split it up, but you can't necessarily run the parts in isolation and if they get split over multiple reviewers it might even be that no one reviewer gets the whole context.
So, I'm kind of okay with large PRs as long as they're one logical unit. Or, maybe rather, if it would be less painful to review as one PR rather than several.
I'm okay with long PRs, but please split them up into shorter commits. One big logical unit can nearly always be split into a few smaller units - tests, helpers, preliminary refactoring, handling the happy vs error path, etc. can be separated out from the rest of the code.
I think the particular problem is if AI is just producing large volumes of unnecessary code because the LLM is simply not able to create a more concise solution. If that's the case, it suggests these LLM-generated solutions are likely creating tech debt faster than anyone is ever likely to be able to resolve it. Maybe people are banking on LLMs themselves one day being sophisticated enough to clean it up, but that would also be the perfect time to price gouge them.
Agree. We've seen cowboy developers who move fast by producing unreadable code and cutting every corner. And sometimes that's ok. Say you want a proof of concept to validate demand and iterate on feedback. But we want maintainable and reliable production code we can reason about and grasp quickly. Tech debt has a price, and it looks like LLM abusers are on a path to waking up with a heavy hangover :)
This is very much my take. As long as the general rule is that long PRs are the exception, I think we end up in a good place. Blueskying, scaffolding, all sorts of things reasonably end up in long PRs.
But, it becomes incumbent on the author to write a guide for reviewing the request, to call the reviewer's attention to areas of interest, perhaps even to outline decisions made.
I'm reviewing PRs I wrote myself. Valid concern in a real org though.
I don’t understand. Are they AI PRs (as in the title), or did you write them yourself?
Related to this, how do you get your comments that you add in the review back into your agent (Claude Code, Cursor, Codex etc.)? Everybody talks about AI doing the code review, but I want a solution for the inverse - I review AI code and it should then go away and fix all the comments, and then update the PR.
What you do is actually read the comments, think about how you can improve the code, and then improve it - whether by telling the agent to do that or doing it yourself.
There’s a bunch of versions of this out there. This one’s mine, but it’s based on other ones. It works really well. It assesses the validity and importance of each comment, then handles it appropriately, creating issues, fixing the code, adding comments, updating the GH Copilot instructions file, etc.
https://github.com/cboone/cboone-cc-plugins/blob/main/plugin...
I suspect you don't need anything special for this. The GH API has support for reading comments from PRs. Maybe have it maintain a small local store to remember the IDs of the comments it's already read so it doesn't try to re-implement already-implemented fixes. Another similar thing you can do is a hook that reminds it to start a subagent to monitor the CI/autofix errors after it creates/updates a PR.
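As a rough sketch of that idea (the names here are hypothetical; it assumes a GITHUB_TOKEN env var and the standard REST endpoint for inline review comments), an agent could pick up only the comments it hasn't handled yet with something like:

    import json, os, pathlib, requests

    OWNER, REPO, PR_NUM = "someorg", "somerepo", 123     # hypothetical values
    SEEN_FILE = pathlib.Path(".seen_comment_ids.json")   # small local store

    seen = set(json.loads(SEEN_FILE.read_text())) if SEEN_FILE.exists() else set()

    # Inline review comments live under /pulls/{number}/comments in the REST API
    resp = requests.get(
        f"https://api.github.com/repos/{OWNER}/{REPO}/pulls/{PR_NUM}/comments",
        headers={"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}"},
    )
    resp.raise_for_status()
    comments = resp.json()

    for c in comments:
        if c["id"] in seen:
            continue  # already handled on a previous run
        # hand c["path"], c["line"], c["body"] to the agent here
        print(c["path"], c.get("line"), c["body"][:80])

    SEEN_FILE.write_text(json.dumps([c["id"] for c in comments]))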
The GitHub API is actually quite tricky here because there is a difference between "comment" and "review" and "review comment" (paraphrasing, I don't remember the details). So it's not as simple as one API call that grabs the markdown. Of course you can write a creative one-liner to extract what you need, though.
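For anyone running into this: the three kinds roughly map to three separate REST collections. A hedged sketch (owner/repo/number are placeholders) that just counts what's in each:

    import os, requests

    OWNER, REPO, NUM = "someorg", "somerepo", 123        # placeholders
    HEADERS = {"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}"}
    BASE = f"https://api.github.com/repos/{OWNER}/{REPO}"

    endpoints = {
        # top-level conversation comments hang off the *issue* object
        "issue comments": f"{BASE}/issues/{NUM}/comments",
        # review submissions (Approve / Request changes, with their summary body)
        "reviews": f"{BASE}/pulls/{NUM}/reviews",
        # inline comments attached to specific diff lines
        "review comments": f"{BASE}/pulls/{NUM}/comments",
    }

    for kind, url in endpoints.items():
        items = requests.get(url, headers=HEADERS).json()
        print(f"{kind}: {len(items)}")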
I tell Claude Code "review the comments on this PR" and give it the URL, and that's enough. It then uses the gh CLI tool and fetches the PR and individual comments.
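Under the hood that tends to boil down to a couple of gh invocations; a rough equivalent (the PR URL is a placeholder), wrapped in Python only to keep the example self-contained:

    import json, subprocess

    PR_URL = "https://github.com/someorg/somerepo/pull/123"  # placeholder

    # top-level conversation comments, the way the gh CLI shows them
    view = subprocess.run(
        ["gh", "pr", "view", PR_URL, "--comments"],
        capture_output=True, text=True, check=True,
    )
    print(view.stdout)

    # inline review comments via the REST API, using gh's authenticated client
    inline = subprocess.run(
        ["gh", "api", "repos/someorg/somerepo/pulls/123/comments"],
        capture_output=True, text=True, check=True,
    )
    for c in json.loads(inline.stdout):
        print(c["path"], c["body"][:80])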
I don't use it, but you can tag @copilot on GitHub comments and it will do so.
I don't do it because the chances of me reviewing vomited code are close to 0.
But why would you want to review long AI PRs in the first place? Why don't we apply the same standards we apply to humans? Doesn't matter if it was AI-generated, outsourced to Upwork freelancers or handcrafted in Notepad. Either submit well-structured, modular, readable, well-tested code or PR gets rejected.
The lines-viewed thing is cool for tracking progress, but honestly the bigger problem I keep running into is that AI PRs look perfectly reasonable line by line. The code reads fine, passes linting, tests pass - and then you realize it introduced a dependency you didn't need, or there's a subtle auth bypass because the LLM pattern-matched from some tutorial that didn't handle edge cases. Splitting into smaller commits helps but doesn't fully solve it, because each commit also looks fine in isolation. I think we need better tooling around semantic review - not just "did you read every line" but "did anyone actually verify the security properties didn't change". I've been spending a lot of time on this problem lately, and tbh the existing static analysis tools weren't built for this pattern at all; they assume a human wrote something wrong, not that an AI wrote something plausible but subtly broken.
The issue is that the people submitting AI PRs are functionally code-illiterate, and are trying to con you into accepting their PR the same way the AI conned them.
If the people raising these PRs could understand the importance of introducing the dependency or subtle auth bypass in question, then they could address those details when handing the PR to you.
The problem is that these people are using AI because they *cannot* otherwise accomplish their responsibilities, and so they *do not understand* this sort of sleight of hand that is being used by the AI.
The people using AI in this manner are *desperate* to mask their illiteracy, and therefore are less concerned with system stability than you, the (presumably literate) reader of this comment. Their incentive is to survive among the literate (read: keep their job) for another day. If some devastating bugs are introduced in the process, well you need to crack a couple of eggs to make an omelette.
Can I get an AI that automatically nitpicks AI PRs with the goal of rejecting them?
What about data security - is it sending code to any servers, or does it work entirely client-side?
It makes same-origin requests to GitHub's frontend to fetch info about line counts (the line count figures are only sometimes loaded into app state) - those are the only network calls it makes.
Care to open source it? I'd like to use it in Firefox, will send a PR.
https://github.com/dfialkov/pr-lines-viewed
Was this vibe coded? Did you test it on itself?