Re: Introduce Sashiko (agentic review of Linux kernel changes)

From: Lorenzo Stoakes (Oracle)

Date: Fri Apr 03 2026 - 03:48:33 EST

On Thu, Apr 02, 2026 at 06:48:18PM -0700, Roman Gushchin wrote:
> Sean Christopherson <seanjc@xxxxxxxxxx> writes:
>
> > +Venkatesh and Paolo
> >
> > On Thu, Mar 19, 2026, Roman Gushchin wrote:
> >> "Lorenzo Stoakes (Oracle)" <ljs@xxxxxxxxxx> writes:
> >> > On Wed, Mar 18, 2026 at 11:33:22AM -0700, Roman Gushchin wrote:
> >> >> >> Finally, some subsystems have a good prompts coverage and some don't. It
> >> >> >> doesn't have to be lengthy documentation (and it might actually be
> >> >> >> counter-productive), but having a small list of things to look at - some
> >> >> >> high-level concepts which are hard to grasp from the code, etc. - can
> >> >> >> help a lot with both bug discovery and false positives.
> >> >> >
> >> >> > I guess best contributed to Chris's review-prompts repo right?
> >> >>
> >> >> Both works for me now, we'll figure out with Chris how to sync our
> >> >> prompts. The small problem is that we're using various models, tools and
> >> >> review protocols and barely can test each other's setup. And it's all
> >> >> very fragile, so it's not exactly trivial.
> >> >> But we'll figure out something soon.
> >> > Yeah, part of the fun I guess :)
> >> >
> >> >> In general we need to carefully separate instructions (like which tools
> >> >> to use, which prompts to load etc) from factual data. Then we can easily
> >> >> use the factual data with various tooling around.
> >
> > In an offline conversation, Venkatesh had a very (IMO) insightful observation
> > regarding the factual data of the prompts: the information is also very useful
> > documentation for *humans*. And in response to me lamenting about having to
> > potentially review an external repo, Venkatesh also suggested putting the gory
> > details about subsystem behavior in the kernel's Documentation/.
> >
> > To me, that suggestion seems like a no brainer. The existing subject matter
> > experts are already in place to review and help maintain the documentation, the
> > documentation can be updated in lockstep with the code, those of us that like
> > email-based review don't need to change our ways, etc. :-)
> >
> > And irrespective of AI domination, I'd love to have detailed documentation of some
> > of KVM's gnarlier internals. If AI review is what gets us the staffing/motivation
> > to write and maintain that documentation, then so be it. It would be a shame if
> > some of the most comprehensive documentation for the kernel is buried in AI
> > specific prompts.
> >
> > Naively, synchronizing from Documentation to model-specific bots doesn't seem
> > like it'd be a hard problem to solve.
>
> I think so too, thanks Sean!
>
> First, I agree improving the documentation is a no-brainer with or
> without AI. And AI will benefit from it too.
>
> The only part which is slightly less obvious is what should go into
> prompts vs what the model gets through training. I'm pretty sure that
> the Linux kernel source code is used during the training, so at least
> big frontier models "know" a lot about the kernel code already, so
> prompts should really only close the gap between the cut date and new
> developments plus all kinds of tribal knowledge which is not easy to
> derive from the code or documentation (e.g. the outcome of some
> hallway conversation during the last conference).

Well :) we constantly struggle with documentation, as the catch-22 is that the
people who know enough to document are all too busy to write it. We have tried
for years to encourage it, but it is what it is.

And we do kinda require the documentation to be up to a certain standard and
applicable to $CURRENT_KERNEL.

[Side note: My book side-stepped this by a. being tied to a frozen kernel
version and b. letting me skip any area I deemed out of scope (though it differs
in approach too - it isn't documentation so much as a code commentary with lots
of explanation).]

I mean I've made my own efforts on this to some degree,
e.g. https://docs.kernel.org/mm/process_addrs.html so have tossed my hat into
the ring, but how consumable that'd be I'm not sure :) and presumably the LLMs
gobble all that up anyway.

>
> But I'm hopeful we can figure out a way to auto-generate prompts from
> the documentation by stripping parts which are "obvious" to the model.

Yeah I also wonder to what degree finessing things for LLM consumption matters
too. I have no experience with that, but I hear that there's some of that going
on in Chris's prompts?

I did imagine mm-specific stuff for the LLM could be in the form of
_generalised_ concepts and imperative instructions.

>
> Thanks!

Anyway semi-rambling here so TL;DR:

It'd probably be quicker to move on LLM-specific prompts - they'll face less
scrutiny than generalised documentation, require less effort to format, and be
easier to finesse for LLM purposes.

OTOH documentation would be more broadly useful for humans too, but I just
wonder if anybody will have the time to do that :>)

Cheers, Lorenzo