Re: Introduce Sashiko (agentic review of Linux kernel changes)
From: Lorenzo Stoakes (Oracle)
Date: Wed Mar 18 2026 - 08:03:37 EST
On Tue, Mar 17, 2026 at 03:31:11PM +0000, Roman Gushchin wrote:
> Hello,
>
> I'm happy to share something my colleagues and I have been working on
> for the last several months:
> Sashiko - an agentic system for Linux kernel changes.
>
> First, Sashiko is available as a service at:
> * https://sashiko.dev
>
> It reviews all patches sent to LKML and several other Linux kernel
> mailing lists using the Gemini 3.1 Pro model.
>
> I want to thank my employer, Google, for providing the ML compute
> resources and infrastructure for making this project real.
>
> Sashiko is written in Rust from scratch, mostly using Gemini CLI. It's
> fully self-contained and does not rely on any CLI coding tools. It
> supports various LLMs (at this moment mostly tested with Gemini
> Pro/Flash and slightly with Claude).
>
> And finally it's fully open-source:
> * https://github.com/sashiko-dev/sashiko
Thanks for this! All much appreciated.
>
> It's licensed under the Apache-2.0 License, and the ownership of the
> project was transferred to the Linux Foundation. Contributions are
> really welcome using DCO.
>
> Sashiko is based on a set of open-source prompts initially developed by
> Chris Mason:
> * https://github.com/masoncl/review-prompts/
>
> But Sashiko leverages a different multi-stage review protocol, which
> somewhat mimics the human review process and forces the LLM to look at
> the proposed change from different angles.
>
> In my measurement, Sashiko was able to find 53% of bugs based
> on a completely unfiltered set of 1,000 recent upstream issues using
> "Fixes:" tags (using Gemini 3.1 Pro). Some might say that 53% is not
> that impressive, but 100% of these issues were missed by human reviewers.
> Also, many of these issues (like tricky build failures, performance
> problems, etc) are very hard/impossible to spot from reviewing the code,
> so arguably 100% is not reachable. We started with low 30's a couple of
> months ago; better models and improvements in the review protocol and
> subsystem prompts pushed it to low 50's. With better LLMs and collective
> effort on prompts we can push even further.
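As an aside, the "Fixes:" methodology is nicely reproducible: collect
recent commits carrying a Fixes: tag, then check whether the reviewer
flags the bug in the commit the tag points at. Extracting the tags from a
commit message is mechanical; a minimal sketch (the sample message is
invented, tag format per Documentation/process/submitting-patches.rst):

```python
import re

# "Fixes:" tags look like: Fixes: <12+ hex chars> ("subject line"),
# per Documentation/process/submitting-patches.rst.
FIXES_RE = re.compile(
    r'^Fixes:\s+([0-9a-f]{12,40})\s+\("(.+)"\)\s*$', re.MULTILINE)

def fixes_tags(commit_message: str) -> list[tuple[str, str]]:
    """Return (sha, subject) pairs for every Fixes: tag in a message."""
    return FIXES_RE.findall(commit_message)

# Invented example commit message:
msg = """mm/foo: fix use-after-free in foo_release()

The object can be freed before the final put.

Fixes: 0123456789ab ("mm/foo: add foo_release()")
Signed-off-by: A Developer <dev@example.com>
"""

print(fixes_tags(msg))
```

Each hit names the buggy commit a reviewer should ideally have caught, so
an unfiltered window of recent Fixes:-tagged commits gives you a recall
denominator for free.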
>
> Measuring false positives is much harder, but based on manual reviews of
> reviews, it's pretty good: it's rarely dead wrong, but sometimes it can
> nitpick or find too many low-value issues. In many cases, it can be
> improved with prompt engineering.
So far I've noticed it has got quite a bit wrong, not quite 'dead wrong'
but just very confused :)
So for me, compared to Chris's prompts running through Claude, it's
producing a lot more noise, but it's also producing some useful results.
So I think it's not quite good enough yet to integrate into anything
email-wise, but it's definitely very useful as an additional tool.
(For one, I'm going to go fix some bugs on my series I saw reported there.)
I think over time, as the approach/model is refined, this will get a LOT
better; these things seem to accelerate quickly.
>
> * What's next?
>
> This is our first version and it's obviously not perfect. There is a
> long list of fixes and improvements to make. Please, don't expect it to
> be 100% reliable, even though we'll try hard to keep it up and running.
> Please use github issues or email me any bug reports and feature
> requests, or send PR's.
Of course, it's all much appreciated!
>
> As of now, Sashiko only provides a web interface;
> however, Konstantin Ryabitsev is already adding sashiko.dev support to b4,
> and SeongJae Park is adding support to hkml.
> That was really fast, thank you!
Thanks to Konstantin and SJ too, but the web interface is pretty nice I
must say, so thanks for that! :)
>
> We're working on adding an email interface to Sashiko, and soon Sashiko
> will be able to send out reviews over email - similar to what the bpf
> subsystem already has. It will be opt-in by subsystem and will have options
Like I said, I think it's a bit premature for mm at least _at this point_
but I'm sure it'll get there.
For now I think we need to get the false positive rate down a fair bit,
otherwise it might be a little distracting.
But people are _already_ integrating the web interface into their
workflows; I check it myself now, and Andrew is already very keen :) See:
https://lore.kernel.org/all/20260317121736.f73a828de2a989d1a07efea1@xxxxxxxxxxxxxxxxxxxx/
https://lore.kernel.org/all/20260317113730.45d5cef4ba84be4df631677f@xxxxxxxxxxxxxxxxxxxx/
> to CC only the author of the patch, maintainers, volunteers, or send a
> fully public reply. If you're a maintainer and have a strong preference
> to get reviews over email, please let me know.
Well, as a maintainer, I think 'not quite yet, but probably soon' is the
answer on that one!
>
> We also desperately need better benchmarks, especially when it comes to
> false positives. Having a decent vetted set of officially perfect
> commits can help with this.
Not sure perfect commits exist in the kernel, certainly not mine :P
>
> Finally, some subsystems have a good prompts coverage and some don't. It
> doesn't have to be lengthy documentation (and it might actually be
> counter-productive), but having a small list of things to look at - some
> high-level concepts which are hard to grasp from the code, etc. - can
> help a lot with both bug discovery and false positives.
I guess these are best contributed to Chris's review-prompts repo, right?
>
> Thanks,
> Roman
Cheers, Lorenzo