Re: Introduce Sashiko (agentic review of Linux kernel changes)

From: Lorenzo Stoakes (Oracle)

Date: Fri Apr 03 2026 - 08:23:17 EST


On Fri, Apr 03, 2026 at 08:11:30AM -0400, Theodore Tso wrote:
> One other thing to consider is copyright. This issue is one we can
> safely ignore when we are asking LLMs to review code. But if we ask
> LLMs to generate documentation, and then we cut and paste the
> generated text into kernel documentation, the copyright status of the
> generated text is not well defined.
>
> In Europe, the European Commission has promulgated that LLM output,
> having been generated by a machine, and not a human being, is not
> copyrighted. If a human being then makes changes, the combined work
> could be subject to copyright, and if it is merged into code that is
> subject to the GPL (for example), the combined work would also be
> subject to the original license. But that's only in Europe.
>
> But consider that researchers were able to extract 96% of Harry Potter
> and the Sorcerer's Stone from Claude 3.7 Sonnet. So with the right
> prompt, if we get a paragraph that came from some published book about
> Linux, and it was dropped into the Documentation/ directory, that
> might be problematic, since even (or maybe especially) the European
> Union might want to take a hard line. (Do you hear the people sing,
> singing the songs of angry Victor Hugo's? :-)
>
> If we use an LLM to analyze documentation to identify gaps, and we
> take a bullet list of missing functions or semantics, and the human
> being writes new text from scratch, instead of cutting and pasting
> directly from LLM, that should be safe. But of course, I'm not a
> lawyer and I don't play one on TV.

I don't think anybody's suggesting we use LLMs to generate documentation,
at least that's not how I interpreted it?

I'm very much against that, it absolutely requires expert input, and I've
already personally rejected AI slop mm documentation submitted fairly
recently.

Which makes all the above moot.

Frankly, overall I've found LLM-generated *anything* to suck. LLMs are good
at finding bugs, debugging splats, quickly looking things up, handling
'loose form' search queries effectively, etc.

But the code it produces is god-awful, and the documentation is absolute
trash.

I don't see that changing as the average of everything is always going to
be mediocrity by statistical definition... and that's all it can produce.

>
> Cheers,
>
> - Ted

Thanks, Lorenzo