Re: [PATCH] [v2] Documentation: Provide guidelines for tool-generated content

From: NeilBrown

Date: Tue Nov 11 2025 - 18:45:34 EST


On Thu, 06 Nov 2025, Dave Hansen wrote:
> In the last few years, the capabilities of coding tools have exploded.
> As those capabilities have expanded, contributors and maintainers have
> more and more questions about how and when to apply those
> capabilities.
>
> The shiny new AI tools (chatbots, coding assistants and more) are
> impressive. Add new Documentation to guide contributors on how to
> best use kernel development tools, new and old.
>
> Note, though, there are fundamentally no new or unique rules in this
> new document. It clarifies expectations that the kernel community has
> had for many years. For example, researchers are already asked to
> disclose the tools they use to find issues in
> Documentation/process/researcher-guidelines.rst. This new document
> just reiterates existing best practices for development tooling.
>
> In short: Please show your work and make sure your contribution is
> easy to review.
>
> Signed-off-by: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
> Cc: Steven Rostedt <rostedt@xxxxxxxxxxx>
> Cc: Dan Williams <dan.j.williams@xxxxxxxxx>
> Cc: Theodore Ts'o <tytso@xxxxxxx>
> Cc: Sasha Levin <sashal@xxxxxxxxxx>
> Cc: Jonathan Corbet <corbet@xxxxxxx>
> Cc: Kees Cook <kees@xxxxxxxxxx>
> Cc: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
> Cc: Miguel Ojeda <ojeda@xxxxxxxxxx>
> Cc: Shuah Khan <shuah@xxxxxxxxxx>
>
> --
>
> This document was a collaborative effort from all the members of
> the TAB. I just reformatted it into .rst and wrote the changelog.
>
> Changes from v1:
> * Rename to generated-content.rst and add to documentation index.
> (Jon)
> * Rework subject to align with the new filename
> * Replace commercial names with generic ones. (Jon)
> * Be consistent about punctuation at the end of bullets for whole
> sentences. (Miguel)
> * Formatting sprucing up and minor typos (Miguel)
> ---
> Documentation/process/generated-content.rst | 94 +++++++++++++++++++++
> Documentation/process/index.rst | 1 +
> 2 files changed, 95 insertions(+)
> create mode 100644 Documentation/process/generated-content.rst
>
> diff --git a/Documentation/process/generated-content.rst b/Documentation/process/generated-content.rst
> new file mode 100644
> index 0000000000000..5e8ff44190932
> --- /dev/null
> +++ b/Documentation/process/generated-content.rst
> @@ -0,0 +1,94 @@
> +============================================
> +Kernel Guidelines for Tool Generated Content
> +============================================
> +
> +Purpose
> +=======
> +
> +Kernel contributors have been using tooling to generate contributions
> +for a long time. These tools are constantly becoming more capable and
> +undoubtedly improve developer productivity. At the same time, reviewer
> +and maintainer bandwidth is a very scarce resource. Understanding
> +which portions of a contribution come from humans versus tools is
> +critical to maintain those resources and keep kernel development
> +healthy.

"critical"? Really?
If I use "sed" to create a patch, it might be helpful for you to know
that, but not critical.

> +
> +The goal here is to clarify community expectations around tools. This
> +lets everyone become more productive while also maintaining high
> +degrees of trust between submitters and reviewers.

I like the mention of trust. I think that is a foundational issue here.
I wonder if it could be emphasised more.

> +
> +Out of Scope
> +============
> +
> +These guidelines do not apply to tools that make trivial tweaks to
> +preexisting content. Nor do they pertain to AI tooling that helps with
> +menial tasks. Some examples:
> +
> + - Spelling and grammar fix ups, like rephrasing to imperative voice
> + - Typing aids like identifier completion, common boilerplate or
> + trivial pattern completion
> + - Purely mechanical transformations like variable renaming
> + - Reformatting, like running Lindent, ``clang-format`` or
> + ``rust-fmt``
> +
> +Even if your tool use is out of scope you should still always consider
> +if it would help reviewing your contribution if the reviewer knows
> +about the tool that you used.

I also like the focus on "helping the reviewer". That too is
foundational.

When submitting a patch you will benefit from efforts to build trust
with the maintainer, and anything you do to help the review process
run smoothly will help your patch get accepted. Some guidelines for
how this applies to tool-assisted patch creation, and to LLMs
(aka AI) in particular, are below.
??

> +
> +In Scope
> +========
> +
> +These guidelines apply when a meaningful amount of content in a kernel
> +contribution was not written by a person in the Signed-off-by chain,
> +but was instead created by a tool.
> +
> +Detection of a problem is also a part of the development process; if a
> +tool was used to find a problem addressed by a change, that should be
> +noted in the changelog. This not only gives credit where it is due, it
> +also helps fellow developers find out about these tools.

I'm not sure about "Credit where it is due". We stand on the shoulders
of giants and listing all those giants could become burdensome.
I'm not sure we have a duty to help fellows find out about tools, though
that is certainly a valuable side-effect. If you find a tool useful,
write about it for LWN :-)

I think keeping the focus on helping the reviewer is best. Knowing the
motivation of a patch is often useful, and that can usefully include
details of tools used.

> +
> +Some examples:
> + - Any tool-suggested fix such as ``checkpatch.pl --fix``
> + - Coccinelle scripts
> + - A chatbot generated a new function in your patch to sort list entries.
> + - A .c file in the patch was originally generated by a LLM but cleaned
> + up by hand.
> + - The changelog was generated by handing the patch to a generative AI
> + tool and asking it to write the changelog.
> + - The changelog was translated from another language.
> +
> +If in doubt, choose transparency and assume these guidelines apply to
> +your contribution.

"choose transparency" is a good default.

> +
> +Guidelines
> +==========
> +
> +First, read the Developer's Certificate of Origin:
> +Documentation/process/submitting-patches.rst . Its rules are simple
> +and have been in place for a long time. They have covered many
> +tool-generated contributions.
> +
> +Second, when making a contribution, be transparent about the origin of
> +content in cover letters and changelogs. You can be more transparent
> +by adding information like this:
> +
> + - What tools were used?
> + - The input to the tools you used, like the coccinelle source script.
> + - If code was largely generated from a single or short set of
> + prompts, include those prompts in the commit log. For longer
> + sessions, include a summary of the prompts and the nature of
> + resulting assistance.
> + - Which portions of the content were affected by that tool?
> +
> +As with all contributions, individual maintainers have discretion to
> +choose how they handle the contribution. For example, they might:
> +
> + - Treat it just like any other contribution
> + - Reject it outright

If I find that a maintainer might reject my tool-based submission
outright, I might be tempted towards less transparency.
I don't think we should legitimise this sort of response by listing it
here.

> + - Review the contribution with extra scrutiny
> + - Suggest a better prompt instead of suggesting specific code changes
> + - Ask for some other special steps, like asking the contributor to
> + elaborate on how the tool or model was trained
> + - Ask the submitter to explain in more detail about the contribution
> + so that the maintainer can feel comfortable that the submitter fully
> + understands how the code works.

This last point contains an important idea that I think could be
highlighted more. The submitter needs to understand, and be able to
defend, the submission; both its motivation and its content.
"A tool generated the patch" is useful information but never an excuse
for not understanding it.

Thanks,
NeilBrown