[PATCH 0/1] Add a script to check for kernel-doc regressions

From: Mauro Carvalho Chehab

Date: Thu Mar 26 2026 - 12:26:37 EST

Hi Jon,

I've been using this script internally to check for regressions and
changes in kernel-doc output, especially those related to the new
CTokenizer code:

$ tools/docs/kdoc_diff --help
usage: kdoc_diff [-h] [--full] [--regression] [--work-dir WORK_DIR] [--clean] commits [files ...]

Compare kernel documentation between commits

positional arguments:
  commits               commit range like old..new
  files                 files to process – if supplied the --full flag is ignored

options:
  -h, --help            show this help message and exit
  --full, -f            Force a full scan of Documentation/*
  --regression, -r      Use YAML format to check for regressions
  --work-dir, -w WORK_DIR
                        work dir (default: /new_devel/docs)
  --clean, -c           Clean caches
I cleaned it up today so that I could submit it, as I think it could
be helpful to you and others as well: it automates the diff check
between two commits.

It has two modes of operation:

1. It generates three files (err.log, man.log and rst.log) for each
commit and diffs the old and new versions.

In this mode, it sorts err.log and removes duplicated messages,
which relaxes the diff comparison a little when a minor
change affects the error output.

2. It uses YAML to run regression tests.

The regression mode is useful when no regressions are expected. It
uses tools/unittest/test_kdoc_parser, which is somewhat relaxed
with regard to trivial changes such as whitespace.
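
As an illustration only, the sort-and-deduplicate step of the first
mode could look like the sketch below. The function names and the
sample log lines are hypothetical; they are not taken from kdoc_diff
itself, which may do this differently.

```python
import difflib


def normalize(log_text):
    """Return the unique, non-empty lines of a log, sorted."""
    return sorted({line.rstrip() for line in log_text.splitlines() if line.strip()})


def diff_logs(old_text, new_text):
    """Unified diff of two logs after sorting and de-duplication."""
    return list(difflib.unified_diff(normalize(old_text), normalize(new_text),
                                     fromfile="old/err.log",
                                     tofile="new/err.log",
                                     lineterm=""))


# Hypothetical sample data: same messages, different order and repetition
old = "b.c:2: warning: bad\na.c:1: warning: foo\nb.c:2: warning: bad\n"
new = "a.c:1: warning: foo\nb.c:2: warning: bad\n"
print(diff_logs(old, new))  # prints [] as only order and duplicates differ
```

With this kind of normalization, a change that only reorders or
repeats a warning produces no diff, while a new warning still shows up.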

The tested files can either be:

a. Partial: only files explicitly included via kernel-doc:: markups
inside Documentation;

b. Full: includes the files with broken kernel-doc markups that are
spread across the kernel tree;

c. A list of files or directories.
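
For the partial case, one possible way to collect the files named in
kernel-doc:: directives is sketched below. This is an assumption about
how such a scan could be done, not the code used by kdoc_diff; the
function names and the regular expression are hypothetical.

```python
import re
from pathlib import Path

# Matches lines such as ".. kernel-doc:: drivers/foo/bar.c"
KERNEL_DOC_RE = re.compile(r"^\s*\.\.\s+kernel-doc::\s*(\S+)", re.MULTILINE)


def kernel_doc_targets(text):
    """Return the file names referenced by kernel-doc:: directives in text."""
    return KERNEL_DOC_RE.findall(text)


def kernel_doc_sources(doc_root):
    """Scan every .rst file under doc_root and return the referenced files."""
    found = set()
    for rst in Path(doc_root).rglob("*.rst"):
        text = rst.read_text(encoding="utf-8", errors="replace")
        found.update(kernel_doc_targets(text))
    return sorted(found)
```

Calling kernel_doc_sources("Documentation") from the top of a kernel
tree would then give the sorted list of files to process.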

To prevent losing anything, it checks that the tree is not dirty
before running. While running, it does git checkout -f and, at the
end, it returns to the original branch.
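
A minimal sketch of that sequence, assuming plain git commands driven
through subprocess, is shown below. The helper names (git,
tree_is_dirty, checked_out) are hypothetical and the real script may
be organized differently.

```python
import subprocess
from contextlib import contextmanager


def git(*args):
    """Run a git command and return its stdout (raises on failure)."""
    result = subprocess.run(["git", *args], check=True,
                            capture_output=True, text=True)
    return result.stdout.strip()


def tree_is_dirty(run=git):
    """True if "git status --porcelain" reports any change."""
    return bool(run("status", "--porcelain"))


@contextmanager
def checked_out(commit, run=git):
    """Check out commit, then go back to where HEAD was before."""
    if tree_is_dirty(run):
        raise RuntimeError("tree is dirty; commit or stash first")
    # Remember the branch name, or the commit hash if HEAD is detached
    original = run("rev-parse", "--abbrev-ref", "HEAD")
    if original == "HEAD":
        original = run("rev-parse", "HEAD")
    run("checkout", "-f", commit)
    try:
        yield
    finally:
        # Runs even if the body raises, so the tree is always restored
        run("checkout", "-f", original)
```

The body of a "with checked_out(commit):" block would then run
kernel-doc against that commit. The run parameter exists only so the
logic can be exercised without touching a real repository.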

There is logic to catch signals, to avoid trouble on
errors/exit/ctrl-c. At least in my tests, it worked fine even
with Python errors inside the script. Still, in case of trouble,
one can use git reflog to recover.
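
One common way to get this behaviour in Python, shown here only as a
sketch under the assumption that cleanup lives in a finally block, is
to turn signals into an exception. The names Interrupted and
run_with_cleanup are hypothetical and not taken from kdoc_diff.

```python
import signal


class Interrupted(Exception):
    """Raised in the main thread when a handled signal arrives."""


def _raise_interrupted(signum, frame):
    raise Interrupted(f"received signal {signum}")


def run_with_cleanup(work, cleanup, signals=(signal.SIGINT, signal.SIGTERM)):
    """Run work(); always run cleanup(), even on errors or signals."""
    # Install the handler and remember what was there before
    previous = {sig: signal.signal(sig, _raise_interrupted) for sig in signals}
    try:
        return work()
    finally:
        # Ignore further signals so the cleanup itself is not interrupted
        for sig in signals:
            signal.signal(sig, signal.SIG_IGN)
        cleanup()
        # Put the earlier handlers back (None means "not set from Python")
        for sig, handler in previous.items():
            signal.signal(sig, handler if handler is not None else signal.SIG_DFL)
```

Here cleanup would be the step that returns to the original branch.
Note that signal.signal() can only be called from the main thread.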

---

With regard to tool usage:

The v0.1 skeleton was originally written via LLM (gpt-oss), but the
code was almost entirely rewritten by hand. It was also checked with
pylint, and I re-checked it with another LLM (nemotron2-cascade),
all executed on my local machine, without Internet access enabled
on ollama.

LLM prototyping was useful to get some quick starting code but, as
expected, LLM output is not any better than other traditional
auto-complete/auto-generated code methods: the produced code was
complex, with lots of caveats and hidden issues. Yet, using an LLM for
some specific tasks (for instance, writing a signal handler) is
usually faster than searching the Internet. It also helps to review
the logic, as it can point out some problems, but its output requires
someone with enough knowledge to discard bad code and ignore AI
hallucinations.

Mauro Carvalho Chehab (1):
docs: kdoc_diff: add a helper tool to help checking kdoc regressions

tools/docs/kdoc_diff | 504 +++++++++++++++++++++++++++++++++++++++++++
1 file changed, 504 insertions(+)
create mode 100755 tools/docs/kdoc_diff

--
2.52.0