Re: [PATCH v2 00/28] kernel-doc: use a C lexical tokenizer for transforms

From: Mauro Carvalho Chehab

Date: Fri Mar 13 2026 - 05:20:34 EST

Hi Jon,

On Thu, 12 Mar 2026 15:54:20 +0100
Mauro Carvalho Chehab <mchehab+huawei@xxxxxxxxxx> wrote:

> Also, I didn't notice any relevant change on the documentation build
> time.

After more tests, I actually noticed an issue after this changeset:

https://lore.kernel.org/linux-doc/2b957decdb6cedab4268f71a166c25b7abdb9a61.1773326442.git.mchehab+huawei@xxxxxxxxxx/

Basically, a broken kernel-doc like this:

/**
 * enum dmub_abm_ace_curve_type - ACE curve type.
 */
enum dmub_abm_ace_curve_type {
        /**
         * ACE curve as defined by the SW layer.
         */
        ABM_ACE_CURVE_TYPE__SW = 0,
        /**
         * ACE curve as defined by the SW to HW translation interface layer.
         */
        ABM_ACE_CURVE_TYPE__SW_IF = 1,
};

where the inlined markups lack "@symbol", doesn't parse well. If
you run the current kernel-doc on it, it produces:

.. c:enum:: dmub_abm_ace_curve_type

ACE curve type.

.. container:: kernelindent

**Constants**

``*/ ABM_ACE_CURVE_TYPE__SW = 0``
*undescribed*

`` */ ABM_ACE_CURVE_TYPE__SW_IF = 1``
*undescribed*

This happens because kernel-doc currently drops the "/**" line. My fix
patch above addresses that, but inlined comments still confuse
enum/struct detection. To avoid that, we need to strip comments earlier,
at dump_struct and dump_enum:

https://lore.kernel.org/linux-doc/d112804ace83e0ad8496f687977596bb7f091560.1773390831.git.mchehab+huawei@xxxxxxxxxx/T/#u

With that fix, the output is now:

.. c:enum:: dmub_abm_ace_curve_type

ACE curve type.

.. container:: kernelindent

**Constants**

``ABM_ACE_CURVE_TYPE__SW``
*undescribed*

``ABM_ACE_CURVE_TYPE__SW_IF``
*undescribed*

which is the expected result when there are no proper inlined
kernel-doc markups.
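
To illustrate why stripping comments first helps, here is a minimal
sketch. It is not the code from the patch: strip_comments() and
enum_constants() are hypothetical helpers, and the regex-based stripping
is an assumption made for brevity (it would misfire on comment-like text
inside string literals, which a proper C tokenizer can avoid).

```python
import re

# Hypothetical helper: remove C block comments, including "/** ... */".
RE_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)


def strip_comments(text):
    """Return text with all /* ... */ comments removed."""
    return RE_COMMENT.sub("", text)


def enum_constants(proto):
    """Return the constant names found in an enum declaration."""
    # Take whatever sits between the first "{" and the last "}".
    body = proto[proto.index("{") + 1:proto.rindex("}")]

    names = []
    # Comments are stripped *before* splitting into members, so no
    # stray "*/" can end up glued to a constant name.
    for entry in strip_comments(body).split(","):
        name = entry.split("=")[0].strip()
        if name:
            names.append(name)
    return names


source = """
enum dmub_abm_ace_curve_type {
        /**
         * ACE curve as defined by the SW layer.
         */
        ABM_ACE_CURVE_TYPE__SW = 0,
        /**
         * ACE curve as defined by the SW to HW translation interface layer.
         */
        ABM_ACE_CURVE_TYPE__SW_IF = 1,
};
"""

print(enum_constants(source))
```

With comments removed up front, only the two constant names remain,
matching the "undescribed" entries in the output above.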

Due to this issue, I ended up adding a 29/28 patch to this series.

> In that regard, right now, every time a CMatch replacement
> rule takes place, it does:
>
>     for each transform:
>         - tokenize the source code;
>         - handle CMatch;
>         - convert tokens back to a string.
>
> A possible optimization would be to do, instead:
>
>     - tokenize the source code;
>     - for each transform, handle CMatch;
>     - convert tokens back to a string.
>
> For now, I opted not to do it, because:
>
>     - too many changes in a row;
>     - docs build time is taking ~3:30 minutes, which is
>       about the same time it was taking before the changes;
>     - there is a very dirty hack inside function_xforms:
>       (KernRe(r"_noprof"), ""). This is meant to change
>       function prototypes instead of function arguments.
>
> So, if ok for you, I would prefer to merge this one first. We can later
> optimize kdoc_parser to avoid multiple token <-> string conversions.

I implemented that optimization and it worked fine, so I ended up adding
a 30/28 patch at the end. With that, running kernel-doc before/after
the entire series won't show significant performance changes.
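
The difference between the two strategies can be sketched as follows.
This is only an illustration under stated assumptions: tokenize(),
untokenize(), drop_token(), apply_each() and apply_once() are
hypothetical names, and the regex tokenizer is a toy stand-in for the
C tokenizer used by the series.

```python
import re

# Toy tokenizer: whitespace runs, identifiers/numbers, or single chars.
TOKEN_RE = re.compile(r"\s+|\w+|.", re.DOTALL)


def tokenize(source):
    """Split source into a flat list of string tokens."""
    return TOKEN_RE.findall(source)


def untokenize(tokens):
    """Join tokens back into a string."""
    return "".join(tokens)


def drop_token(name):
    """Build a transform that removes every token equal to name."""
    def xform(tokens):
        return [tok for tok in tokens if tok != name]
    return xform


def apply_each(source, xforms):
    """One tokenize/untokenize round trip per transform."""
    for xform in xforms:
        source = untokenize(xform(tokenize(source)))
    return source


def apply_once(source, xforms):
    """Tokenize once, run all transforms, untokenize once."""
    tokens = tokenize(source)
    for xform in xforms:
        tokens = xform(tokens)
    return untokenize(tokens)


proto = "static __init int foo(void) __must_check;"
xforms = [drop_token("__init"), drop_token("__must_check")]

print(apply_each(proto, xforms))
print(apply_once(proto, xforms))
```

Both functions print the same string here. They stay equivalent only
as long as no transform depends on its output being re-tokenized before
the next one runs.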

# Current approach
$ time ./scripts/kernel-doc . -man >original 2>&1

real 0m37.344s
user 0m36.447s
sys 0m0.712s

# Tokenizer running multiple times (patch 29)
$ time ./scripts/kernel-doc . -man >before 2>&1

real 1m32.427s
user 1m25.377s
sys 0m1.293s

# After optimization (patch 30)
$ time ./scripts/kernel-doc . -man >after 2>&1

real 0m47.094s
user 0m46.106s
sys 0m0.751s

That is about 10 seconds slower than before when parsing everything,
which affects make mandocs. The impact on make htmldocs is minimal,
though: the kernel-doc parser takes only about 4 seconds there(*):

$ run_kdoc.py -none 2>/dev/null
Checking what files are currently used on documentation...
Running kernel-doc

Elapsed time: 0:00:04.348008

(*) the slowest logic when building docs with Sphinx is inside its
RST parser code.
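
For convenience, the three "real" times from the time(1) runs above
can be compared with a few lines of Python. This is just arithmetic on
the numbers already quoted; to_seconds() is a hypothetical helper.

```python
import re


def to_seconds(stamp):
    """Convert a time(1) stamp such as '1m32.427s' into seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", stamp)
    return int(match.group(1)) * 60 + float(match.group(2))


# "real" values copied from the three runs above.
runs = {
    "original": "0m37.344s",
    "patch 29": "1m32.427s",
    "patch 30": "0m47.094s",
}

base = to_seconds(runs["original"])

for name, stamp in runs.items():
    secs = to_seconds(stamp)
    # Print absolute time, delta against the original, and the ratio.
    print(f"{name:10s} {secs:8.3f}s {secs - base:+8.3f}s {secs / base:5.2f}x")
```

So patch 29 alone is roughly 2.5x slower than the original, while with
patch 30 the slowdown drops to about 1.26x (9.75 seconds).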

The enclosed script shows how I measured the parsing time for the
existing ".. kernel-doc::" markups inside Documentation.

Thanks,
Mauro

---

This is the run_kdoc.py script I'm using here to pick the same files
as make htmldocs does:

#!/usr/bin/env python3

import os
import re
import subprocess
import sys

from datetime import datetime
from glob import glob

print("Checking what files are currently used on documentation...")

kdoc_files = set()
re_kernel_doc = re.compile(r"^\.\.\s+kernel-doc::\s*(\S+)")

# Collect every existing file referenced by a ".. kernel-doc::" markup
for fname in glob(os.path.join(".", "**"), recursive=True):
    if os.path.isfile(fname) and fname.endswith(".rst"):
        with open(fname, "r", encoding="utf-8") as in_fp:
            data = in_fp.read()

        for line in data.split("\n"):
            match = re_kernel_doc.match(line)
            if match:
                if os.path.isfile(match.group(1)):
                    kdoc_files.add(match.group(1))

if not kdoc_files:
    sys.exit("Directory doesn't contain kernel-doc tags")

# Extra command line arguments are passed as-is to kernel-doc
cmd = ["./tools/docs/kernel-doc"]
cmd += sys.argv[1:]
cmd += sorted(kdoc_files)

print("Running kernel-doc")

start_time = datetime.now()

try:
    subprocess.run(cmd, check=True)
except subprocess.CalledProcessError as e:
    print(f"kernel-doc failed: {repr(e)}")

elapsed = datetime.now() - start_time
print(f"\nElapsed time: {elapsed}")
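
For reference, a small sketch of what the regular expression in the
script accepts. kdoc_target() is a hypothetical helper added only for
this illustration, and the sample lines are made up.

```python
import re

# Same expression as in run_kdoc.py.
re_kernel_doc = re.compile(r"^\.\.\s+kernel-doc::\s*(\S+)")


def kdoc_target(line):
    """Return the file named by a kernel-doc markup line, or None."""
    match = re_kernel_doc.match(line)
    return match.group(1) if match else None


samples = [
    ".. kernel-doc:: include/linux/list.h",   # plain directive
    "..   kernel-doc::   drivers/foo/bar.c",  # extra spaces are accepted
    "   .. kernel-doc:: lib/indented.c",      # indented: not matched
    "   :internal:",                          # directive option: ignored
]

for line in samples:
    print(f"{line!r:45} -> {kdoc_target(line)}")
```

Since the expression is anchored at the start of the line, a directive
nested (indented) inside another RST block is not picked up by this
script.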