Re: [PATCH v1] docs: bug-bisect: rewrite to better match the other bisecting text

From: Thorsten Leemhuis
Date: Tue Aug 13 2024 - 02:37:36 EST


On 07.08.24 20:37, Jonathan Corbet wrote:
> Thorsten Leemhuis <linux@xxxxxxxxxxxxx> writes:
>
>> Rewrite the short document on bisecting kernel bugs. The new text
>> improves .config handling, brings a mention of 'git skip', and explains
>> what to do after the bisection finished -- including trying a revert to
>> verify the result. The rewrite at the same time removes the unrelated
>> and outdated section on 'Devices not appearing' and replaces some
>> sentences about bug reporting with a pointer to the document covering
>> that topic in detail.
>>
>> This overall brings the approach close to the one in the recently added
>> text Documentation/admin-guide/verify-bugs-and-bisect-regressions.rst.
>> As those two texts serve a similar purpose for different audiences,
>> mention that document in the head of this one and outline when the
>> other might be the better one to follow.
>>
>> Signed-off-by: Thorsten Leemhuis <linux@xxxxxxxxxxxxx>
>> ---
>> Documentation/admin-guide/bug-bisect.rst | 205 +++++++++++++++--------
>> MAINTAINERS | 1 +
>> 2 files changed, 135 insertions(+), 71 deletions(-)
>
> So overall this seems like a good thing to do.

Nice to hear that. :-D

> I wouldn't be me if I didn't have some comments though...:)

And I think that is right, as some of the feedback made me go "what did
I do there, Jonathan is right there". Guess that happens to all of us. :-D

>> diff --git a/Documentation/admin-guide/bug-bisect.rst b/Documentation/admin-guide/bug-bisect.rst
>> index 325c5d0ed34a0a..f4a9acab65d0f5 100644
>> --- a/Documentation/admin-guide/bug-bisect.rst
>> +++ b/Documentation/admin-guide/bug-bisect.rst
>> @@ -1,76 +1,139 @@
>
> [...]
>
>> +===============
>> +Bisecting a bug

Side note: While revisiting this I concluded that I most likely will do
a s/bug/regression/ in the title, as that seems more appropriate. But I
will leave the file name as it is to not break existing links.

>> +===============
>>
>> +This document describes how to find a change causing a kernel regression using
>> +``git bisect``.
>
> This seems a bit terse. A bit of information on when doing a bisection
> makes sense would not go amiss - when somebody has observed that a
> previously working feature broke with an update.

Changed that to "This document describes how to use a Git bisection to
find the source code change that broke something -- for example when
some functionality stopped working after upgrading from Linux 6.0 to 6.1."

>> -.. [#f1] You can, optionally, provide both good and bad arguments at git
>> - start with ``git bisect start [BAD] [GOOD]``
>> +The text focuses on the gist of the process. If you are new to bisecting the
>> +kernel, better follow Documentation/admin-guide/verify-bugs-and-bisect-regressions.rst
>> +instead: it depicts everything from start to finish while covering multiple
>> +aspects even kernel developers occasionally forget. This includes:
>>
>> -For further references, please read:
>> +- Detecting situations where a bisections would be a waste of time, as nobody
>> + would care about the result -- for example, because the problem is triggered
>> + by a .config change, was already fixed, is caused by something your Linux
>> + distributor changed, occurs in an abandoned version, or happens after the
>> + kernel marked itself as 'tainted'.
>> +- Preparing the .config file using an appropriate kernel while enabling or
>> + disabling debug symbols depending on the situation's needs -- while optionally
>> + trimming the .config to tremendously reduce the build time per bisection step.
>> +- For regressions in stable or longterm kernels: checking mainline as well, as
>> + the result determines to whom the regression must be reported to.
>
> This instead seems a bit verbose; I'd be tempted to take out the
> itemized list and leave just the paragraph above.

Hmmm. You have a point, but I'd like to have some warning sign along the
lines of "if you are not careful, you will waste your time with this"
indicator somewhere. That's why I for now settled with this:

"""The text focuses on the gist of the process. If you are new to
bisecting the kernel, better follow
Documentation/admin-guide/verify-bugs-and-bisect-regressions.rst
instead: it depicts everything from start to finish while covering
multiple aspects even kernel developers occasionally forget. This
includes detecting situations beforehand where a bisection would be a
waste of time, as nobody would care about the result -- for example,
because the problem happens after the kernel marked itself as 'tainted',
occurs in an abandoned version, was already fixed, or is caused by a
.config change you or your Linux distributor performed."""

>> -- The man page for ``git-bisect``
>> -- `Fighting regressions with git bisect <https://www.kernel.org/pub/software/scm/git/docs/git-bisect-lk2009.html>`_
>> -- `Fully automated bisecting with "git bisect run" <https://lwn.net/Articles/317154>`_
>> -- `Using Git bisect to figure out when brokenness was introduced <http://webchick.net/node/99>`_
>> +Neither document describes how to report a regression, as that is covered by
>> +Documentation/admin-guide/reporting-issues.rst.
>
> This could maybe go at the end - "once you've found the regression, see
> this document on how to report it" ?

Done, this is now this at the end:

"""With that the process is complete. Now report the regression as
described by Documentation/admin-guide/reporting-issues.rst."""

>> +Finding the change causing a kernel issue using a bisection
>> +===========================================================
>> +
>> +*Note: the following process assumes you prepared everything for a bisection;
>> +this includes having a Git clone with the appropriate sources, installing the
>> +software required to build and install kernels, as well as a .config file stored
>> +in a safe place (the following example assumes '~/prepared_kernel_.config') to
>> +use as pristine base at each bisection step.*
>> +
>> +* Preparation: start the bisection and tell Git about the points in the history
>> + you consider to be working and broken, which Git calls 'good' and 'bad'::
>> +
>> + git bisect start
>> + git bisect good v6.0
>> + git bisect bad v6.1
>> +
>> + Instead of Git tags like 'v6.0' and 'v6.1' you can specify commit-ids, too.
>> +
>> +1. Copy your prepared .config into the build directory and adjust it to the
>> + needs of the codebase Git checked out for testing::
>> +
>> + cp ~/prepared_kernel_.config .config
>> + make olddefconfig
>> +
>> +2. Now build, install, and boot a kernel; if any of this fails for unrelated
>> + reasons, run ``git bisect skip`` and go back to step 1.
>
> Spell out "unrelated reasons" a bit more thoroughly? I'd mention that
> things can go wrong (despite our best efforts) when bisect lands in the
> middle of a patch series, and to recognize a failure that is not the bug
> in question.

This now looks like this in my current draft:

"""2. Now build, install, and boot a kernel. This might fail for
unrelated reasons, for example, when a compile error happens at the
current stage of the bisection a later change resolves. In such cases
run ``git bisect skip`` and go back to step 1."""

>> +3. Check if the feature that regressed works in the kernel you just built.
>> +
>> + If it does, execute::
>> +
>> + git bisect good
>> +
>> + If it does not, run::
>> +
>> + git bisect bad
>> +
>> + Be sure what you tell Git is correct, as getting this wrong just once will
>> + send the rest of the bisection totally off course.
>
> Something about hard-to-trigger bugs and putting in the effort to be
> sure that a good kernel is really good? Along those lines, maybe
> something at the top about having a well-defined reproducer for the
> problem would be good.

Added a """You ideally want to have worked out a straight-forward and
fully reliable way to reproduce the regression, too.""" to the note at
the start and changed this area to:

"""Note, getting this wrong just once will send the rest of the
bisection totally off course. To prevent having to start anew later you
thus want to ensure what you tell Git is correct -- it is thus often
wise to spend a few minutes more on testing when your reproducer is
somewhat unreliable."""


>> + Go back to back to step 1, if Git after issuing one of those commands checks
>> + out another bisection point while printing something like 'Bisecting:
>> + 675 revisions left to test after this (roughly 10 steps)'.
>> +
>> + You finished the bisection and move to the next point below, if Git instead
>> + prints something like 'cafecaca0c0dacafecaca0c0dacafecaca0c0da is the first
>> + bad commit'; right afterwards it will show some details about the culprit
>> + including its patch description.
>
> That's a big and hard-to-parse sentence. I'd start with the "if"
> condition - this isn't perl :)

I really wanted to have the "where to go next" aspect in that sentence.
But yes, I guess I tried to hard. Changed this to:

"""You have finished the bisection, if Git instead prints something like
'cafecaca0c0dacafecaca0c0dacafecaca0c0da is the first bad commit'. In
that case move to the next point below. Note, right after displaying
that line Git will show some details about the culprit including its
patch description; this can easily fill your terminal, so you might need
to scroll up to see the message mentioning the culprit's commit-id.

In case you are missed Git's output, you at any time of the bisection
can run ``git bisect log`` to print the status: it will show how many
steps remains or mention the result of the bisection."""

>> +* Recommended complementary task: put the bisection log and the current
>> + .config file aside for the bug report; furthermore tell Git to reset the
>> + sources to the state before the bisection::
>> +
>> + git bisect log > ~/bisection-log
>> + cp .config ~/bisection-config-culprit
>> + git bisect reset
>> +
>> +* Recommended optional task: try reverting the culprit on top of the latest
>> + codebase; if successful, this will validate your bisection and enable
>> + developers to resolve the regression through a revert.
>
> "successful" could be misinterpreted as referring to the revert itself
> here. An explicit "if that fixes the bug..." would be more clear.

Now reads """Recommended optional task: try reverting the culprit on top
of the latest codebase and check if that fixes your bug; if that is the
case, it validates the bisection and enables developers to resolve the
regression through a revert."""

> [...]
>
> Thanks for doing this.

Many thx for the feedback, much appreciated.

Ciao, Thorsten