[2/5] reporting-issues: step-by-step-guide: main and two sub-processes for stable/longterm
From: Thorsten Leemhuis
Date: Fri Mar 26 2021 - 02:17:43 EST
On 26.03.21 07:13, Thorsten Leemhuis wrote:
> Lo! Since a few months mainline in
> Documentation/admin-guide/reporting-issues.rst contains a text written
> to obsolete the good old reporting-bugs text. For now, the new document
> still contains a warning at the top that basically says "this is WIP".
> But I'd like to remove that warning and delete reporting-bugs.rst in the
> next merge window to make reporting-issues.rst fully official. With this
> mail I want to give everyone a chance to take a look at the text and
> speak up if you don't want me to move ahead for now.
>
> For easier review I'll post the text of reporting-issues.rst in reply to
> this mail. I'll do that in a few chunks, as if this was a cover letter
> for a patch-set.
Step-by-step guide how to report issues to the kernel maintainers
=================================================================
The above TL;DR outlines roughly how to report issues to the Linux kernel
developers. It might be all that's needed for people already familiar with
reporting issues to Free/Libre & Open Source Software (FLOSS) projects. For
everyone else there is this section. It is more detailed and uses a
step-by-step approach. It still tries to be brief for readability and leaves
out a lot of details; those are described below the step-by-step guide in a
reference section, which explains each of the steps in more detail.
Note: this section covers a few more aspects than the TL;DR and does things in
a slightly different order. That's in your interest, to make sure you notice
early if an issue that looks like a Linux kernel problem is actually caused by
something else. These steps thus help to ensure the time you invest in this
process won't feel wasted in the end:
* Are you facing an issue with a Linux kernel a hardware or software vendor
provided? Then in almost all cases you are better off to stop reading this
document and reporting the issue to your vendor instead, unless you are
willing to install the latest Linux version yourself. Be aware the latter
will often be needed anyway to hunt down and fix issues.
* Perform a rough search for existing reports with your favorite internet
search engine; additionally, check the archives of the Linux Kernel Mailing
List (LKML). If you find matching reports, join the discussion instead of
sending a new one.
* See if the issue you are dealing with qualifies as regression, security
issue, or a really severe problem: those are 'issues of high priority' that
need special handling in some steps that are about to follow.
* Make sure it's not the kernel's surroundings that are causing the issue
you face.
* Create a fresh backup and put system repair and restore tools at hand.
* Ensure your system does not enhance its kernels by building additional
kernel modules on-the-fly, which solutions like DKMS might be doing locally
without your knowledge.
* Check if your kernel was 'tainted' when the issue occurred, as the event
that made the kernel set this flag might be causing the issue you face.
* Write down coarsely how to reproduce the issue. If you deal with multiple
issues at once, create separate notes for each of them and make sure they
work independently on a freshly booted system. That's needed, as each issue
needs to get reported to the kernel developers separately, unless they are
strongly entangled.
* If you are facing a regression within a stable or longterm version line
(say something broke when updating from 5.10.4 to 5.10.5), scroll down to
'Dealing with regressions within a stable and longterm kernel line'.
* Locate the driver or kernel subsystem that seems to be causing the issue.
Find out how and where its developers expect reports. Note: most of the
time this won't be bugzilla.kernel.org, as issues typically need to be sent
by mail to a maintainer and a public mailing list.
* Search the archives of the bug tracker or mailing list in question
thoroughly for reports that might match your issue. If you find anything,
join the discussion instead of sending a new report.
After these preparations you'll now enter the main part:
* Unless you are already running the latest 'mainline' Linux kernel, better
go and install it for the reporting process. Testing and reporting with
the latest 'stable' Linux can be an acceptable alternative in some
situations; during the merge window that actually might be even the best
approach, but in that development phase it can be an even better idea to
suspend your efforts for a few days anyway. Whatever version you choose,
ideally use a 'vanilla' build. Ignoring these advices will dramatically
increase the risk your report will be rejected or ignored.
* Ensure the kernel you just installed does not 'taint' itself when
running.
* Reproduce the issue with the kernel you just installed. If it doesn't show
up there, scroll down to the instructions for issues only happening with
stable and longterm kernels.
* Optimize your notes: try to find and write the most straightforward way to
reproduce your issue. Make sure the end result has all the important
details, and at the same time is easy to read and understand for others
that hear about it for the first time. And if you learned something in this
process, consider searching again for existing reports about the issue.
* If your failure involves a 'panic', 'Oops', 'warning', or 'BUG', consider
decoding the kernel log to find the line of code that triggered the error.
* If your problem is a regression, try to narrow down when the issue was
introduced as much as possible.
* Start to compile the report by writing a detailed description about the
issue. Always mention a few things: the latest kernel version you installed
for reproducing, the Linux Distribution used, and your notes on how to
reproduce the issue. Ideally, make the kernel's build configuration
(.config) and the output from ``dmesg`` available somewhere on the net and
link to it. Include or upload all other information that might be relevant,
like the output/screenshot of an Oops or the output from ``lspci``. Once
you wrote this main part, insert a normal length paragraph on top of it
outlining the issue and the impact quickly. On top of this add one sentence
that briefly describes the problem and gets people to read on. Now give the
thing a descriptive title or subject that yet again is shorter. Then you're
ready to send or file the report like the MAINTAINERS file told you, unless
you are dealing with one of those 'issues of high priority': they need
special care which is explained in 'Special handling for high priority
issues' below.
* Wait for reactions and keep the thing rolling until you can accept the
outcome in one way or the other. Thus react publicly and in a timely manner
to any inquiries. Test proposed fixes. Do proactive testing: retest with at
least every first release candidate (RC) of a new mainline version and
report your results. Send friendly reminders if things stall. And try to
help yourself, if you don't get any help or if it's unsatisfying.
Reporting regressions within a stable and longterm kernel line
--------------------------------------------------------------
This subsection is for you, if you followed above process and got sent here at
the point about regression within a stable or longterm kernel version line. You
face one of those if something breaks when updating from 5.10.4 to 5.10.5 (a
switch from 5.9.15 to 5.10.5 does not qualify). The developers want to fix such
regressions as quickly as possible, hence there is a streamlined process to
report them:
* Check if the kernel developers still maintain the Linux kernel version
line you care about: go to the front page of kernel.org and make sure it
mentions the latest release of the particular version line without an
'[EOL]' tag.
* Check the archives of the Linux stable mailing list for existing reports.
* Install the latest release from the particular version line as a vanilla
kernel. Ensure this kernel is not tainted and still shows the problem, as
the issue might have already been fixed there.
* Send a short problem report by mail to the people and mailing lists the
:ref:`MAINTAINERS <maintainers>` file specifies in the section 'STABLE
BRANCH'. Roughly describe the issue and ideally explain how to reproduce
it. Mention the first version that shows the problem and the last version
that's working fine. Then wait for further instructions.
The reference section below explains each of these steps in more detail.
Reporting issues only occurring in older kernel version lines
-------------------------------------------------------------
This subsection is for you, if you tried the latest mainline kernel as outlined
above, but failed to reproduce your issue there; at the same time you want to
see the issue fixed in older version lines or a vendor kernel that's regularly
rebased on new stable or longterm releases. If that case follow these steps:
* Prepare yourself for the possibility that going through the next few steps
might not get the issue solved in older releases: the fix might be too big
or risky to get backported there.
* Perform the first three steps in the section "Dealing with regressions
within a stable and longterm kernel line" above.
* Search the Linux kernel version control system for the change that fixed
the issue in mainline, as its commit message might tell you if the fix is
scheduled for backporting already. If you don't find anything that way,
search the appropriate mailing lists for posts that discuss such an issue
or peer-review possible fixes; then check the discussions if the fix was
deemed unsuitable for backporting. If backporting was not considered at
all, join the newest discussion, asking if it's in the cards.
* One of the former steps should lead to a solution. If that doesn't work
out, ask the maintainers for the subsystem that seems to be causing the
issue for advice; CC the mailing list for the particular subsystem as well
as the stable mailing list.
The reference section below explains each of these steps in more detail.