Re: Planned changes for to reduce the "Bugzilla blues"

From: Takashi Iwai
Date: Sun Oct 02 2022 - 03:37:27 EST

On Sat, 01 Oct 2022 12:30:22 +0200,
Artem S. Tashkinov wrote:
> Here are two other issues which absolutely suck in terms of dealing with
> the kernel.
> - 1 -
> I have 20+ years of experience in IT and some kernel issues are just
> baffling in terms of trying to understand what to do about them.
> Here's an example:
> What should I do about that? Who's responsible for this? Who should I CC?
> And this is an issue which is easy to describe and identify.

IMO, this points to one of the big problems of bugzilla, or of bug
trackers in general: the complete lack of screening.

An initial bug report is sent only to the assignees of the given
component, and those are mostly individual people (usually
maintainers), not a public ML or group. That neither works nor scales
for a large number of bug reports. We need screening in the first
place, before maintainers take a deeper look.

One could also change the default bugzilla assignee to a ML. However,
that would send lots of noise from unqualified bug reports straight to
the ML, which would upset developers, so it's no better a choice.

And screening is a tiresome task; you sometimes have to deal with
people who have no clue and no etiquette. I understand why many
companies are trying to deploy AI there...

> - 2 -
> Here's another one which is outright puzzling:
> You run: dmesg -t --level=emerg,crit,err
> And you see some nondescript errors of some kernel subsystems seemingly
> failing or being unhappy about your hardware. Errors are as cryptic as
> humanly possible, you don't even know what part of kernel has produced them.
> OK, as a "power" user I download the kernel source, run
> `grep -R message /tmp/linux-5.19` and there are _multiple_ different
> modules and places which contain this message.
> I'm lost. Send this to LKML? Did that in the long past, no one cared, I
> stopped.
> Here's what I'm getting with Linux 5.19.12:
> platform wdat_wdt: failed to claim resource 5: [mem 0x00000000-0xffffffff7fffffff]
> ACPI: watchdog: Device creation failed: -16
> ACPI BIOS Error (bug): Could not resolve symbol [\_SB.PCI0.XHC.RHUB.TPLD], AE_NOT_FOUND (20220331/psargs-330)
> ACPI Error: Aborting method \_SB.UBTC.CR01._PLD due to previous error (AE_NOT_FOUND) (20220331/psparse-529)
> platform MSFT0101:00: failed to claim resource 1: [mem 0xfed40000-0xfed40fff]
> acpi MSFT0101:00: platform device creation failed: -16
> lis3lv02d: unknown sensor type 0x0
> Are they serious? Should they be reported or not? Is my laptop properly
> working? I have no clue at all.

That's a dilemma. The kernel can't know whether your machine is
working "properly" either, that is, whether the lack of some
functionality matters to you or not. In your case above, the messages
are about a watchdog, something related to USB, the TPM, and an
accelerometer, all of which most likely stem from a buggy BIOS. Would
you mind if those features were missing? Does your device even
implement that hardware correctly? The kernel doesn't know, hence it
reports an error.

Many drivers have mechanisms to silence superfluous error messages for
known devices, but those are case-by-case solutions.

Or you can hide those errors from the console at boot with a boot
option (e.g. loglevel=2); they are still recorded in the kernel log
buffer that dmesg reads.
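For reference, here is a minimal sketch of how the loglevel= threshold filters console output. It assumes the standard printk levels (0 = emerg, 1 = alert, 2 = crit, 3 = err, ... 7 = debug). A message is printed on the console only if its level is numerically lower than the console loglevel. The helper name shown_levels is hypothetical and exists only for illustration.

```shell
#!/bin/sh
# Sketch: list the printk level names that still reach the console
# for a given loglevel= value. A message is printed on the console
# only if its level number is lower than the console loglevel.
# shown_levels is a hypothetical helper, not an existing tool.

shown_levels() {
    threshold=$1
    n=0
    result=""
    # Level names in printk order: emerg=0, alert=1, ..., debug=7
    for name in emerg alert crit err warning notice info debug; do
        if [ "$n" -lt "$threshold" ]; then
            # Append with a comma separator, as dmesg --level expects
            result="${result:+$result,}$name"
        fi
        n=$((n + 1))
    done
    printf '%s\n' "$result"
}

# With loglevel=2 only emerg and alert reach the console, so the
# err-level messages quoted above would no longer be shown there.
shown_levels 2
```

This prints emerg,alert. That is the same comma-separated form used with dmesg -t --level=... above, so passing it to dmesg gives a rough idea of what the console would still show. The full log stays available through dmesg regardless.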