Re: Planned changes for bugzilla.kernel.org to reduce the "Bugzilla blues"

From: Artem S. Tashkinov
Date: Sun Oct 02 2022 - 04:23:26 EST

On 10/2/22 07:37, Takashi Iwai wrote:
On Sat, 01 Oct 2022 12:30:22 +0200, Artem S. Tashkinov wrote:
- 2 -

Here's another one which is outright puzzling:

You run: dmesg -t --level=emerg,crit,err

And you see some nondescript errors from kernel subsystems that seem to
be failing or unhappy about your hardware. The errors are as cryptic as
humanly possible; you don't even know which part of the kernel produced them.
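For reference, `--level` selects records by the syslog-style priority every kernel log record carries (0 = emerg through 7 = debug). Note that `emerg,crit,err` leaves out `alert` (level 1); add it if you want everything at err or worse. A minimal Python sketch of the same filter, assuming the raw `<N>` prefix layout that `dmesg -r` prints (N = facility * 8 + level). It only illustrates the selection rule, not how util-linux implements it; the sample lines are made up.

```python
# Kernel log levels, from most to least severe, as accepted by dmesg --level.
LEVELS = {
    "emerg": 0, "alert": 1, "crit": 2, "err": 3,
    "warn": 4, "notice": 5, "info": 6, "debug": 7,
}


def parse_raw(line):
    """Split a raw record like '<3>[    1.234567] text' into (level, text).

    Assumes the `dmesg -r` layout: '<N>' where N = facility * 8 + level,
    followed by an optional '[seconds]' timestamp. Returns None otherwise.
    """
    if not line.startswith("<"):
        return None
    end = line.find(">")
    if end == -1:
        return None
    try:
        prio = int(line[1:end])
    except ValueError:
        return None
    text = line[end + 1:]
    # Drop the timestamp, similar to what `dmesg -t` does.
    if text.startswith("["):
        close = text.find("]")
        if close != -1:
            text = text[close + 1:]
    return prio % 8, text.strip()


def filter_levels(lines, names):
    """Keep the text of records whose level is in a comma-separated name list."""
    wanted = {LEVELS[name] for name in names.split(",")}
    kept = []
    for line in lines:
        record = parse_raw(line)
        if record is not None and record[0] in wanted:
            kept.append(record[1])
    return kept


sample = [
    "<6>[    0.000000] Linux version 5.19.12",
    "<3>[    1.234567] ACPI: watchdog: Device creation failed: -16",
    "<1>[    2.000000] an alert-level line",
]
print(filter_levels(sample, "emerg,crit,err"))
```

The alert-level sample line is dropped by this filter for the same reason it would be dropped by the command above.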

OK, as a "power" user I download the kernel source, run `grep -R message
/tmp/linux-5.19`, and find that _multiple_ different modules and places
contain this message.
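Part of what makes this hard: kernel messages are usually printf-style format strings, so the resource number or address in a line is most likely filled in at runtime, and grepping for the whole line finds nothing. Searching for the fixed fragment and grouping hits by directory at least shows which subsystems are candidates. A small Python sketch of that, similar in spirit to `grep -R --include='*.c'`; the function name and the grouping are illustrative choices.

```python
import os


def find_message(root, needle, exts=(".c", ".h")):
    """Map each directory under root to the source files that contain needle."""
    hits = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(exts):
                continue
            path = os.path.join(dirpath, name)
            # errors="replace" keeps odd bytes in a file from aborting the scan.
            with open(path, encoding="utf-8", errors="replace") as f:
                if needle in f.read():
                    subdir = os.path.relpath(dirpath, root)
                    hits.setdefault(subdir, []).append(name)
    return hits


# Search for the fixed part of the message, not the runtime values, e.g.:
# print(find_message("/tmp/linux-5.19", "failed to claim resource"))
```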

I'm lost. Send this to LKML? I did that long ago; no one cared, so I
stopped.

Here's what I'm getting with Linux 5.19.12:

platform wdat_wdt: failed to claim resource 5: [mem 0x00000000-0xffffffff7fffffff]
ACPI: watchdog: Device creation failed: -16
ACPI BIOS Error (bug): Could not resolve symbol [\_SB.PCI0.XHC.RHUB.TPLD], AE_NOT_FOUND (20220331/psargs-330)
ACPI Error: Aborting method \_SB.UBTC.CR01._PLD due to previous error (AE_NOT_FOUND) (20220331/psparse-529)
platform MSFT0101:00: failed to claim resource 1: [mem 0xfed40000-0xfed40fff]
acpi MSFT0101:00: platform device creation failed: -16
lis3lv02d: unknown sensor type 0x0

Are they serious? Should they be reported or not? Is my laptop properly
working? I have no clue at all.
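As for telling which part of the kernel produced a line, the prefixes usually carry a hint. Lines printed through the dev_err() family start with the driver name (or the bus name when no driver is bound) followed by the device name, and ACPICA errors end with the ACPICA release date plus the source module and line that printed them, so "(20220331/psargs-330)" points at psargs.c, line 330, under drivers/acpi/acpica/. A rough Python heuristic along those lines, assuming timestamp-free lines as printed by `dmesg -t`; it only guesses and will misclassify some messages.

```python
import re

# ACPICA messages end with "(YYYYMMDD/module-line)": the bundled ACPICA
# release, plus the source module and line number that printed the message.
ACPICA_SUFFIX = re.compile(r"\((\d{8})/(\w+)-(\d+)\)\s*$")
# dev_err()-style prefix: "<driver, or bus if unbound> <device name>: ".
# The first word may not contain ':', so "ACPI: watchdog: ..." does not match.
DEV_PREFIX = re.compile(r"^([^\s:]+) (\S+): ")


def guess_origin(msg):
    """Rough guess at where a dmesg line (without timestamp) came from."""
    m = ACPICA_SUFFIX.search(msg)
    if m:
        # In the kernel tree, ACPICA lives under drivers/acpi/acpica/.
        return ("acpica", f"drivers/acpi/acpica/{m.group(2)}.c", int(m.group(3)))
    m = DEV_PREFIX.match(msg)
    if m:
        return ("device", m.group(1), m.group(2))
    # Fall back to whatever precedes the first ": " (often a subsystem tag).
    prefix, sep, _rest = msg.partition(": ")
    if sep:
        return ("prefix", prefix, None)
    return ("unknown", None, None)


for line in [
    "ACPI BIOS Error (bug): Could not resolve symbol [\\_SB.PCI0.XHC.RHUB.TPLD], "
    "AE_NOT_FOUND (20220331/psargs-330)",
    "platform wdat_wdt: failed to claim resource 5: [mem 0x00000000-0xffffffff7fffffff]",
    "lis3lv02d: unknown sensor type 0x0",
]:
    print(guess_origin(line))
```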

That's a dilemma. The kernel can't know whether it's "properly"
working either, that is, whether the lack of some functions matters
to you or not. In your case above, the messages concern a watchdog,
something related to USB, the TPM, and an acceleration sensor, all of
which likely stem from a buggy BIOS. Would you mind if those features
were missing? Does your device even have a correct hardware
implementation? The kernel doesn't know, hence it reports them as errors.

Many drivers have mechanisms to silence superfluous error messages
for known devices, so the solutions are case by case.

Or you can hide those errors from the console at boot with a boot
option (e.g. loglevel=2).
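For clarity, loglevel= sets the console loglevel: a message is printed to the console only if its level is numerically lower than that value. So loglevel=2 leaves only emerg and alert on screen, while crit and err messages are still recorded and still show up in dmesg. The current value is the first field of /proc/sys/kernel/printk. A tiny Python model of that rule (the function names are illustrative):

```python
LEVEL_NAMES = ["emerg", "alert", "crit", "err", "warn", "notice", "info", "debug"]


def printed_on_console(msg_level, console_loglevel):
    # Only messages numerically below the console loglevel (i.e. more
    # severe) are printed to the console. All of them still go to the
    # kernel ring buffer, so dmesg keeps showing them.
    return msg_level < console_loglevel


def console_levels(console_loglevel):
    """Names of the levels that still reach the console."""
    return [name for level, name in enumerate(LEVEL_NAMES)
            if printed_on_console(level, console_loglevel)]


print(console_levels(2))  # loglevel=2 → ['emerg', 'alert']
```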

The problem is that some of these messages do indicate real issues
that leave the hardware not working properly, including:

1) missing/incorrect firmware
2) most importantly: power-saving modes not enabled
3) high-performance modes not enabled
4) devices not enabled
5) device functions not enabled
6) driver conflicts (i.e. the wrong module gets loaded for the device)
7) physically failing hardware

I'm quite sure you don't really know what half of those messages
actually mean.

Speaking of 7: various kernel subsystems and drivers deal with hardware
such as mass storage, which is known to fail quite often. Yet not a
single driver in the kernel is brave enough to print something like this:

"/dev/xxxx might be failing, please RMA or seek help online"

Instead, you get a dmesg chock-full of "unable to read sector XXX" or
something like that.
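Nothing stops a user-space tool from doing that aggregation itself. A Python sketch of the idea: count distinct sectors with I/O errors per device and emit a warning once a threshold is reached. The message pattern and the threshold of 10 are assumptions for illustration, not anything the kernel defines; distinct sectors are counted so that one sector logged repeatedly does not inflate the count.

```python
import re

# Assumed wording, modelled on block-layer errors such as
# "I/O error, dev sda, sector 12345 op 0x0:(READ) ..."; the exact text
# differs between kernel versions and drivers.
IO_ERROR = re.compile(r"I/O error, dev (\w+), sector (\d+)")


def failing_devices(lines, threshold=10):
    """Hypothetical heuristic: count distinct failing sectors per device."""
    sectors = {}
    for line in lines:
        m = IO_ERROR.search(line)
        if m:
            sectors.setdefault(m.group(1), set()).add(int(m.group(2)))
    # Only report devices whose number of distinct bad sectors reaches threshold.
    return {dev: len(s) for dev, s in sectors.items() if len(s) >= threshold}


def rma_hints(lines, threshold=10):
    """Turn the counts into the kind of message wished for above."""
    return [f"/dev/{dev} might be failing ({n} sectors with I/O errors), "
            "please RMA or seek help online"
            for dev, n in sorted(failing_devices(lines, threshold).items())]
```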

To return to the previous errors: it's impossible for the user to assess
their severity, and that sucks. What is "platform device creation
failed"? What is "unknown sensor type"? What am I missing? Who's
responsible? The kernel? My HW vendor? Are those errors actionable? In
my understanding, a properly working computer must not produce
"emerg,crit,err" errors. I'm not even talking about "warn,info" and such.

Best regards,
Artem