Re: v5.13-rcX regression - NULL pointer dereference - MFD and software node API

From: Andy Shevchenko
Date: Sun Jun 20 2021 - 07:20:43 EST


On Sun, Jun 20, 2021 at 11:36 AM Dominik Brodowski
<linux@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> Over a month ago, Andy Shevchenko reported and fixed a NULL pointer
> dereference issue introduced by commit
> 42e59982917a ("mfd: core: Add support for software nodes")
> in v5.13-rc1:
> https://lore.kernel.org/lkml/20210510141552.57045-1-andriy.shevchenko@xxxxxxxxxxxxxxx/
>
> A bisect shows that it is indeed commit 42e59982917a which causes boot to
> fail due to a NULL pointer dereference on my work laptop,

Can you please be more specific? For example, where can I find the ACPI
dump of your laptop, along with other information?
Please prepare the following (all commands run as root):
1. `acpidump -o laptop-$MODEL.dat` (the *.dat file)
2. `grep -H 15 /sys/bus/acpi/devices/*/status`
3. `dmesg`
4. `cat /proc/iomem /proc/ioports`
5. `lspci -nk -vv`

(#2 and #3 are useful to have for both the working and non-working cases.)

Perhaps a bug report on the kernel Bugzilla would be a good place to collect all of this.

Also, it's not clear exactly which Oops you are seeing (I don't believe
it's the same one).

> where "intel-lpss"
> is bound to
> 00:15.0 Signal processing controller: Intel Corporation Sunrise Point-LP Serial IO I2C Controller #0 (rev 21)
> and fails to bind to INT3446:

Yeah, this is confusing (see above for the additional information needed).

> [ 6.048087] intel-lpss 0000:00:15.0: enabling device (0000 -> 0002)
> [ 6.050625] idma64 idma64.0: Found Intel integrated DMA 64-bit
> [ 6.109112] intel-lpss 0000:00:15.1: enabling device (0000 -> 0002)
> [ 6.111348] idma64 idma64.1: Found Intel integrated DMA 64-bit
> [ 6.172229] intel-lpss 0000:00:15.2: enabling device (0000 -> 0002)
> [ 6.174353] idma64 idma64.2: Found Intel integrated DMA 64-bit
> [ 6.231865] intel-lpss 0000:00:15.3: enabling device (0000 -> 0002)
> [ 6.233845] idma64 idma64.3: Found Intel integrated DMA 64-bit
> [ 6.287492] ACPI Warning: SystemMemory range 0x00000000FE028000-0x00000000FE0281FF conflicts with OpRegion 0x00000000FE028000-0x00000000FE028207 (\_SB.PCI0.GEXP.BAR0) (20210331/utaddress-204)
> [ 6.287704] ACPI: OSL: Resource conflict; ACPI support missing from driver?
> [ 6.289760] intel-lpss: probe of INT3446:00 failed with error -16
>
> Unfortunately, the patch by Andy Shevchenko (applied on top of Linus' tree)
> does not fix the issue. A complete revert, however, does fix the issue, and
> allows my laptop to boot again.

The problem my patch fixed (besides the logical issues) was to work around
a _buggy_ ACPI table. If anything, I guess the firmware is to blame for
this, but let's look at the actual data before judging and deciding on the
right course of action.

> In my opinion, it is unfortunate that although it has been known for over a
> month that commit 42e59982917a is broken, the bugfix (though probably not
> far-reaching enough) has not yet progressed upstream.

This suggests that the issue has a narrow scope, which supports the theory
of buggy tables. It may also be possible that some driver

--
With Best Regards,
Andy Shevchenko