Re: [regression] nct6775 does not load in 5.4 and 5.5, bisected to b84398d6d7f90080
From: Martin Volf
Date: Sat Feb 22 2020 - 12:56:07 EST
Hello,
On Sat, Feb 22, 2020 at 4:41 PM Guenter Roeck <linux@xxxxxxxxxxxx> wrote:
> On 2/22/20 3:13 AM, Martin Volf wrote:
> > hardware monitoring sensors NCT6796D on my Asus PRIME Z390M-PLUS
> > motherboard with Intel i7-9700 CPU don't work with 5.4 and newer linux
> > kernels, the driver nct6775 does not load.
> >
> > It is working OK in version 5.3. I have used almost all released stable
> > versions from 5.3.8 to 5.3.16; I didn't try older kernels.
...
> My wild guess would be that the i801 driver is a bit aggressive with
> reserving memory spaces, but I don't immediately see what it does
> differently in that regard after the offending patch. Does it work
> if you unload the i2c_i801 driver first ?
Yes, after unloading i2c_i801, the nct6775 works. There is definitely
some sort of race between these two drivers. Mostly i2c_i801 wins, but it
happened twice that nct6775 got automatically loaded just before i2c_i801
and the sensors worked.
> You could also try to compare the output of /proc/ioports with
> the old and the new kernel, and see if the IO address space used
> by nct6775 in v5.3 is assigned to the i801 driver (or some other
> driver, such as the watchdog driver) in v5.4.
This is diff of /proc/ioports in 5.3.18 with loaded nct6775 and in
5.4.21 without:
--- ioports-5.3.18
+++ ioports-5.4.21
@@ -2,6 +2,7 @@
0000-001f : dma1
0020-0021 : pic1
002e-0031 : iTCO_wdt
+ 002e-0031 : iTCO_wdt
0040-0043 : timer0
0050-0053 : timer1
0060-0060 : keyboard
@@ -14,11 +15,10 @@
00f0-00ff : fpu
00f0-00f0 : PNP0C04:00
0290-029f : pnp 00:01
- 0295-0296 : nct6775
- 0295-0296 : nct6775
03c0-03df : vga+
03f8-03ff : serial
0400-041f : iTCO_wdt
+ 0400-041f : iTCO_wdt
0680-069f : pnp 00:03
0cf8-0cff : PCI conf1
0d00-ffff : PCI Bus 0000:00
> If you are into hacking the kernel, you could also add some
> debug messages into the nct6775 driver to find out where exactly
> it fails. If that helps, maybe we can then add those messages into
> into the driver source to help others if this is observed again.
I have added some pr_info calls, the diff is at the and of this massage.
"bad" dmesg (i.e. i2c_i801 loaded before modprobe nct6775)
[ 1631.975392] nct6775: ### sensors_nct6775_init:
platform_driver_register() -> 0x0
[ 1631.975396] nct6775: ### nct6775_find: superio_enter(0x2e) -> 0xfffffff0
[ 1631.975417] nct6775: ### nct6775_find: superio_enter(0x4e) -> 0x0
[ 1631.975455] nct6775: ### nct6775_find: (val & SIO_ID_MASK) == 0xffff
"good" dmesg (rmmod i2c_i801; modprobe nct6775)
[ 1730.751188] nct6775: ### sensors_nct6775_init:
platform_driver_register() -> 0x0
[ 1730.751213] nct6775: ### nct6775_find: superio_enter(0x2e) -> 0x0
[ 1730.751251] nct6775: ### nct6775_find: (val & SIO_ID_MASK) == 0xd42b
[ 1730.751359] nct6775: Found NCT6798D or compatible chip at 0x2e:0x290
[ 1730.751367] ACPI Warning: SystemIO range
0x0000000000000295-0x0000000000000296 conflicts with OpRegion
0x0000000000000290-0x0000000000000299 (\AMW0.SHWM)
(20190816/utaddress-204)
[ 1730.751379] ACPI: This conflict may cause random problems and
system instability
[ 1730.751381] ACPI: If an ACPI driver is available for this device,
you should use it instead of the native driver
[ 1730.751431] nct6775: ### nct6775_probe: platform_get_resource() -> 0xfdac7b00
[ 1730.751434] nct6775: ### nct6775_probe: devm_request_region() -> 0
[ 1730.751554] nct6775 nct6775.656: Invalid temperature source 28 at
index 0, source register 0x100, temp register 0x73
[ 1730.751588] nct6775 nct6775.656: Invalid temperature source 28 at
index 1, source register 0x200, temp register 0x75
[ 1730.751686] nct6775 nct6775.656: Invalid temperature source 28 at
index 4, source register 0x900, temp register 0x7b
[ 1730.751865] nct6775: ### nct6775_probe: superio_enter(0x2e) -> 0x0
[ 1730.753685] nct6775: ### nct6775_find: superio_enter(0x4e) -> 0x0
[ 1730.753722] nct6775: ### nct6775_find: (val & SIO_ID_MASK) == 0xffff
So 0x2e is the resource the two drivers are fighting for.
I have created /etc/modprobe.d/nct6775-before-i2c_i801.conf with
install i2c_i801 /sbin/modprobe nct6775; /sbin/modprobe
--ignore-install i2c_i801
and it is working. I'm OK with this workaround, but I can do more
experiments if you instruct me what to try.
Thanks,
Martin
--8<--
--- nct6775.c.orig
+++ nct6775.c
@@ -3806,10 +3806,13 @@ static int nct6775_probe(struct platform
int num_attr_groups = 0;
res = platform_get_resource(pdev, IORESOURCE_IO, 0);
+pr_info("### nct6775_probe: platform_get_resource() -> 0x%x\n", res);
if (!devm_request_region(&pdev->dev, res->start, IOREGION_LENGTH,
DRVNAME))
return -EBUSY;
+pr_info("### nct6775_probe: devm_request_region() -> 0");
+
data = devm_kzalloc(&pdev->dev, sizeof(struct nct6775_data),
GFP_KERNEL);
if (!data)
@@ -4318,6 +4321,7 @@ static int nct6775_probe(struct platform
break;
default:
+pr_info("### nct6775_probe: data->kind == 0x%x\n", data->kind);
return -ENODEV;
}
data->have_in = BIT(data->in_num) - 1;
@@ -4503,6 +4507,7 @@ static int nct6775_probe(struct platform
nct6775_init_device(data);
err = superio_enter(sio_data->sioreg);
+pr_info("### nct6775_probe: superio_enter(0x%x) -> 0x%x\n",
sio_data->sioreg, err);
if (err)
return err;
@@ -4729,6 +4734,7 @@ static int __init nct6775_find(int sioad
int addr;
err = superio_enter(sioaddr);
+pr_info("### nct6775_find: superio_enter(0x%x) -> 0x%x\n", sioaddr, err);
if (err)
return err;
@@ -4737,6 +4743,7 @@ static int __init nct6775_find(int sioad
if (force_id && val != 0xffff)
val = force_id;
+pr_info("### nct6775_find: (val & SIO_ID_MASK) == 0x%04x\n", val);
switch (val & SIO_ID_MASK) {
case SIO_NCT6106_ID:
sio_data->kind = nct6106;
@@ -4831,6 +4838,7 @@ static int __init sensors_nct6775_init(v
int sioaddr[2] = { 0x2e, 0x4e };
err = platform_driver_register(&nct6775_driver);
+pr_info("### sensors_nct6775_init: platform_driver_register() -> 0x%x\n", err);
if (err)
return err;