Re: [Intel-wired-lan] 6.10; regression; bisected - commit 86167183a17e cause info msg "trying to register non-static key"

From: Sasha Neftin
Date: Tue May 21 2024 - 01:34:51 EST


On 18/05/2024 0:05, Vinicius Costa Gomes wrote:
Hi,

+ intel-wired-lan

Mikhail Gavrilov <mikhail.v.gavrilov@xxxxxxxxx> writes:
Hello Corinna,

Any thought? Revert?

Thanks,
Sasha

Hi,
I am here to test unstable kernels.
Yesterday at Fedora Rawhide arrived the new kernel
20240515git1b294a1f3561 and I spotter in kernel log new "red" message
with follow stacktrace:
[ 8.471610] nvme nvme1: pci function 0000:0e:00.0
[ 8.471616] nvme nvme0: pci function 0000:04:00.0
[ 8.492638] nvme nvme1: 32/0/0 default/read/poll queues
[ 8.496190] nvme nvme0: 31/0/0 default/read/poll queues
[ 8.507051] nvme0n1: p1 p2 p3
[ 8.522270] INFO: trying to register non-static key.
[ 8.522273] The code is fine but needs lockdep annotation, or maybe
[ 8.522274] you didn't initialize this object before use?
[ 8.522275] turning off the locking correctness validator.
[ 8.522276] CPU: 31 PID: 683 Comm: (udev-worker) Not tainted
6.10.0-0.rc0.20240515git1b294a1f3561.4.fc41.x86_64+debug #1
[ 8.522278] Hardware name: ASUS System Product Name/ROG STRIX
B650E-I GAMING WIFI, BIOS 2611 04/07/2024
[ 8.522280] Call Trace:
[ 8.522281] <TASK>
[ 8.522282] dump_stack_lvl+0x84/0xd0
[ 8.522287] register_lock_class+0xd84/0x1000
[ 8.522291] ? __pfx_register_lock_class+0x10/0x10
[ 8.522293] __lock_acquire+0x3d2/0x5c70
[ 8.522295] ? debug_object_free+0x298/0x550
[ 8.522297] ? __pfx_debug_object_free+0x10/0x10
[ 8.522299] ? __pfx_lock_release+0x10/0x10
[ 8.522301] ? __pfx___lock_acquire+0x10/0x10
[ 8.522302] ? hrtimer_try_to_cancel+0x22/0x460
[ 8.522305] lock_acquire+0x1ae/0x540
[ 8.522307] ? igc_ptp_clear_tx_tstamp+0x38/0x1b0 [igc]
[ 8.522321] ? __pfx_lock_acquire+0x10/0x10
[ 8.522323] ? seqcount_lockdep_reader_access.constprop.0+0xa5/0xb0
[ 8.522325] ? lockdep_hardirqs_on+0x7c/0x100
[ 8.522326] ? seqcount_lockdep_reader_access.constprop.0+0xa5/0xb0
[ 8.522329] _raw_spin_lock_irqsave+0x51/0xa0
[ 8.522330] ? igc_ptp_clear_tx_tstamp+0x38/0x1b0 [igc]
[ 8.522337] igc_ptp_clear_tx_tstamp+0x38/0x1b0 [igc]
[ 8.522344] igc_ptp_set_timestamp_mode+0x3cc/0x700 [igc]
[ 8.522350] ? igc_power_down_phy_copper+0xf1/0x140 [igc]
[ 8.522358] igc_ptp_reset+0x3b/0x5e0 [igc]
[ 8.530693] ? igc_set_eee_i225+0xfd/0x1e0 [igc]
[ 8.530706] igc_reset+0x2d9/0x3d0 [igc]
[ 8.531707] igc_probe+0x14ca/0x1e20 [igc]
[ 8.531715] ? _raw_spin_unlock_irqrestore+0x4f/0x80
[ 8.531717] ? __pfx_igc_probe+0x10/0x10 [igc]
[ 8.531723] local_pci_probe+0xdc/0x180
[ 8.531727] pci_device_probe+0x23c/0x810
[ 8.531729] ? kernfs_add_one+0x3ab/0x4a0
[ 8.534068] ? kernfs_new_node+0x13d/0x240
[ 8.534070] ? __pfx_pci_device_probe+0x10/0x10
[ 8.534072] ? kernfs_create_link+0x16e/0x240
[ 8.534074] ? kernfs_put+0x1c/0x40
[ 8.534076] ? sysfs_do_create_link_sd+0x8e/0x100
[ 8.534078] really_probe+0x1e0/0x8a0
[ 8.536433] __driver_probe_device+0x18c/0x370
[ 8.536436] driver_probe_device+0x4a/0x120
[ 8.536438] __driver_attach+0x194/0x4a0
[ 8.536439] ? __pfx___driver_attach+0x10/0x10
[ 8.536441] bus_for_each_dev+0x106/0x190
[ 8.536443] ? __pfx_bus_for_each_dev+0x10/0x10
[ 8.536445] bus_add_driver+0x2ff/0x530
[ 8.536448] driver_register+0x1a5/0x360
[ 8.536449] ? __pfx_igc_init_module+0x10/0x10 [igc]
[ 8.536456] do_one_initcall+0xd6/0x460
[ 8.536459] ? __pfx_do_one_initcall+0x10/0x10
[ 8.536461] ? kasan_unpoison+0x44/0x70
[ 8.536464] do_init_module+0x296/0x7c0
[ 8.536466] load_module+0x567b/0x74f0
[ 8.536470] ? __pfx_load_module+0x10/0x10
[ 8.536473] ? __might_fault+0x9d/0x120
[ 8.536475] ? local_clock_noinstr+0xd/0x100
[ 8.536478] ? __pfx___might_resched+0x10/0x10
[ 8.536481] ? __do_sys_init_module+0x1ef/0x220
[ 8.536482] __do_sys_init_module+0x1ef/0x220
[ 8.536483] ? __pfx___do_sys_init_module+0x10/0x10
[ 8.536487] do_syscall_64+0x97/0x190
[ 8.536490] ? lockdep_hardirqs_on_prepare+0x171/0x400
[ 8.536492] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 8.536494] RIP: 0033:0x7f29997265ae
[ 8.536499] Code: 48 8b 0d 85 a8 0c 00 f7 d8 64 89 01 48 83 c8 ff
c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 af 00 00
00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 52 a8 0c 00 f7 d8 64 89
01 48

It seems I have missed this following flow (in reverse):

+ igc_ptp_clear_tx_tstamp
+ igc_ptp_disable_tx_timestamp
+ igc_ptp_set_timestamp_mode
+ igc_ptp_reset
+ igc_reset
+ igc_probe

And that in igc_ptp_clear_tx_tstamp(), it uses ->ptp_tx_lock. So
igc_ptp_init() needs to be called before igc_reset().

I think that is what this splat is telling us.

Don't know what's the best way forward, reverting the commit in question
or trying to fix the initial problem some other way?

[ 8.536500] RSP: 002b:00007ffee505bff8 EFLAGS: 00000246 ORIG_RAX:
00000000000000af
[ 8.536502] RAX: ffffffffffffffda RBX: 000055eafe4a2ac0 RCX: 00007f29997265ae
[ 8.536503] RDX: 00007f2998e6007d RSI: 00000000001067d6 RDI: 00007f299828c010
[ 8.536504] RBP: 00007ffee505c0b0 R08: 000055eafe43a010 R09: 0000000000000007
[ 8.536505] R10: 0000000000000007 R11: 0000000000000246 R12: 00007f2998e6007d
[ 8.536506] R13: 0000000000020000 R14: 000055eafe476280 R15: 000055eafe4a25a0
[ 8.536508] </TASK>
[ 8.542741] pps pps0: new PPS source ptp0
[ 8.550932] igc 0000:0a:00.0 eth0: PHC added
[ 8.550992] igc 0000:0a:00.0: 4.000 Gb/s available PCIe bandwidth
(5.0 GT/s PCIe x1 link)
[ 8.550996] igc 0000:0a:00.0 eth0: MAC: e8:9c:25:6c:40:75
[ 8.558160] igc 0000:0a:00.0 eno1: renamed from eth0

Of course, I immediately wanted to find the first bad commit.
And now it has already been found:
86167183a17e03ec77198897975e9fdfbd53cb0b is the first bad commit
commit 86167183a17e03ec77198897975e9fdfbd53cb0b (HEAD)
Author: Corinna Vinschen <vinschen@xxxxxxxxxx>
Date: Tue Apr 23 12:24:54 2024 +0200

igc: fix a log entry using uninitialized netdev

During successful probe, igc logs this:

[ 5.133667] igc 0000:01:00.0 (unnamed net_device)
(uninitialized): PHC added
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The reason is that igc_ptp_init() is called very early, even before
register_netdev() has been called. So the netdev_info() call works
on a partially uninitialized netdev.

Fix this by calling igc_ptp_init() after register_netdev(), right
after the media autosense check, just as in igb. Add a comment,
just as in igb.

Now the log message is fine:

[ 5.200987] igc 0000:01:00.0 eth0: PHC added

Signed-off-by: Corinna Vinschen <vinschen@xxxxxxxxxx>
Reviewed-by: Hariprasad Kelam <hkelam@xxxxxxxxxxx>
Acked-by: Vinicius Costa Gomes <vinicius.gomes@xxxxxxxxx>
Tested-by: Naama Meir <naamax.meir@xxxxxxxxxxxxxxx>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@xxxxxxxxx>

drivers/net/ethernet/intel/igc/igc_main.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

The next step I was convinced that reverting the commit 86167183a17e
removed this info message.

Thank you for the effort.


I also attach here a full kernel log and build config.

My hardware specs: https://linux-hardware.org/?probe=98ecbf3636

I hope it helps.

--
Best Regards,
Mike Gavrilov.


Cheers,