Re: [linus:master] [module] 8660484ed1: WARNING:at_kernel/module/dups.c:#kmod_dup_request_exists_wait

From: Yujie Liu
Date: Wed May 24 2023 - 04:18:55 EST


Hi Luis,

On Tue, May 23, 2023 at 10:16:17PM -0700, Luis Chamberlain wrote:
> On Mon, May 22, 2023 at 03:51:59PM +0800, kernel test robot wrote:
> > Hello,
> >
> > kernel test robot noticed "WARNING:at_kernel/module/dups.c:#kmod_dup_request_exists_wait" on:
> >
> > commit: 8660484ed1cf3261e89e0bad94c6395597e87599 ("module: add debugging auto-load duplicate module support")
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> >
> > [test failed on linux-next/master 798d276b39e984345d52b933a900a71fa0815928]
> >
> > in testcase: boot
> >
> > compiler: gcc-11
> > test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
> >
> > (please refer to attached dmesg/kmsg for entire log/backtrace)
> >
> >
> > We are not sure if this is expected as this patch is for debugging
> > duplicate module requests issues, so we are sending this report for
> > your information. Thanks.
> >
> >
> > If you fix the issue, kindly add following tag
> > | Reported-by: kernel test robot <yujie.liu@xxxxxxxxx>
> > | Closes: https://lore.kernel.org/oe-lkp/202305221508.753c7b78-yujie.liu@xxxxxxxxx
> >
> >
> > [ 48.705567][ T1] ------------[ cut here ]------------
> > [ 48.706519][ T1] module-autoload: duplicate request for module intel_qat
> > [ 48.707866][ T1] WARNING: CPU: 0 PID: 1 at kernel/module/dups.c:183 kmod_dup_request_exists_wait (kernel/module/dups.c:183 (discriminator 1))
>
> This is a real issue since CONFIG_MODULE_DEBUG_AUTOLOAD_DUPS_TRACE was enabled.
> If you enable CONFIG_MODULE_DEBUG_AUTOLOAD_DUPS_TRACE and these warnings
> come up a bette detailed report would be better. In this case the goal
> was to capture races of request_module() and so the idea for developers
> is to identify the module that caused this, in this case intel_qat, and
> then see who called the request_module() for it. They could try things
> like try_module_get(), but could also ensure that multiple requests
> won't be done in the code by using locking schemes or whatever. Only
> one request should be issued.
>
> The trace below would show possible users but I don't even see
> drivers/crypto/qat/qat_c3xxx/adf_drv.c in my kernel tree.
>
> If you don't to deal with this reporting you can just disable this
> config option.

Thanks a lot for the information. Does this indicate that there is
indeed a multiple requests issue in that crypto driver and we could
report to its author/owner? Thanks.

Not sure if I undertand this correctly: enabling
CONFIG_MODULE_DEBUG_AUTOLOAD_DUPS_TRACE can help detect multiple
requests issues in existing code, so we should report to the original
author of the modules that have this issue, but shouldn't report on
this commit that introduces the CONFIG_MODULE_DEBUG_AUTOLOAD_DUPS_TRACE
option?

--
Best Regards,
Yujie

>
> Luis
>
> > [ 48.709747][ T1] Modules linked in:
> > [ 48.710593][ T1] CPU: 0 PID: 1 Comm: swapper Not tainted 6.3.0-rc1-00121-g8660484ed1cf #1
> > [ 48.712267][ T1] RIP: 0010:kmod_dup_request_exists_wait (kernel/module/dups.c:183 (discriminator 1))
> > [ 48.713599][ T1] Code: c7 80 e3 2a 83 e8 ba 9f fe 00 48 89 ef e8 fe 4c 17 00 80 3d 87 15 0e 02 00 4c 89 e6 74 10 48 c7 c7 20 c0 47 82 e8 d8 ba f2 ff <0f> 0b eb 0c 48 c7 c7 80 c0 47 82 e8 f7 73 fc ff 45 84 f6 75 3a 48
> > All code
> > ========
> > 0: c7 80 e3 2a 83 e8 ba movl $0xfe9fba,-0x177cd51d(%rax)
> > 7: 9f fe 00
> > a: 48 89 ef mov %rbp,%rdi
> > d: e8 fe 4c 17 00 call 0x174d10
> > 12: 80 3d 87 15 0e 02 00 cmpb $0x0,0x20e1587(%rip) # 0x20e15a0
> > 19: 4c 89 e6 mov %r12,%rsi
> > 1c: 74 10 je 0x2e
> > 1e: 48 c7 c7 20 c0 47 82 mov $0xffffffff8247c020,%rdi
> > 25: e8 d8 ba f2 ff call 0xfffffffffff2bb02
> > 2a:* 0f 0b ud2 <-- trapping instruction
> > 2c: eb 0c jmp 0x3a
> > 2e: 48 c7 c7 80 c0 47 82 mov $0xffffffff8247c080,%rdi
> > 35: e8 f7 73 fc ff call 0xfffffffffffc7431
> > 3a: 45 84 f6 test %r14b,%r14b
> > 3d: 75 3a jne 0x79
> > 3f: 48 rex.W
> >
> > Code starting with the faulting instruction
> > ===========================================
> > 0: 0f 0b ud2
> > 2: eb 0c jmp 0x10
> > 4: 48 c7 c7 80 c0 47 82 mov $0xffffffff8247c080,%rdi
> > b: e8 f7 73 fc ff call 0xfffffffffffc7407
> > 10: 45 84 f6 test %r14b,%r14b
> > 13: 75 3a jne 0x4f
> > 15: 48 rex.W
> > [ 48.717441][ T1] RSP: 0000:ffffc9000001fc48 EFLAGS: 00010286
> > [ 48.718679][ T1] RAX: 0000000000000000 RBX: ffffc9000001fcb8 RCX: 0000000000000000
> > [ 48.720263][ T1] RDX: ffffc9000001f96f RSI: 0000000000000008 RDI: fffff52000003f7f
> > [ 48.721791][ T1] RBP: ffff88812ec34000 R08: 0000000000000000 R09: ffffc9000001fa17
> > [ 48.723349][ T1] R10: 0000000000000001 R11: ffffffff81199faf R12: ffffc9000001fd08
> > [ 48.724701][ T1] R13: ffff88812a9cd800 R14: 0000000000000001 R15: 0000000000000001
> > [ 48.726139][ T1] FS: 0000000000000000(0000) GS:ffffffff82cb6000(0000) knlGS:0000000000000000
> > [ 48.727857][ T1] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 48.729151][ T1] CR2: ffff88843ffff000 CR3: 0000000002c6d000 CR4: 00000000000406b0
> > [ 48.730703][ T1] Call Trace:
> > [ 48.731336][ T1] <TASK>
> > [ 48.731859][ T1] ? adfdrv_init (drivers/crypto/qat/qat_c3xxx/adf_drv.c:254)
> > [ 48.732687][ T1] __request_module (kernel/module/kmod.c:168)
> > [ 48.733637][ T1] ? free_modprobe_argv (kernel/module/kmod.c:133)
> > [ 48.734681][ T1] ? w1_ds2760_family_init (drivers/power/supply/ds2760_battery.c:801)
> > [ 48.735699][ T1] ? rcu_read_lock_sched_held (kernel/rcu/update.c:125)
> > [ 48.736791][ T1] ? adfdrv_init (drivers/crypto/qat/qat_c62x/adf_drv.c:251)
> > [ 48.737658][ T1] adfdrv_init (drivers/crypto/qat/qat_c3xxx/adf_drv.c:254)
> > [ 48.738518][ T1] do_one_initcall (init/main.c:1306)
> > [ 48.739479][ T1] ? rcu_lock_acquire (kernel/notifier.c:262)
> > [ 48.740637][ T1] ? rcu_read_lock_sched_held (kernel/rcu/update.c:125)
> > [ 48.741731][ T1] ? lock_is_held (include/linux/sched.h:1572)
> > [ 48.742565][ T1] do_initcalls (init/main.c:1378 init/main.c:1395)
> > [ 48.743358][ T1] kernel_init_freeable (init/main.c:1638)
> > [ 48.744281][ T1] ? rest_init (init/main.c:1514)
> > [ 48.745140][ T1] kernel_init (init/main.c:1524)
> > [ 48.745996][ T1] ret_from_fork (arch/x86/entry/entry_64.S:314)
> > [ 48.746959][ T1] </TASK>
> > [ 48.747563][ T1] irq event stamp: 2359969
> > [ 48.748417][ T1] hardirqs last enabled at (2359977): __up_console_sem (arch/x86/include/asm/irqflags.h:26 (discriminator 3) arch/x86/include/asm/irqflags.h:67 (discriminator 3) arch/x86/include/asm/irqflags.h:127 (discriminator 3) kernel/printk/printk.c:345 (discriminator 3))
> > [ 48.750315][ T1] hardirqs last disabled at (2359986): __up_console_sem (kernel/printk/printk.c:343 (discriminator 3))
> > [ 48.752231][ T1] softirqs last enabled at (2359662): __do_softirq (arch/x86/include/asm/preempt.h:27 kernel/softirq.c:415 kernel/softirq.c:600)
> > [ 48.754058][ T1] softirqs last disabled at (2359653): irq_exit (kernel/softirq.c:445 kernel/softirq.c:650 kernel/softirq.c:674)
> > [ 48.755866][ T1] ---[ end trace 0000000000000000 ]---
> > [ 48.757648][ T1] initcall adfdrv_init+0x0/0x45 returned 0 after 52170 usecs
> > [ 48.759132][ T1] calling adfdrv_init+0x0/0x45 @ 1
> >
> >
> > To reproduce:
> >
> > # build kernel
> > cd linux
> > cp config-6.3.0-rc1-00121-g8660484ed1cf .config
> > make HOSTCC=gcc-11 CC=gcc-11 ARCH=x86_64 olddefconfig prepare modules_prepare bzImage modules
> > make HOSTCC=gcc-11 CC=gcc-11 ARCH=x86_64 INSTALL_MOD_PATH=<mod-install-dir> modules_install
> > cd <mod-install-dir>
> > find lib/ | cpio -o -H newc --quiet | gzip > modules.cgz
> >
> >
> > git clone https://github.com/intel/lkp-tests.git
> > cd lkp-tests
> > bin/lkp qemu -k <bzImage> -m modules.cgz job-script # job-script is attached in this email
> >
> > # if come across any failure that blocks the test,
> > # please remove ~/.lkp and /lkp dir to run from a clean state.
> >
> >
> > --
> > 0-DAY CI Kernel Test Service
> > https://github.com/intel/lkp-tests/wiki