Re: [PATCH v3 2/2] slub: Introduce CONFIG_SLUB_RCU_DEBUG

From: Jann Horn
Date: Mon Jul 29 2024 - 05:36:21 EST


On Mon, Jul 29, 2024 at 6:37 AM kernel test robot <oliver.sang@xxxxxxxxx> wrote:
> kernel test robot noticed "WARNING:possible_circular_locking_dependency_detected" on:
>
> commit: 17049be0e1bcf0aa8809faf84f3ddd8529cd6c4c ("[PATCH v3 2/2] slub: Introduce CONFIG_SLUB_RCU_DEBUG")
> url: https://github.com/intel-lab-lkp/linux/commits/Jann-Horn/kasan-catch-invalid-free-before-SLUB-reinitializes-the-object/20240726-045709
> patch link: https://lore.kernel.org/all/20240725-kasan-tsbrcu-v3-2-51c92f8f1101@xxxxxxxxxx/
> patch subject: [PATCH v3 2/2] slub: Introduce CONFIG_SLUB_RCU_DEBUG
[...]
> [ 136.014616][ C1] WARNING: possible circular locking dependency detected

Looking at the linked dmesg, the primary thing that actually went
wrong here is something in the SLUB bulk freeing code; we got multiple
messages like:

```
BUG filp (Not tainted): Bulk free expected 1 objects but found 2

-----------------------------------------------------------------------------

Slab 0xffffea0005251f00 objects=23 used=23 fp=0x0000000000000000 flags=0x8000000000000040(head|zone=2)
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 6.10.0-00002-g17049be0e1bc #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
Call Trace:
<IRQ>
dump_stack_lvl+0xa3/0x100
slab_err+0x15a/0x200
free_to_partial_list+0x2c9/0x600
[...]
slab_free_after_rcu_debug+0x169/0x280
[...]
rcu_do_batch+0x4a4/0xc40
rcu_core+0x36e/0x5c0
handle_softirqs+0x211/0x800
[...]
__irq_exit_rcu+0x71/0x100
irq_exit_rcu+0x5/0x80
sysvec_apic_timer_interrupt+0x68/0x80
</IRQ>
<TASK>
asm_sysvec_apic_timer_interrupt+0x16/0x40
RIP: 0010:default_idle+0xb/0x40
Code: 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 eb 07 0f 00 2d 17 ae 32 00 fb f4 <fa> c3 cc cc cc cc cc 66 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00
RSP: 0018:ffff888104e5feb8 EFLAGS: 00200282
RAX: 4c16e5d04752e300 RBX: ffffffff813578df RCX: 0000000000995661
RDX: 0000000000000001 RSI: 0000000000000004 RDI: ffffffff813578df
RBP: 0000000000000001 R08: ffff8883aebf6cdb R09: 1ffff11075d7ed9b
R10: dffffc0000000000 R11: ffffed1075d7ed9c R12: 0000000000000000
R13: 1ffff110209ca008 R14: ffffffff87474e68 R15: dffffc0000000000
? do_idle+0x15f/0x400
default_idle_call+0x6e/0x100
do_idle+0x15f/0x400
cpu_startup_entry+0x40/0x80
start_secondary+0x129/0x180
common_startup_64+0x129/0x1a7
</TASK>
FIX filp: Object at 0xffff88814947e400 not freed
```

Ah, the issue is that I'm passing NULL as the tail pointer to
do_slab_free() instead of passing in the pointer to the object again.
That's the result of not being careful enough while forward-porting my
patch from last year; it conflicted with vbabka's commit 284f17ac13fe
("mm/slub: handle bulk and single object freeing separately")... I'll
fix that up in the next version.
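
To illustrate how a NULL tail can lead to an "expected 1 objects but
found 2" report, here is a minimal userspace sketch of a freelist
consistency walk. It is a simplified model, not the actual mm/slub.c
code: struct obj, next_free and count_freelist() are hypothetical
names, and it assumes the check walks from head until it reaches tail,
giving up once the count exceeds the expected number.

```c
#include <stddef.h>

/* Hypothetical stand-in for a slab object; only the free pointer matters. */
struct obj {
	struct obj *next_free;
};

/*
 * Count the objects on a detached freelist running from head to tail.
 * The walk stops either when it reaches tail or as soon as the count
 * exceeds the number of objects the caller claims to be freeing.
 */
static int count_freelist(struct obj *head, struct obj *tail, int expected)
{
	struct obj *object = head;
	int cnt = 0;

	for (;;) {
		if (++cnt > expected)
			break;		/* more objects than announced */
		if (object == tail)
			break;		/* reached the end of the list */
		object = object->next_free;
	}
	return cnt;
}
```

For a single object o, count_freelist(&o, &o, 1) returns 1, while
count_freelist(&o, NULL, 1) returns 2: the walk never sees the tail,
follows the object's free pointer once, and overshoots the expected
count.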


I don't think the lockdep warning is caused by code I introduced; it's
just that you can only hit that warning when SLUB does printk...

> The kernel config and materials to reproduce are available at:
> https://download.01.org/0day-ci/archive/20240729/202407291014.2ead1e72-oliver.sang@xxxxxxxxx