Re: [PATCH v2] mm/slub: Avoid list corruption when removing a slab from the full list

From: Vlastimil Babka
Date: Tue Oct 08 2024 - 06:25:42 EST


On 10/7/24 17:03, Hyeonggon Yoo wrote:
> On Mon, Oct 7, 2024 at 11:29 PM Vlastimil Babka <vbabka@xxxxxxx> wrote:
>>
>> On 10/7/24 11:18 AM, yuan.gao wrote:
>> > boot with slub_debug=UFPZ.
>> >
>> > If allocated object failed in alloc_consistency_checks, all objects of
>> > the slab will be marked as used, and then the slab will be removed from
>> > the partial list.
>> >
>> > When an object belonging to the slab got freed later, the remove_full()
>> > function is called. Because the slab is neither on the partial list nor
>> > on the full list, it eventually lead to a list corruption.
>> >
>> > So we need to add the slab to full list in this case.
>> >
>> > [ 4277.385669] list_del corruption, ffffea00044b3e50->next is LIST_POISON1 (dead000000000100)
>> > [ 4277.387023] ------------[ cut here ]------------
>> > [ 4277.387880] kernel BUG at lib/list_debug.c:56!
>> > [ 4277.388680] invalid opcode: 0000 [#1] PREEMPT SMP PTI
>> > [ 4277.389562] CPU: 5 PID: 90 Comm: kworker/5:1 Kdump: loaded Tainted: G OE 6.6.1-1 #1
>> > [ 4277.392113] Workqueue: xfs-inodegc/vda1 xfs_inodegc_worker [xfs]
>> > [ 4277.393551] RIP: 0010:__list_del_entry_valid_or_report+0x7b/0xc0
>> > [ 4277.394518] Code: 48 91 82 e8 37 f9 9a ff 0f 0b 48 89 fe 48 c7 c7 28 49 91 82 e8 26 f9 9a ff 0f 0b 48 89 fe 48 c7 c7 58 49 91
>> > [ 4277.397292] RSP: 0018:ffffc90000333b38 EFLAGS: 00010082
>> > [ 4277.398202] RAX: 000000000000004e RBX: ffffea00044b3e50 RCX: 0000000000000000
>> > [ 4277.399340] RDX: 0000000000000002 RSI: ffffffff828f8715 RDI: 00000000ffffffff
>> > [ 4277.400545] RBP: ffffea00044b3e40 R08: 0000000000000000 R09: ffffc900003339f0
>> > [ 4277.401710] R10: 0000000000000003 R11: ffffffff82d44088 R12: ffff888112cf9910
>> > [ 4277.402887] R13: 0000000000000001 R14: 0000000000000001 R15: ffff8881000424c0
>> > [ 4277.404049] FS: 0000000000000000(0000) GS:ffff88842fd40000(0000) knlGS:0000000000000000
>> > [ 4277.405357] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> > [ 4277.406389] CR2: 00007f2ad0b24000 CR3: 0000000102a3a006 CR4: 00000000007706e0
>> > [ 4277.407589] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> > [ 4277.408780] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> > [ 4277.410000] PKRU: 55555554
>> > [ 4277.410645] Call Trace:
>> > [ 4277.411234] <TASK>
>> > [ 4277.411777] ? die+0x32/0x80
>> > [ 4277.412439] ? do_trap+0xd6/0x100
>> > [ 4277.413150] ? __list_del_entry_valid_or_report+0x7b/0xc0
>> > [ 4277.414158] ? do_error_trap+0x6a/0x90
>> > [ 4277.414948] ? __list_del_entry_valid_or_report+0x7b/0xc0
>> > [ 4277.415915] ? exc_invalid_op+0x4c/0x60
>> > [ 4277.416710] ? __list_del_entry_valid_or_report+0x7b/0xc0
>> > [ 4277.417675] ? asm_exc_invalid_op+0x16/0x20
>> > [ 4277.418482] ? __list_del_entry_valid_or_report+0x7b/0xc0
>> > [ 4277.419466] ? __list_del_entry_valid_or_report+0x7b/0xc0
>> > [ 4277.420410] free_to_partial_list+0x515/0x5e0
>> > [ 4277.421242] ? xfs_iext_remove+0x41a/0xa10 [xfs]
>> > [ 4277.422298] xfs_iext_remove+0x41a/0xa10 [xfs]
>> > [ 4277.423316] ? xfs_inodegc_worker+0xb4/0x1a0 [xfs]
>> > [ 4277.424383] xfs_bmap_del_extent_delay+0x4fe/0x7d0 [xfs]
>> > [ 4277.425490] __xfs_bunmapi+0x50d/0x840 [xfs]
>> > [ 4277.426445] xfs_itruncate_extents_flags+0x13a/0x490 [xfs]
>> > [ 4277.427553] xfs_inactive_truncate+0xa3/0x120 [xfs]
>> > [ 4277.428567] xfs_inactive+0x22d/0x290 [xfs]
>> > [ 4277.429500] xfs_inodegc_worker+0xb4/0x1a0 [xfs]
>> > [ 4277.430479] process_one_work+0x171/0x340
>> > [ 4277.431227] worker_thread+0x277/0x390
>> > [ 4277.431962] ? __pfx_worker_thread+0x10/0x10
>> > [ 4277.432752] kthread+0xf0/0x120
>> > [ 4277.433382] ? __pfx_kthread+0x10/0x10
>> > [ 4277.434134] ret_from_fork+0x2d/0x50
>> > [ 4277.434837] ? __pfx_kthread+0x10/0x10
>> > [ 4277.435566] ret_from_fork_asm+0x1b/0x30
>> > [ 4277.436280] </TASK>
>> >
>> > v2:
>> > * Call remove_partial() and add_full() only for slab folios.
>> >
>> > v1:
>> > https://lore.kernel.org/linux-mm/20241006044755.79634-1-yuan.gao@xxxxxxxxx/
>> >
>> > Signed-off-by: yuan.gao <yuan.gao@xxxxxxxxx>
>>
>> Is it possible to determine which commit introduced this issue, for a
>> stable cc?
>
> By code inspection I suspect it's around when SLUB's first introduced in 2007,
> more specifically commit 643b113849d8 ("slub: enable tracking of full slabs").
> Even v2.6 kernels do not seem to handle this case correctly.
>
>> Also in addition to what Hyeonggon proposed, we should perhaps mark
>> these consistency-failed slabs in a way that further freeing of objects
>> in them will also don't actually free anything, and thus not move the
>> slab back from full to partial list for further reuse.
>
> Yeah I was thinking of that too.
>
> If that is feasible Yuan you may use one bit from (in mm/slab.h)
> struct slab's 'inuse' field
> and change it to 15 bits to mark consistency-failed slabs.
>
> IIUC 'inuse' doesn't have to be 16 bits and 'objects' is already 15
> bits, so I think it should be fine.

AFAICS we are in a rather comfortable position for the debug caches which
avoid all the fastpaths, and we could simply reuse the frozen bit or invent
a special value for slab->freelist?

Either way it should boil down to free_debug_processing() detecting such a
slab (or perhaps in check_slab()) and returning false/0 immediately which
would avoid further actions on the slab.

If it was done in check_slab() then free_debug_processing() would at least
report which of the objects in a broken slab were attempted to be freed
after it was marked broken, which could perhaps be useful:

slab_fix(s, "Object at 0x%p not freed", object);

Vlastimil

>> Right now the
>> (understandable) attempt to not use the corrupted slab further is only
>> partially successful.
>
> Best,
> Hyeonggon
>
>> > ---
>> > mm/slub.c | 5 ++++-
>> > 1 file changed, 4 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/mm/slub.c b/mm/slub.c
>> > index 21f71cb6cc06..2992388c00f4 100644
>> > --- a/mm/slub.c
>> > +++ b/mm/slub.c
>> > @@ -2745,7 +2745,10 @@ static void *alloc_single_from_partial(struct kmem_cache *s,
>> > slab->inuse++;
>> >
>> > if (!alloc_debug_processing(s, slab, object, orig_size)) {
>> > - remove_partial(n, slab);
>> > + if (folio_test_slab(slab_folio(slab))) {
>> > + remove_partial(n, slab);
>> > + add_full(s, n, slab);
>> > + }
>> > return NULL;
>> > }
>> >