Re: Double free of struct sk_buff reported by SLAB_CONSISTENCY_CHECKS with init_on_free

From: Vlastimil Babka
Date: Tue Nov 05 2019 - 10:02:29 EST


On 11/5/19 3:32 PM, Thibaut Sautereau wrote:
> On Tue, Nov 05, 2019 at 10:00:39AM +0100, Vlastimil Babka wrote:
>> On 11/4/19 6:03 PM, Thibaut Sautereau wrote:
>>> The BUG only happens when using `slub_debug=F` on the command-line (to
>>> enable SLAB_CONSISTENCY_CHECKS), otherwise the double free is not
>>> reported and the system keeps running.
>>
>> You could change slub_debug parameter to:
>> slub_debug=FU,skbuff_head_cache
>>
>> That will also print out who previously allocated and freed the double
>> freed object. And limit all the tracking just to the affected cache.
>
> Thanks, I did not know about that.
>
> However, as kind of expected, I get a BUG due to a NULL pointer
> dereference in print_track():

Ah, I didn't read properly your initial mail, that there's a null
pointer deference during the consistency check.

...

>>>
>>> Bisection points to the following commit: 1b7e816fc80e ("mm: slub: Fix
>>> slab walking for init_on_free"), and indeed the BUG is not triggered
>>> when init_on_free is disabled.
>>
>> That could be either buggy SLUB code, or the commit somehow exposed a
>> real bug in skbuff users.
>
> Right. At first I thought about some incompatibility between
> init_on_free and SLAB_CONSISTENCY_CHECKS, but in that case why would it
> only happen with skbuff_head_cache?

That's curious, yeah.

> On the other hand, if it's a bug in
> skbuff users, why is the on_freelist() check in free_consistency_check()
> not detecting anything when init_on_free is disabled?

I vaguely suspect the code in the commit 1b7e816fc80e you bisected,
where in slab_free_freelist_hook() in the first iteration, we have void
*p = NULL; and set_freepointer(s, object, p); will thus write NULL into
the freelist. Is is the NULL we are crashing on? The code seems to
assume that the freelist is rewritten later in the function, but that
part is only active with some CONFIG_ option(s), none of which might be
enabled in your case?
But I don't really understand what exactly this function is supposed to
do. Laura, does my theory make sense?

Thanks,
Vlastimil