Re: [PATCH 1/3] mm/slub: fix the race between validate_slab and slab_free

From: Rongwei Wang
Date: Fri Jul 15 2022 - 04:05:43 EST




On 6/17/22 5:40 PM, Vlastimil Babka wrote:
> On 6/8/22 14:23, Christoph Lameter wrote:
>> On Wed, 8 Jun 2022, Rongwei Wang wrote:
>>
>>> If available, I think documenting the issue and warning about this incorrect
>>> behavior is OK. But it still prints a large amount of confusing messages, and
>>> disturbs us?
>>
>> Correct, it would be great if you could fix this in a way that does not
>> impact performance.
>>
>>>> are current operations on the slab being validated.
>>> And I am trying to fix it in the following way. In short, these changes only
>>> work under slub debug mode, and do not affect the normal mode (I'm not
>>> sure). It does not look elegant enough. And if all approve of this way, I can
>>> submit the next version.
>>>
>>> Anyway, thanks for your time:).
>>> -wrw

>>> @@ -3304,7 +3300,7 @@ static void __slab_free(struct kmem_cache *s, struct
>>> slab *slab,
>>>
>>>  {
>>>  	void *prior;
>>> -	int was_frozen;
>>> +	int was_frozen, to_take_off = 0;
>>>  	struct slab new;
>>
>> to_take_off has the role of !n ? Why is that needed?
>>
>>> -	do {
>>> -		if (unlikely(n)) {
>>> +	spin_lock_irqsave(&n->list_lock, flags);
>>> +	ret = free_debug_processing(s, slab, head, tail, cnt, addr);
>>
>> Ok so the idea is to take the lock only if kmem_cache_debug. That looks
>> ok. But it still adds a number of new branches etc to the free loop.

Hi Vlastimil, sorry for missing your message for such a long time.
> It also further complicates the already tricky code. I wonder if we should
> make more benefit from the fact that for kmem_cache_debug() caches we don't
> leave any slabs on percpu or percpu partial lists, and also in
> free_debug_processing() we already take both list_lock and slab_lock. If we
> just did the freeing immediately there under those locks, we would be
> protected against other freeing cpus by that list_lock and don't need the
> double cmpxchg tricks.
Hmm, I'm not sure I completely understand what you mean by "don't need the double cmpxchg tricks". Do you mean replacing cmpxchg_double_slab() here with the following code when kmem_cache_debug(s)?

	/* Under the slab lock, no cmpxchg is needed to switch the freelist. */
	__slab_lock(slab);
	if (slab->freelist == freelist_old && slab->counters == counters_old) {
		slab->freelist = freelist_new;
		slab->counters = counters_new;
		__slab_unlock(slab);
		local_irq_restore(flags);
		return true;
	}
	/* The slab changed concurrently; let the caller retry. */
	__slab_unlock(slab);

If I have misunderstood your words, please let me know.
Thanks!
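
To make it more concrete, below is a rough sketch of how I picture the whole debug free path under n->list_lock. free_debug_locked() is only a made-up name for discussion; it assumes debug caches keep no frozen/percpu slabs, ignores full-list tracking, and is completely untested:

static void free_debug_locked(struct kmem_cache *s, struct slab *slab,
			      void *head, void *tail, int cnt)
{
	struct kmem_cache_node *n = get_node(s, slab_nid(slab));
	unsigned long flags;
	void *prior;
	bool discard = false;

	spin_lock_irqsave(&n->list_lock, flags);
	__slab_lock(slab);

	/* Splice the freed objects in front of the existing freelist. */
	prior = slab->freelist;
	set_freepointer(s, tail, prior);
	slab->freelist = head;
	slab->inuse -= cnt;

	if (!prior) {
		/* Slab was full and therefore on no list; put it on partial. */
		add_partial(n, slab, DEACTIVATE_TO_TAIL);
	} else if (!slab->inuse && n->nr_partial >= s->min_partial) {
		/* Slab became empty and we already have enough partial slabs. */
		remove_partial(n, slab);
		discard = true;
	}

	__slab_unlock(slab);
	spin_unlock_irqrestore(&n->list_lock, flags);

	if (discard)
		discard_slab(s, slab);
}

If that is roughly what you mean, then the cmpxchg_double_slab() loop in __slab_free() indeed looks unnecessary for debug caches.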

> What about against allocating cpus? More tricky as those will currently end
> up privatizing the freelist via get_partial(), only to deactivate it again,
> so our list_lock+slab_lock in freeing path would not protect in the
> meanwhile. But the allocation is currently very inefficient for debug
> caches, as in get_partial() it will take the list_lock to take the slab from
> partial list and then in most cases again in deactivate_slab() to return it.
It seems that I need to spend some time digesting these words. Anyway, thanks.

> If instead the allocation path for kmem_cache_debug() cache would take a
> single object from the partial list (not whole freelist) under list_lock, it
> would be ultimately more efficient, and protect against freeing using
> list_lock. Sounds like an idea worth trying to me?

Hyeonggon had similar advice: split the debug paths out of slab allocation and freeing, like below:


__slab_alloc() {
	if (kmem_cache_debug(s))
		slab_alloc_debug();
	else
		___slab_alloc();
}

I guess the above code aims to solve the problem (idea) you mentioned?

slab_free() {
	if (kmem_cache_debug(s))
		slab_free_debug();
	else
		__do_slab_free();
}

Currently, I only modify the slab freeing code to fix the confusing messages from "slabinfo -v". If you agree, I can try to implement the slab_alloc_debug() code mentioned above. It may also be a challenge for me.
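
Roughly, what I have in mind for slab_alloc_debug() is something like the sketch below, only to show the locking idea: the slow path that allocates a new slab when the partial list is empty, alloc_debug_processing(), and full-list handling are all left out, and it is untested:

static void *slab_alloc_debug(struct kmem_cache *s, gfp_t gfpflags, int node)
{
	struct kmem_cache_node *n;
	struct slab *slab;
	unsigned long flags;
	void *object = NULL;

	if (node == NUMA_NO_NODE)
		node = numa_mem_id();
	n = get_node(s, node);

	spin_lock_irqsave(&n->list_lock, flags);
	slab = list_first_entry_or_null(&n->partial, struct slab, slab_list);
	if (slab) {
		/* Take a single object; the freelist stays with the slab. */
		object = slab->freelist;
		slab->freelist = get_freepointer(s, object);
		slab->inuse++;
		if (!slab->freelist)
			/* No free objects left, remove it from the partial list. */
			remove_partial(n, slab);
	}
	spin_unlock_irqrestore(&n->list_lock, flags);

	/* if (!object): allocate a new slab here, using gfpflags (not shown). */
	return object;
}

Since both this path and the freeing path would hold n->list_lock while touching slab->freelist, validation under the same lock should no longer race with them, if I understand the idea correctly.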

Thanks for your time.

> And of course we would stop creating the 'validate' sysfs files for
> non-debug caches.
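
For the 'validate' side, I guess a minimal first step (short of not creating the sysfs file at all) would be to reject the write for non-debug caches, something like the sketch below. This assumes validate_store() in mm/slub.c stays the handler, and it is untested:

static ssize_t validate_store(struct kmem_cache *s,
			      const char *buf, size_t length)
{
	int ret = -EINVAL;

	/* Validation is meaningless without debugging enabled for this cache. */
	if (!kmem_cache_debug(s))
		return -EINVAL;

	if (buf[0] == '1') {
		ret = validate_slab_cache(s);
		if (ret >= 0)
			ret = length;
	}
	return ret;
}

Of course that only refuses the request; not exposing the file at all would need changes to how the slab sysfs attributes are created, which I still need to look into.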