Re: [PATCH v2 5/5] mm/slub: refactor deactivate_slab()

From: Hyeonggon Yoo
Date: Fri Mar 04 2022 - 23:29:26 EST


On Fri, Mar 04, 2022 at 08:01:20PM +0100, Vlastimil Babka wrote:
> On 3/4/22 07:34, Hyeonggon Yoo wrote:
> > Simplify deactivate_slab() by unlocking n->list_lock and retrying
> > when cmpxchg_double() fails, and performing add_{partial,full} only
> > when it succeeds.
> >
> > Releasing and taking n->list_lock again here is not harmful as SLUB
> > avoids deactivating slabs as much as possible.
> >
> > [ vbabka@xxxxxxx: perform add_{partial,full} when cmpxchg_double()
> > succeeds. ]
> >
> > Signed-off-by: Hyeonggon Yoo <42.hyeyoo@xxxxxxxxx>
>
> Looks good, just noticed a tiny issue.
>
> > ---
> > mm/slub.c | 81 ++++++++++++++++++++++---------------------------------
> > 1 file changed, 32 insertions(+), 49 deletions(-)
> >
> > diff --git a/mm/slub.c b/mm/slub.c
> > index f9ae983a3dc6..c1a693ec5874 100644
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -2344,8 +2344,8 @@ static void deactivate_slab(struct kmem_cache *s, struct slab *slab,
> > {
> > enum slab_modes { M_NONE, M_PARTIAL, M_FULL, M_FREE };
> > struct kmem_cache_node *n = get_node(s, slab_nid(slab));
> > - int lock = 0, free_delta = 0;
> > - enum slab_modes l = M_NONE, m = M_NONE;
> > + int free_delta = 0;
> > + enum slab_modes mode = M_NONE;
> > void *nextfree, *freelist_iter, *freelist_tail;
> > int tail = DEACTIVATE_TO_HEAD;
> > unsigned long flags = 0;
> > @@ -2387,14 +2387,10 @@ static void deactivate_slab(struct kmem_cache *s, struct slab *slab,
> > * Ensure that the slab is unfrozen while the list presence
> > * reflects the actual number of objects during unfreeze.
> > *
> > - * We setup the list membership and then perform a cmpxchg
> > - * with the count. If there is a mismatch then the slab
> > - * is not unfrozen but the slab is on the wrong list.
> > - *
> > - * Then we restart the process which may have to remove
> > - * the slab from the list that we just put it on again
> > - * because the number of objects in the slab may have
> > - * changed.
> > + * We first perform cmpxchg holding the lock and insert into the
> > + * list only when it succeeds. If there is a mismatch, the slab is
> > + * not unfrozen and the number of objects in the slab may have
> > + * changed. Then release the lock and retry the cmpxchg.
> > */
> > redo:
> >
> > @@ -2414,57 +2410,44 @@ static void deactivate_slab(struct kmem_cache *s, struct slab *slab,
> > new.frozen = 0;
> >
> > if (!new.inuse && n->nr_partial >= s->min_partial)
> > - m = M_FREE;
> > + mode = M_FREE;
> > else if (new.freelist) {
> > - m = M_PARTIAL;
> > - if (!lock) {
> > - lock = 1;
> > - /*
> > - * Taking the spinlock removes the possibility that
> > - * acquire_slab() will see a slab that is frozen
> > - */
> > - spin_lock_irqsave(&n->list_lock, flags);
> > - }
> > - } else {
> > - m = M_FULL;
> > - if (kmem_cache_debug_flags(s, SLAB_STORE_USER) && !lock) {
>
> This used to set m = M_FULL; always.
>
> > - lock = 1;
> > - /*
> > - * This also ensures that the scanning of full
> > - * slabs from diagnostic functions will not see
> > - * any frozen slabs.
> > - */
> > - spin_lock_irqsave(&n->list_lock, flags);
> > - }
> > + mode = M_PARTIAL;
> > + /*
> > + * Taking the spinlock removes the possibility that
> > + * acquire_slab() will see a slab that is frozen
> > + */
> > + spin_lock_irqsave(&n->list_lock, flags);
> > + } else if (kmem_cache_debug_flags(s, SLAB_STORE_USER)) {
> > + mode = M_FULL;
>
> Now you only set it for SLAB_STORE_USER caches.
>
> > + /*
> > + * This also ensures that the scanning of full
> > + * slabs from diagnostic functions will not see
> > + * any frozen slabs.
> > + */
> > + spin_lock_irqsave(&n->list_lock, flags);
> > }
> >
> > - if (l != m) {
> > - if (l == M_PARTIAL)
> > - remove_partial(n, slab);
> > - else if (l == M_FULL)
> > - remove_full(s, n, slab);
> > -
> > - if (m == M_PARTIAL)
> > - add_partial(n, slab, tail);
> > - else if (m == M_FULL)
> > - add_full(s, n, slab);
> > - }
> >
> > - l = m;
> > if (!cmpxchg_double_slab(s, slab,
> > old.freelist, old.counters,
> > new.freelist, new.counters,
> > - "unfreezing slab"))
> > + "unfreezing slab")) {
> > + if (mode == M_PARTIAL || mode == M_FULL)
> > + spin_unlock_irqrestore(&n->list_lock, flags);
> > goto redo;
> > + }
> >
> > - if (lock)
> > - spin_unlock_irqrestore(&n->list_lock, flags);
> >
> > - if (m == M_PARTIAL)
> > + if (mode == M_PARTIAL) {
> > + add_partial(n, slab, tail);
> > + spin_unlock_irqrestore(&n->list_lock, flags);
> > stat(s, tail);
> > - else if (m == M_FULL)
> > + } else if (mode == M_FULL) {
> > + add_full(s, n, slab);
> > + spin_unlock_irqrestore(&n->list_lock, flags);
> > stat(s, DEACTIVATE_FULL);
>
> As a result, full slabs without SLAB_STORE_USER will not count
> DEACTIVATE_FULL anymore.

Oh, thank you for catching this.

We usually only deactivate full slabs for debugging, to keep track of the full list of slabs.

But as you pointed out, in the rare case when the pfmemalloc flag does not match, full
slabs can be deactivated (even if they are not put on any list).
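
For context, the path I mean is (roughly) the pfmemalloc check in
___slab_alloc(), paraphrased loosely from memory rather than quoted
exactly:

	/*
	 * A slab from pfmemalloc reserves must not serve an ordinary
	 * allocation, so the cpu slab is deactivated here regardless
	 * of how many objects are still in use; this is how a full
	 * slab can reach deactivate_slab().
	 */
	if (unlikely(!pfmemalloc_match(slab, gfpflags)))
		goto deactivate_slab;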

> I guess the easiest way to solve it is to e.g. add a M_FULL_NOLIST mode that
> only does the DEACTIVATE_FULL counting.

That should be a sufficient solution.

Will fix this in v3. Thank you!
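
Something like this rough sketch, perhaps (untested, just to confirm I
understood the suggestion):

	enum slab_modes { M_NONE, M_PARTIAL, M_FULL, M_FREE, M_FULL_NOLIST };
	...
	if (!new.inuse && n->nr_partial >= s->min_partial) {
		mode = M_FREE;
	} else if (new.freelist) {
		mode = M_PARTIAL;
		spin_lock_irqsave(&n->list_lock, flags);
	} else if (kmem_cache_debug_flags(s, SLAB_STORE_USER)) {
		mode = M_FULL;
		spin_lock_irqsave(&n->list_lock, flags);
	} else {
		/* Full slab that is not tracked on the full list. */
		mode = M_FULL_NOLIST;
	}
	...
	} else if (mode == M_FULL_NOLIST) {
		/* No list manipulation, no lock taken: only count it. */
		stat(s, DEACTIVATE_FULL);
	}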

> > - else if (m == M_FREE) {
> > + } else if (mode == M_FREE) {
> > stat(s, DEACTIVATE_EMPTY);
> > discard_slab(s, slab);
> > stat(s, FREE_SLAB);
>

--
Thank you, you are awesome!
Hyeonggon :-)