Re: [PATCH] slub: limit count of partial slabs scanned to gather statistics
From: David Rientjes
Date: Thu May 07 2020 - 14:25:00 EST
On Thu, 7 May 2020, Konstantin Khlebnikov wrote:
> > > > To get exact count of free and used objects slub have to scan list of
> > > > partial slabs. This may take at long time. Scanning holds spinlock and
> > > > blocks allocations which move partial slabs to per-cpu lists and back.
> > > >
> > > > Example found in the wild:
> > > >
> > > > # cat /sys/kernel/slab/dentry/partial
> > > > 14478538 N0=7329569 N1=7148969
> > > > # time cat /sys/kernel/slab/dentry/objects
> > > > 286225471 N0=136967768 N1=149257703
> > > >
> > > > real 0m1.722s
> > > > user 0m0.001s
> > > > sys 0m1.721s
> > > >
> > > > The same problem in slab was addressed in commit f728b0a5d72a ("mm,
> > > > slab:
> > > > faster active and free stats") by adding more kmem cache statistics.
> > > > For slub same approach requires atomic op on fast path when object
> > > > frees.
> > > >
> > > > Let's simply limit count of scanned slabs and print warning.
> > > > Limit set in /sys/module/slub/parameters/max_partial_to_count.
> > > > Default is 10000 which should be enough for most sane cases.
> > > >
> > > > Return linear approximation if list of partials is longer than limit.
> > > > Nobody should notice difference.
> > > >
> > > > Signed-off-by: Konstantin Khlebnikov <khlebnikov@xxxxxxxxxxxxxx>
> > >
> > > This patch will trigger the warning under memory pressure, and then makes
> > > lockdep unhappy. Also, it is almost impossible tell how many
> > > max_partial_to_count is sufficient from user perspective.
>
> Oops, my bad. Printing under this lock indeed a bad idea.
>
> Probably it's better to simply remove this message.
> I cannot imagine situation when precise count of object matters at such scale.
>
If the printk is removed, then probably better to remove the
max_partial_to_count param as well? I doubt it would ever be used since
nothing points to it other than the kernel code now. If somebody
complains about the approximation, we can (a) convince them the
approximation is better than precise calculation to prevent irqs from
being disabled for several seconds and (b) add it later if absolutely
necessary. I notice the absence of other module_param()'s in mm/slub.c,
so likely better to avoid adding special tunables like this unless
absolutely necessary.