Re: SLUB: percpu partial object count is highly inaccurate, causing some memory wastage and maybe also worse tail latencies?

From: Vlastimil Babka
Date: Thu Jan 21 2021 - 12:22:35 EST


On 1/12/21 12:12 AM, Jann Horn wrote:
> At first I thought that this wasn't a significant issue because SLUB
> has a reclaim path that can trim the percpu partial lists; but as it
> turns out, that reclaim path is not actually wired up to the page
> allocator's reclaim logic. The SLUB reclaim stuff is only triggered by
> (very rare) subsystem-specific calls into SLUB for specific slabs and
> by sysfs entries. So userland processes will OOM even if SLUB still
> has megabytes of entirely unused pages lying around.
>
> It might be a good idea to figure out whether it is possible to
> efficiently keep track of a more accurate count of the free objects on
> percpu partial lists; and if not, maybe change the accounting to
> explicitly track the number of partial pages, and use limits that are
> more appropriate for that? And perhaps the page allocator reclaim path
> should also occasionally rip unused pages out of the percpu partial
> lists?

I'm gonna send an RFC that adds a proper shrinker and thus connects this
shrinking to page reclaim, as a reply to this e-mail.
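
To illustrate the idea only, below is a rough userspace model. It is not
kernel code and not the RFC. Every name in it (model_partial,
model_add_page, model_count_objects, model_scan_objects) is hypothetical,
and the fixed-size arrays are an assumption made for brevity. The
count/scan split is loosely modeled on the count_objects/scan_objects
callbacks of the kernel's struct shrinker: "count" reports how many
entirely unused pages sit on the percpu partial lists, and "scan" releases
up to the requested number of them while leaving pages with live objects
alone.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_NR_CPUS     4	/* assumed CPU count for the model */
#define MODEL_MAX_PARTIAL 8	/* assumed cap on pages per percpu list */

/* Hypothetical stand-in for one slab page on a percpu partial list. */
struct model_slab_page {
	unsigned int objects;	/* objects the page can hold */
	unsigned int inuse;	/* objects currently allocated */
	int present;		/* 1 if this slot holds a page */
};

static struct model_slab_page model_partial[MODEL_NR_CPUS][MODEL_MAX_PARTIAL];

/* Put a page on a CPU's partial list; returns 0, or -1 if the list is full. */
static int model_add_page(int cpu, unsigned int objects, unsigned int inuse)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	assert(inuse <= objects);

	for (size_t i = 0; i < MODEL_MAX_PARTIAL; i++) {
		struct model_slab_page *p = &model_partial[cpu][i];

		if (!p->present) {
			p->objects = objects;
			p->inuse = inuse;
			p->present = 1;
			return 0;
		}
	}
	return -1;
}

/* "count" step: number of entirely unused pages across all percpu lists. */
static unsigned long model_count_objects(void)
{
	unsigned long n = 0;

	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		for (size_t i = 0; i < MODEL_MAX_PARTIAL; i++)
			if (model_partial[cpu][i].present &&
			    model_partial[cpu][i].inuse == 0)
				n++;
	return n;
}

/* "scan" step: release up to nr_to_scan unused pages; returns pages freed. */
static unsigned long model_scan_objects(unsigned long nr_to_scan)
{
	unsigned long freed = 0;

	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		for (size_t i = 0; i < MODEL_MAX_PARTIAL; i++) {
			struct model_slab_page *p = &model_partial[cpu][i];

			if (freed == nr_to_scan)
				return freed;
			if (p->present && p->inuse == 0) {
				p->present = 0;	/* page goes back to the allocator */
				freed++;
			}
		}
	}
	return freed;
}
```

In this model a reclaim pass would call model_count_objects() first and
then model_scan_objects() with however many pages it wants back. A real
shrinker would additionally have to deal with locking and with draining
other CPUs' lists, both of which the model ignores.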