Re: [PATCH v2] x86,mm: only trim the mm_cpumask once a second

From: Rik van Riel
Date: Tue Dec 03 2024 - 20:43:55 EST


On Tue, 2024-12-03 at 16:46 -0800, Dave Hansen wrote:
> On 12/3/24 12:07, Rik van Riel wrote:
> > The tlb_flush2 threaded test does not only madvise in a
> > loop, but also mmap and munmap from inside every thread.
> >
> > This should create massive contention on the mmap_lock,
> > resulting in threads going to sleep while waiting in
> > mmap and munmap.
> >
> > https://github.com/antonblanchard/will-it-scale/blob/master/tests/tlb_flush2.c
>
> Oh, wow, it only madvise()'s a 1MB allocation before doing the
> munmap()/mmap(). I somehow remembered it being a lot larger. And, yeah,
> I see a ton of idle time which would be 100% explained by mmap_lock
> contention.
>
> Did the original workload that you care about have idle time?
>
The workloads that I care about are things like memcache,
web servers, web proxies, and other workloads that typically
handle very short requests before going idle again.

These programs have a LOT of context switches to and from
the idle task.

> I'm wondering if trimming mm_cpumask() on the way to idle but leaving it
> alone on a context switch to another thread is a good idea.
>
The problem with that is that you then have to set the bit
again when switching back to the program, which creates
contention when a number of CPUs are transitioning to and
from idle at the same time.

Atomic operations on a contended cache line from the
context switch code end up being quite visible when
profiling some workloads :)
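
To illustrate the cost being described, here is a minimal userspace
sketch in C11. It is not the kernel code and not the patch; struct
cpu_mask and the mask_* helpers are hypothetical names made up for
this example, and the memory ordering is simply the C11 default.
It contrasts an unconditional atomic set, which is always a
read-modify-write on the shared cache line, with a test-before-set
that only writes when the bit is actually clear. If the bit is left
set across a trip through idle, the switch-in path in the second
variant is just a read.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a per-mm CPU mask (up to 64 CPUs). */
struct cpu_mask {
    _Atomic uint64_t bits;
};

/*
 * Unconditional set: every call is an atomic read-modify-write,
 * so every caller needs the cache line in exclusive state.
 */
static void mask_set_always(struct cpu_mask *m, unsigned int cpu)
{
    atomic_fetch_or(&m->bits, UINT64_C(1) << cpu);
}

/*
 * Test before set: a plain load first, and the atomic write only
 * if the bit is clear. Returns true if a write was performed.
 */
static bool mask_set_if_clear(struct cpu_mask *m, unsigned int cpu)
{
    uint64_t bit = UINT64_C(1) << cpu;

    if (atomic_load(&m->bits) & bit)
        return false;   /* bit already set: read-only path */

    atomic_fetch_or(&m->bits, bit);
    return true;
}

/* Clearing the bit is always an atomic read-modify-write. */
static void mask_clear(struct cpu_mask *m, unsigned int cpu)
{
    atomic_fetch_and(&m->bits, ~(UINT64_C(1) << cpu));
}
```

Clearing the bit on every switch to idle forces the write path on
every switch back. Leaving the bit in place lets repeated switch-ins
on the same CPU take the read-only path.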

--
All Rights Reversed.