[patch 0/5] optionally perform deferred actions on return to userspace (v3)
From: Marcelo Tosatti
Date: Wed Jul 14 2021 - 16:43:55 EST
Changelog:
-v3: - use optimized percpu accessors for hotpath in
       vmstat.c (Christoph Lameter)
     - fix !CONFIG_NUMA compilation breakage (kernel robot)
-v2: - fix !CONFIG_SMP breakage (kernel robot)
     - switch option to generic "quiesce_on_exit_to_usermode"
Summary of what was discussed on -v1:
1) The additional hooks in the performance-sensitive callbacks
in mm/vmstat.c are protected by a static key, so workloads
which do not enable this option should not be impacted.
2) People would prefer a prctl() interface. As noted in the
option documentation (patch 1), the code added by this
patchset is meant to be reused by a prctl() interface,
after which the isolcpus option can be deprecated.
3) Nobody has any other bright ideas for ways to solve this
that would make this patch series obsolete.
4) The isolcpus= interface should switch to a cpuset based
interface.
---
The logic to disable the vmstat worker thread when entering
nohz_full does not cover all scenarios. For example, the following
sequence is possible:
1) enter nohz_full, which calls refresh_cpu_vm_stats, syncing the stats.
2) app runs mlock, which increases counters for mlock'ed pages.
3) start -RT loop
Since the refresh_cpu_vm_stats call from the nohz_full logic can
happen _before_ the mlock, the per-CPU counters are dirty again when
the -RT loop starts, and the vmstat shepherd can restart the vmstat
worker thread on the CPU in question.
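
To make the ordering problem concrete, here is a small userspace model
of the three steps above. It is only a sketch, not kernel code: the
struct and every model_* name are made up for illustration, and the
shepherd check is reduced to "worker stopped and per-CPU delta is
non-zero".

```c
/* Userspace model of the race. Not kernel code; all names are illustrative. */
#include <assert.h>
#include <stdbool.h>

struct cpu_model {
	long vm_stat_diff;   /* per-CPU delta not yet folded into the global count */
	bool worker_active;  /* is the per-CPU vmstat worker scheduled? */
};

long global_stat;

/* Models refresh_cpu_vm_stats: fold the per-CPU delta into the global count. */
void model_refresh(struct cpu_model *c)
{
	global_stat += c->vm_stat_diff;
	c->vm_stat_diff = 0;
}

/* Step 1: entering nohz_full syncs the stats and stops the worker. */
void model_enter_nohz_full(struct cpu_model *c)
{
	model_refresh(c);
	c->worker_active = false;
}

/* Step 2: mlock bumps the per-CPU counter for mlock'ed pages. */
void model_mlock(struct cpu_model *c, long pages)
{
	assert(pages >= 0);
	c->vm_stat_diff += pages;
}

/*
 * Models the vmstat shepherd: restart the worker on a CPU that has
 * pending deltas. Returns true if the worker was restarted.
 */
bool model_shepherd(struct cpu_model *c)
{
	if (!c->worker_active && c->vm_stat_diff != 0) {
		c->worker_active = true;
		return true;
	}
	return false;
}

/* Runs steps 1-3 and reports whether the RT loop would be disturbed. */
bool model_race(void)
{
	struct cpu_model cpu = { 0, true };

	model_enter_nohz_full(&cpu);   /* 1) stats synced, worker stopped */
	model_mlock(&cpu, 16);         /* 2) counters dirtied again */
	/* 3) RT loop starts here; the shepherd runs some time later */
	return model_shepherd(&cpu);
}
```

In this model model_race() reports that the worker is restarted,
because nothing syncs the counters between the mlock and the start of
the RT loop.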
To fix this, optionally quiesce deferred actions when returning
to userspace. This is controlled by a new
"quiesce_on_exit_to_usermode" isolcpus flag (default off).
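
The idea can be sketched in userspace as follows. This is a model
under stated assumptions, not the code from the patches: the plain
bool quiesce_enabled stands in for the static key, sim_exit_to_usermode
is a hypothetical hook on the return-to-userspace path, and the other
sim_* names are likewise invented for the example.

```c
/* Userspace model of quiescing on exit to usermode. Names are illustrative. */
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the static key guarding the hooks; off by default. */
bool quiesce_enabled;

struct isol_cpu {
	long pending_diff;   /* per-CPU vmstat delta not yet folded */
	bool worker_active;  /* is the per-CPU vmstat worker scheduled? */
};

long folded_total;

/* Fold the per-CPU delta into the global count. */
void sim_sync_vmstat(struct isol_cpu *c)
{
	folded_total += c->pending_diff;
	c->pending_diff = 0;
}

/* A syscall such as mlock dirties the per-CPU counter. */
void sim_syscall_mlock(struct isol_cpu *c, long pages)
{
	assert(pages >= 0);
	c->pending_diff += pages;
}

/* Hypothetical hook: runs on every return to userspace. */
void sim_exit_to_usermode(struct isol_cpu *c)
{
	if (quiesce_enabled)          /* no work unless the option is on */
		sim_sync_vmstat(c);
}

/* The shepherd restarts the worker only if deltas are pending. */
bool sim_shepherd_would_restart(const struct isol_cpu *c)
{
	return !c->worker_active && c->pending_diff != 0;
}

/* Returns true if the RT loop would be disturbed by a worker restart. */
bool sim_run(bool enable)
{
	struct isol_cpu cpu = { 0, false };  /* already in nohz_full */

	quiesce_enabled = enable;
	sim_syscall_mlock(&cpu, 16);
	sim_exit_to_usermode(&cpu);          /* return from the syscall */
	return sim_shepherd_would_restart(&cpu);
}
```

With the flag off, sim_run() still shows the restart; with it on, the
counters are already clean by the time the application is back in
userspace, so the shepherd has no reason to restart the worker.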
See individual patches for details.