Re: [patch V3 0/8] extensible prctl task isolation interface and vmstat sync

From: Marcelo Tosatti
Date: Wed Aug 25 2021 - 06:02:40 EST



+CC Thomas.

On Tue, Aug 24, 2021 at 12:24:23PM -0300, Marcelo Tosatti wrote:
>
> The logic to disable the vmstat worker thread when entering
> nohz_full does not cover all scenarios. For example, the
> following sequence is possible:
>
> 1) enter nohz_full, which calls refresh_cpu_vm_stats, syncing the stats.
> 2) app runs mlock, which increases counters for mlock'ed pages.
> 3) start -RT loop
>
> Because the refresh_cpu_vm_stats call in the nohz_full logic can run
> _before_ the mlock, the vmstat shepherd can restart the vmstat worker
> thread on the CPU in question.
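
Steps 2) and 3) of the quoted sequence can be sketched from the
application side as below. This is only an illustrative sketch, not code
from the patchset: lock_buffer(), rt_loop() and run_scenario() are
hypothetical names, the loop is bounded so the example terminates, and
entering nohz_full (step 1) is assumed to have happened already on an
isolated CPU. Only standard libc calls (mlock, munlock) are used; the
new prctl interface is deliberately not shown.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Step 2: lock a buffer into RAM. Per the cover letter, this raises the
 * counters for mlock'ed pages after the nohz_full entry path has already
 * synced the per-CPU vmstats.
 * Returns 0 on success, -1 if mlock() fails (e.g. RLIMIT_MEMLOCK).
 */
static int lock_buffer(void *buf, size_t len)
{
	assert(buf != NULL);
	memset(buf, 0, len);		/* touch the pages before locking */
	return mlock(buf, len);
}

/*
 * Step 3: bounded stand-in for the -RT loop (hypothetical; a real
 * latency-sensitive loop would not return). Returns iterations done.
 */
static uint64_t rt_loop(uint64_t iterations)
{
	volatile uint64_t counter = 0;

	for (uint64_t i = 0; i < iterations; i++)
		counter++;
	return counter;
}

/*
 * Hypothetical driver: mlock first, then spin. Any vmstat delta created
 * by step 2 is still pending when the loop starts, which is the window
 * the cover letter describes.
 */
static int run_scenario(size_t len, uint64_t iterations)
{
	void *buf = malloc(len);
	int locked;
	uint64_t done;

	if (!buf)
		return -1;
	locked = lock_buffer(buf, len);
	done = rt_loop(iterations);
	if (locked == 0)
		munlock(buf, len);
	free(buf);
	return done == iterations ? 0 : -1;
}
```

Calling run_scenario(4096, 1000000) from main() on a nohz_full CPU
reproduces the ordering above: the mlock happens after the stats sync
and before the loop.
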
>
> To fix this, add a task isolation prctl interface that quiesces
> deferred actions when returning to userspace.
>
> The patchset is based on ideas and code from the
> task isolation patchset from Alex Belits:
> https://lwn.net/Articles/816298/
>
> Please refer to Documentation/userspace-api/task_isolation.rst
> (patch 2) for details.
>
> Note: the prctl interface is independent of nohz_full=.
>
> ---------
>
> v3:
>
> - Split in smaller patches (Nitesh Lal).
> - Misc cleanups (Nitesh Lal).
> - Clarify nohz_full is not a dependency (Nicolas Saenz).
> - Incorrect values for prctl definitions (kernel robot).
> - Save configured state, so applications
>   can activate externally configured
>   task isolation parameters.
> - Remove "system default" notion (chisol should
>   make it obsolete).
> - Update documentation: add new section with explanation
>   about configuration/activation and code example.
> - Update samples.
> - Report configuration/activation state at
>   /proc/pid/task_isolation.
> - Condense dirty information of per-CPU vmstats counters
>   in a bool.
> - In-kernel KVM support.
> - Add support to configure inheritance on fork and exec.
>
> v2:
>
> - Finer-grained control of quiescing (Frederic Weisbecker / Nicolas Saenz).
>
> - Avoid potential regressions by allowing applications
>   to use ISOL_F_QUIESCE_DEFMASK (whose default value
>   is configurable in /sys/). (Nitesh Lal / Nicolas Saenz).
>
> v2 can be found at:
> https://lore.kernel.org/patchwork/project/lkml/list/?series=510225
>
>
> ---
>
>  Documentation/userspace-api/task_isolation.rst | 281 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  arch/x86/kvm/x86.c                             |   3
>  fs/proc/base.c                                 |  68 +++++++++++++++++++
>  include/linux/sched.h                          |   5 +
>  include/linux/task_isolation.h                 | 131 ++++++++++++++++++++++++++++++++++++++
>  include/linux/vmstat.h                         |  17 ++++
>  include/uapi/linux/prctl.h                     |  27 +++++++
>  init/init_task.c                               |   3
>  kernel/Makefile                                |   2
>  kernel/entry/common.c                          |   2
>  kernel/exit.c                                  |   2
>  kernel/fork.c                                  |  23 ++++++
>  kernel/sys.c                                   |  26 +++++++
>  kernel/task_isolation.c                        | 315 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  mm/vmstat.c                                    | 167 ++++++++++++++++++++++++++++++++++++------------
>  samples/Kconfig                                |   7 ++
>  samples/Makefile                               |   1
>  samples/task_isolation/Makefile                |   9 ++
>  samples/task_isolation/task_isol.c             |  83 ++++++++++++++++++++++++
>  samples/task_isolation/task_isol.h             |   9 ++
>  samples/task_isolation/task_isol_userloop.c    |  56 ++++++++++++++++
>  21 files changed, 1194 insertions(+), 43 deletions(-)
>
>