Re: [patch 1/5] sched: isolation: introduce quiesce_on_exit_to_usermode isolcpu flags
From: Frederic Weisbecker
Date: Mon Jul 19 2021 - 10:14:49 EST
On Wed, Jul 14, 2021 at 05:42:06PM -0300, Marcelo Tosatti wrote:
> Add a new isolcpus flag "quiesce_on_exit_to_usermode" to enable
> quiescing of deferred actions on return to userspace.
> Signed-off-by: Marcelo Tosatti <mtosatti@xxxxxxxxxx>
> Index: linux-2.6-vmstat-update/include/linux/sched/isolation.h
> --- linux-2.6-vmstat-update.orig/include/linux/sched/isolation.h
> +++ linux-2.6-vmstat-update/include/linux/sched/isolation.h
> Index: linux-2.6-vmstat-update/Documentation/admin-guide/kernel-parameters.txt
> --- linux-2.6-vmstat-update.orig/Documentation/admin-guide/kernel-parameters.txt
> +++ linux-2.6-vmstat-update/Documentation/admin-guide/kernel-parameters.txt
> @@ -2124,6 +2124,43 @@
> The format of <cpu-list> is described above.
> + quiesce_on_exit_to_usermode
> + This flag allows userspace to take preventive measures to
> + avoid deferred actions and create an OS-noise-free environment for
> + the application, by quiescing such activities on
> + return from syscalls (that is, perform the necessary
> + background work on return to userspace, rather than allowing
> + it to happen when userspace is executing, in the form of
> + an interruption to the application).
> + There might be a performance degradation from using this
> + on system-call-heavy workloads, for the isolated CPUs.
> + This option is intended to be used by specialized workloads.
> + It should be deprecated in favour of a prctl() interface
> + to enable this mode (which allows the quiescing to take
> + place only on select sections of userspace execution, namely
> + the latency sensitive loops).
So I don't believe in that. If boot parameters were deprecatable, isolcpus would
have been removed already. And now that it's here we have to support it forever
and even fight to keep it usable with modern interfaces like cpuset.
Besides, such (very costly) quiescence on kernel exit should only be useful in
specific sections of a workload. No need to kill the performance everywhere.
It's a new feature, not a fix, so let's introduce a proper prctl() interface
once and for all. We can't postpone that step forever.
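To make the per-section idea concrete, here is a rough userspace sketch of how an application might bracket only its latency-sensitive loop. Everything named here is an assumption for illustration: PR_TASK_ISOL_QUIESCE_HYPOTHETICAL is not an existing prctl() option (its value is picked so that current kernels should simply reject it with EINVAL), and task_quiesce_set(), run_latency_sensitive() and example_loop() are made-up helpers, not a proposed ABI.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/prctl.h>

/*
 * HYPOTHETICAL option number: no such prctl() exists upstream. The value is
 * arbitrary; an unknown option is expected to fail with EINVAL.
 */
#define PR_TASK_ISOL_QUIESCE_HYPOTHETICAL 0x7fff0001

/* Ask the kernel to start (1) or stop (0) quiescing on return to userspace.
 * Returns 0 on success or a negative errno value on failure. */
static int task_quiesce_set(int enable)
{
	if (prctl(PR_TASK_ISOL_QUIESCE_HYPOTHETICAL, (unsigned long)enable,
		  0UL, 0UL, 0UL) == 0)
		return 0;
	return -errno;
}

/* Run one latency-sensitive section with quiescing enabled only around it.
 * The loop still runs if the request fails; the error is just reported. */
static int run_latency_sensitive(void (*loop)(void *), void *arg)
{
	int ret;

	assert(loop != NULL);
	ret = task_quiesce_set(1);
	loop(arg);
	if (ret == 0)
		task_quiesce_set(0);	/* back to normal, cheap kernel exits */
	return ret;
}

/* Stand-in for the application's polling loop: bumps a counter 1000 times. */
static void example_loop(void *arg)
{
	int *count = arg;
	int i;

	for (i = 0; i < 1000; i++)
		(*count)++;
}
```

The point of this shape is that the cost of quiescing is only paid between the two calls, so the rest of the workload keeps normal syscall performance.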
> + Note: one of the preventive measures this option
> + enables is the following.
> + Page counters are maintained in per-CPU counters to
> + improve performance. When a CPU modifies a page counter,
> + this modification is kept in the per-CPU counter.
> + Certain activities require a global count, which
> + involves requesting each CPU to flush its local counters
> + to the global VM counters.
> + This flush is implemented via a workqueue item, which
> + requires scheduling the workqueue task on isolated CPUs.
> + To avoid this interruption, quiesce_on_exit_to_usermode
> + syncs the page counters on each return from system calls.
> + To ensure the application returns to userspace
> + with no modified per-CPU counters, it is necessary to
> + use mlockall() in addition to this isolcpus flag.
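For readers following along, the counter scheme this note describes can be modelled in plain userspace C roughly as below. This is only a toy model under stated assumptions: struct model_counters, model_mod(), model_quiesce() and model_cpu_is_dirty() are invented names for illustration and are not the kernel's vmstat code, and "CPUs" are just array slots.

```c
#include <assert.h>
#include <stddef.h>

#define NR_MODEL_CPUS 4	/* arbitrary size for the model */

/* One global counter plus one pending delta per (modelled) CPU. */
struct model_counters {
	long global;
	long diff[NR_MODEL_CPUS];
};

/* A CPU modifies the counter: only its local delta changes. */
static void model_mod(struct model_counters *c, size_t cpu, long delta)
{
	assert(cpu < NR_MODEL_CPUS);
	c->diff[cpu] += delta;
}

/* What the note calls syncing on return from a system call: fold this
 * CPU's delta into the global counter so nothing is left to flush later. */
static void model_quiesce(struct model_counters *c, size_t cpu)
{
	assert(cpu < NR_MODEL_CPUS);
	c->global += c->diff[cpu];
	c->diff[cpu] = 0;
}

/* A CPU with a non-zero delta would still need a deferred flush. */
static int model_cpu_is_dirty(const struct model_counters *c, size_t cpu)
{
	assert(cpu < NR_MODEL_CPUS);
	return c->diff[cpu] != 0;
}
```

In this model, anything that touches diff[] after model_quiesce() makes the CPU dirty again. That is presumably why the note pairs the flag with mlockall(): a page fault taken after the sync would modify the counters again.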
> iucv= [HW,NET]
> ivrs_ioapic [HW,X86-64]