Re: [patch 0/5] optionally sync per-CPU vmstats counter on return to userspace

From: Frederic Weisbecker
Date: Tue Jul 06 2021 - 09:09:29 EST


On Fri, Jul 02, 2021 at 12:28:16PM -0300, Marcelo Tosatti wrote:
>
> Hi Frederic,
>
> On Fri, Jul 02, 2021 at 02:30:32PM +0200, Frederic Weisbecker wrote:
> > On Thu, Jul 01, 2021 at 06:03:36PM -0300, Marcelo Tosatti wrote:
> > > The logic to disable vmstat worker thread, when entering
> > > nohz full, does not cover all scenarios. For example, it is possible
> > > for the following to happen:
> > >
> > > 1) enter nohz_full, which calls refresh_cpu_vm_stats, syncing the stats.
> > > 2) app runs mlock, which increases counters for mlock'ed pages.
> > > 3) start -RT loop
> > >
> > > Since refresh_cpu_vm_stats from nohz_full logic can happen _before_
> > > the mlock, vmstat shepherd can restart vmstat worker thread on
> > > the CPU in question.
> > >
> > > To fix this, optionally sync the vmstat counters when returning
> > > to userspace, controllable by a new "vmstat_sync" isolcpus
> > > flag (default off).
> >
> > Wasn't the plan for such finegrained isolation features to do it at
> > the per task level using prctl()?
>
> Yes, but it's orthogonal: when we integrate the finegrained isolation
> interface, we will be able to use this code (to sync vmstat counters
> on return to userspace) only when userspace informs that it has entered
> isolated mode, so you don't incur the performance penalty of frequent
> vmstat counter writes when not using isolated apps.
>
> This is what the full task isolation patchset is doing
> as well (CC'ing Alex BTW).

Right, there can be two ways:

* A prctl request to sync vmstat only on exit from that prctl
* A prctl request to sync vmstat on all subsequent exits from
kernel space.
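
To make the difference between the two concrete, here is a toy userspace
model, not kernel code. Every name in it (cpu_model, model_prctl,
SYNC_ONCE, SYNC_ALWAYS and so on) is made up for illustration, and no
such prctl option exists today. It only mimics the idea of a per-CPU
delta that gets folded into a global counter, and a shepherd that
restarts the worker whenever the delta is non-zero.

```c
#include <stdbool.h>

/* Hypothetical sync policies a task could request (illustration only). */
enum sync_mode { SYNC_NONE, SYNC_ONCE, SYNC_ALWAYS };

/* Toy stand-in for one isolated CPU's view of a vmstat counter. */
struct cpu_model {
	long global_mlocked;	/* models the global counter */
	long local_delta;	/* models the pending per-CPU delta */
	enum sync_mode mode;	/* policy requested by the task */
};

/* Models the effect of refresh_cpu_vm_stats(): fold delta into global. */
static void model_fold(struct cpu_model *c)
{
	c->global_mlocked += c->local_delta;
	c->local_delta = 0;
}

/* Models the return-to-userspace path with the optional sync. */
static void model_exit_to_user(struct cpu_model *c)
{
	if (c->mode == SYNC_NONE)
		return;
	model_fold(c);
	if (c->mode == SYNC_ONCE)	/* first way: one-shot sync */
		c->mode = SYNC_NONE;
}

/* Models the (hypothetical) prctl request; it is itself a syscall. */
static void model_prctl(struct cpu_model *c, enum sync_mode mode)
{
	c->mode = mode;
	model_exit_to_user(c);
}

/* Models mlock(): dirties the per-CPU delta, then returns to userspace. */
static void model_mlock(struct cpu_model *c, long pages)
{
	c->local_delta += pages;
	model_exit_to_user(c);
}

/* Models the shepherd's check: a dirty delta means the worker restarts. */
static bool model_needs_worker(const struct cpu_model *c)
{
	return c->local_delta != 0;
}
```

In this model the first way needs the application to issue the request
after its last counter-dirtying syscall, while the second way keeps the
delta clean across every later syscall at the cost of a fold on each
kernel exit.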

>
> This will require modifying applications (and the new kernel with the
> exposed interface).
>
> But there is demand for fixing this now, for currently existing
> binary only applications.

I would agree if it were a regression but it's not. It's merely
a new feature and we don't want to rush on a broken interface.

And I suspect some other people won't much like a new extension
to isolcpus.