Re: [patch 0/5] optionally sync per-CPU vmstats counter on return to userspace

From: Marcelo Tosatti
Date: Tue Jul 06 2021 - 10:40:13 EST


On Tue, Jul 06, 2021 at 11:05:50AM -0300, Marcelo Tosatti wrote:
> On Tue, Jul 06, 2021 at 03:09:25PM +0200, Frederic Weisbecker wrote:
> > On Fri, Jul 02, 2021 at 12:28:16PM -0300, Marcelo Tosatti wrote:
> > >
> > > Hi Frederic,
> > >
> > > On Fri, Jul 02, 2021 at 02:30:32PM +0200, Frederic Weisbecker wrote:
> > > > On Thu, Jul 01, 2021 at 06:03:36PM -0300, Marcelo Tosatti wrote:
> > > > > The logic that disables the vmstat worker thread when entering
> > > > > nohz_full does not cover all scenarios. For example, it is possible
> > > > > for the following to happen:
> > > > >
> > > > > 1) enter nohz_full, which calls refresh_cpu_vm_stats, syncing the stats.
> > > > > 2) app runs mlock, which increases counters for mlock'ed pages.
> > > > > 3) start -RT loop
> > > > >
> > > > > Since refresh_cpu_vm_stats from the nohz_full logic can happen _before_
> > > > > the mlock, the vmstat shepherd can restart the vmstat worker thread on
> > > > > the CPU in question.
> > > > >
> > > > > To fix this, optionally sync the vmstat counters when returning
> > > > > to userspace, controllable by a new "vmstat_sync" isolcpus
> > > > > flag (default off).
> > > >
> > > > Wasn't the plan for such finegrained isolation features to do it at
> > > > the per task level using prctl()?
> > >
> > > Yes, but it's orthogonal: once we integrate the fine-grained isolation
> > > interface, we will be able to use this code (to sync vmstat counters
> > > on return to userspace) only when userspace informs the kernel that it
> > > has entered isolated mode, so you don't incur the performance penalty of
> > > frequent vmstat counter writes when not using isolated apps.
> > >
> > > This is what the full task isolation patchset is doing
> > > as well (CC'ing Alex BTW).
> >
> > Right, there can be two ways:
>
>
> * An isolcpus flag to request sync of vmstat on all exits
> to userspace.
> > * A prctl request to sync vmstat only on exit from that prctl
> > * A prctl request to sync vmstat on all subsequent exits from
> > kernel space.
>
> * A prctl to expose "vmstat is out of sync" information
> to userspace, so that it can be queried and flushed
> (Christoph's suggestion:
> https://www.spinics.net/lists/linux-mm/msg243788.html).
>
> > > This will require modifying applications (and the new kernel with the
> > > exposed interface).
> > >
> > > But there is demand for fixing this now, for currently existing
> > > binary only applications.
> >
> > I would agree if it were a regression but it's not. It's merely
> > a new feature and we don't want to rush on a broken interface.
>
> Well, people out there need it in some form (vmstat sync).
> Can we please agree on an acceptable way to allow this?
>
> Why is it a broken interface? It has good qualities IMO:
>
> - It's well contained (if you don't need it, don't use it).
> - Does not require modifying -RT applications.
> - Works well for a set of applications (where the overhead of
> syncing vmstat is largely irrelevant, but the vmstat_worker
> interruption is).
>
> And this patchset integrates another piece of full task isolation.
>
> > And I suspect some other people won't much like a new extension
> > to isolcpus.
>
> Why is that so?

Ah, yes, that would be PeterZ.

IIRC his main point was that it's not runtime changeable.
We can (partially) fix that, if that is the case.

Peter, was that the only problem you saw with the isolcpus interface?
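
To make the race from the cover letter concrete, here is a minimal
user-space model of it. This is only a sketch, not kernel code: every
name in it (global_count, cpu_diff, fold_cpu_diff, cpu_needs_update,
mod_counter, run_scenario) is made up for illustration. It assumes the
shepherd requeues the worker whenever a CPU still has unfolded per-CPU
deltas, and that the sync on return to userspace amounts to one extra
fold.

```c
#include <stdbool.h>

#define NR_CPUS 2

static long global_count;       /* models the global vmstat counter */
static long cpu_diff[NR_CPUS];  /* models the per-CPU pending deltas */

/* Models a refresh_cpu_vm_stats-style fold: push a CPU's delta to the global counter. */
static void fold_cpu_diff(int cpu)
{
	global_count += cpu_diff[cpu];
	cpu_diff[cpu] = 0;
}

/* Models the shepherd's check: does this CPU still have unfolded deltas? */
static bool cpu_needs_update(int cpu)
{
	return cpu_diff[cpu] != 0;
}

/* Models a counter update such as the one caused by mlock(). */
static void mod_counter(int cpu, long delta)
{
	cpu_diff[cpu] += delta;
}

/*
 * Replays the three steps from the cover letter on one isolated CPU.
 * Returns true if the shepherd would requeue the vmstat worker there.
 */
static bool run_scenario(bool sync_on_return)
{
	const int cpu = 1;

	global_count = 0;
	cpu_diff[cpu] = 0;

	fold_cpu_diff(cpu);          /* 1) enter nohz_full: stats are synced */
	mod_counter(cpu, 16);        /* 2) app runs mlock: counters dirtied */
	if (sync_on_return)
		fold_cpu_diff(cpu);  /* proposed: sync on return to userspace */

	/* 3) the -RT loop starts; the shepherd scans this CPU */
	return cpu_needs_update(cpu);
}
```

In this model run_scenario(false) reports that the worker would be
requeued on the isolated CPU, while run_scenario(true) reports that it
would not, because the delta left behind by the mlock step has already
been folded before the -RT loop starts.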