Re: [patch 0/5] optionally sync per-CPU vmstats counter on return to userspace
From: Marcelo Tosatti
Date: Mon Jul 05 2021 - 10:46:14 EST
On Mon, Jul 05, 2021 at 04:26:48PM +0200, Christoph Lameter wrote:
> On Fri, 2 Jul 2021, Marcelo Tosatti wrote:
>
> > > > The logic to disable vmstat worker thread, when entering
> > > > nohz full, does not cover all scenarios. For example, it is possible
> > > > for the following to happen:
> > > >
> > > > 1) enter nohz_full, which calls refresh_cpu_vm_stats, syncing the stats.
> > > > 2) app runs mlock, which increases counters for mlock'ed pages.
> > > > 3) start -RT loop
> > > >
> > > > Since refresh_cpu_vm_stats from nohz_full logic can happen _before_
> > > > the mlock, vmstat shepherd can restart vmstat worker thread on
> > > > the CPU in question.
> > >
> > > Can we enter nohz_full after the app runs mlock?
> >
> > Hum, i don't think its a good idea to use that route, because
> > entering or exiting nohz_full depends on a number of variable
> > outside of one's control (and additional variables might be
> > added in the future).
>
> Then I do not see any need for this patch. Because after a certain time
> of inactivity (after the mlock) the system will enter nohz_full again.
> If userspace has no direct control over nohz_full and can only wait then
> it just has to do so.

Sorry, I fail to see what you mean.

The problem (well, it's not a bug per se) is basically that the current
disablement of the vmstat_worker thread is not aggressive enough.

From the initial message:

1) enter nohz_full, which calls refresh_cpu_vm_stats, syncing the stats.
2) app runs mlock, which increases counters for mlock'ed pages.
3) start -RT loop

Note that any activity that changes the stat counters will cause this,
not just mlock. It just happens that mlock was what the test application
I was using called; replace it with any other system call that triggers
writes to per-CPU vmstat counters.

You said:

"Because after a certain time of inactivity (after the mlock) the
system will enter nohz_full again."

Yes, but we can't tolerate any activity from the vmstat worker thread
on this particular CPU.

Do you want the app to wait for an event saying: "vmstat_worker is now
disabled, as long as you don't dirty vmstat counters, vmstat_shepherd
won't wake it up"?

Rather than that, what this patch does is sync the vmstat counters on
return to userspace, so that:

"We synced per-CPU vmstat counters to global counters, and disabled the
local-CPU vmstat worker (on return to userspace). As long as you
don't dirty vmstat counters, vmstat_shepherd won't wake it up".
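
To make the ordering concrete, here is a userspace toy model of the two
cases. It is not kernel code and not the patch itself: NR_CPUS, the
arrays and every function name below are made up for illustration, and
the only thing borrowed from the kernel is the idea that a pending
per-CPU delta is what makes the shepherd re-arm the worker.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4   /* hypothetical CPU count for this model */

/* Toy model: one global counter plus one pending delta per CPU. */
static long global_count;
static long cpu_diff[NR_CPUS];
static bool worker_armed[NR_CPUS];

/* Models a syscall (mlock in the test app) dirtying the local counter. */
static void dirty_counter(int cpu, long pages)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	cpu_diff[cpu] += pages;
}

/* Models folding the per-CPU delta into the global counter and
 * leaving the local worker disabled (the role the sync plays). */
static void sync_cpu(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	global_count += cpu_diff[cpu];
	cpu_diff[cpu] = 0;
	worker_armed[cpu] = false;
}

/* Models the shepherd: re-arm the worker if a delta is pending.
 * Returns true if the worker on this CPU ends up armed. */
static bool shepherd_check(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	if (cpu_diff[cpu] != 0)
		worker_armed[cpu] = true;
	return worker_armed[cpu];
}

/* Case A: sync only at nohz_full entry, which happens before mlock. */
static bool scenario_sync_at_nohz_entry(int cpu)
{
	sync_cpu(cpu);              /* 1) enter nohz_full */
	dirty_counter(cpu, 16);     /* 2) app runs mlock */
	return shepherd_check(cpu); /* 3) -RT loop; shepherd runs */
}

/* Case B: additionally sync on return to userspace. */
static bool scenario_sync_on_return(int cpu)
{
	sync_cpu(cpu);              /* 1) enter nohz_full */
	dirty_counter(cpu, 16);     /* 2) app runs mlock */
	sync_cpu(cpu);              /*    return to userspace */
	return shepherd_check(cpu); /* 3) -RT loop; shepherd runs */
}
```

In this model case A leaves a delta of 16 pending, so the shepherd arms
the worker on the isolated CPU; in case B the delta has already been
folded into the global counter and the worker stays off.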

Makes sense?

> > So preparing the system to function
> > while entering nohz_full at any location seems the sane thing to do.
> >
> > And that would be at return to userspace (since, if mlocked, after
> > that point there will be no more changes to propagate to vmstat
> > counters).
> >
> > Or am i missing something else you can think of ?
>
> I assumed that the "enter nohz full" was an action by the user
> space app because I saw some earlier patches to introduce such
> functionality in the past.

No, it meant "enter nohz full" (in the current Linux codebase, for
existing applications).