Re: [PATCH] mm, vmstat: make quiet_vmstat lighter (was: Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable) again and shut down on idle)

From: Michal Hocko
Date: Thu Jan 28 2016 - 10:21:30 EST


On Wed 27-01-16 12:26:16, Christoph Lameter wrote:
> On Wed, 27 Jan 2016, Michal Hocko wrote:
>
> > This is caused by 0eb77e988032 ("vmstat: make vmstat_updater deferrable
> > again and shut down on idle") which has placed quiet_vmstat into
> > cpu_idle_loop. The main reason here seems to be that the idle entry has
> > to get over all zones and perform atomic operations for each vmstat
> > entry even though there might be no per cpu diffs. This is a pointless
> > overhead for _each_ idle entry.
> >
> > Make sure that quiet_vmstat is as light as possible.
> >
> > First of all it doesn't make any sense to do any local sync if the
> > current cpu is already set in oncpu_stat_off because vmstat_update puts
> > itself there only if there is nothing to do.
>
> But there might be something to do that happened in the meantime because
> counters were incremented. It needs to be checked. The shepherd can do
> that but it will delay the folding of diffs for a while.

shepherd ticks with the same period so it should run relatively soon.
What is important here is that the test_and_set is the fast path, which
will be taken a lot if the current CPU is mostly running in userspace.

> > +void quiet_vmstat(void)
> > +{
> > + if (system_state != SYSTEM_RUNNING)
> > + return;
> > +
> > + /*
> > + * If we are already in hands of the shepherd then there
> > + * is nothing for us to do here.
> > + */
> > + if (cpumask_test_and_set_cpu(smp_processor_id(), cpu_stat_off))
> > + return;
> > +
> > + if (!need_update(smp_processor_id()))
> > + return;
> > +
> > + /*
> > + * Just refresh counters and do not care about the pending delayed
> > + * vmstat_update. It doesn't fire that often to matter and canceling
> > + * it would be too expensive from this path.
> > + * vmstat_shepherd will take care about that for us.
> > + */
> > + refresh_cpu_vm_stats(false);
> > +}
>
> The problem here is that there will be an additional tick generated on
> idle. This is an issue for power because now the processor has to
> needlessly wake up again, do tick processing etc just to effectively do a
> cancel_delayed_work().
>
> The cancelling of the work request is required to avoid this additional
> tick.

Yes, but as mentioned in the changelog, calling cancel_delayed_work on
each idle entry would be quite expensive for this slow path. In addition
it is RT unfriendly. So weighing a single extra tick against those
costs, I think this is a reasonable compromise.

> > @@ -1470,11 +1487,14 @@ static void vmstat_shepherd(struct work_struct *w)
> > get_online_cpus();
> > /* Check processors whose vmstat worker threads have been disabled */
> > for_each_cpu(cpu, cpu_stat_off)
> > - if (need_update(cpu) &&
> > - cpumask_test_and_clear_cpu(cpu, cpu_stat_off))
> > -
> > - queue_delayed_work_on(cpu, vmstat_wq,
> > - &per_cpu(vmstat_work, cpu), 0);
> > + if (need_update(cpu)) {
> > + if (cpumask_test_and_clear_cpu(cpu, cpu_stat_off))
> > + queue_delayed_work_on(cpu, vmstat_wq,
> > + &per_cpu(vmstat_work, cpu), 0);
> > + } else {
> > + cpumask_set_cpu(cpu, cpu_stat_off);
>
> Umm, the flag is already set, right? This just causes a bouncing cacheline
> since we are accessing a global set of cpus. Drop this line?

Ohh, right you are. I will remove this.

>
> > + cancel_delayed_work(this_cpu_ptr(&vmstat_work));
>
> Ok so this is to move the cancel into the shepherd.

Yeah, I have added a comment to explain that.

> Aside from the two issues mentioned this looks good.

Updated version:
---