Re: RFC vmstat: On demand vmstat threads

From: Thomas Gleixner
Date: Wed Sep 18 2013 - 18:57:48 EST


On Wed, 18 Sep 2013, Andrew Morton wrote:
> On Tue, 10 Sep 2013 21:13:34 +0000 Christoph Lameter <cl@xxxxxxxxx> wrote:
> > + cpumask_copy(monitored_cpus, cpu_online_mask);
> > + cpumask_clear_cpu(tick_do_timer_cpu, monitored_cpus);
>
> What on earth are we using tick_do_timer_cpu for anyway?
> tick_do_timer_cpu is cheerfully undocumented, as is this code's use of
> it.

tick_do_timer_cpu is a timer core internal variable which holds the
number of the CPU responsible for calling do_timer(), i.e. the
timekeeping update. This variable has two functions:

1) Prevent a thundering herd of a gazillion CPUs trying to
grab the timekeeping lock all at once. Only the CPU which is
assigned to do the update handles it.

2) Hand off the duty in the NOHZ idle case by setting the value to
TICK_DO_TIMER_NONE, i.e. a nonexistent CPU, so the next CPU which
looks at it will take over and keep timekeeping alive.
The handover procedure also covers CPU hotplug.
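
The two functions above can be illustrated with a small userspace
sketch. This is not the timer core code: every sketch_* name is
hypothetical, the sentinel value -1 is an assumption, and the use of
C11 compare-and-exchange is a simplification chosen for the
illustration rather than a description of how the kernel does it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for TICK_DO_TIMER_NONE; the value -1 is an assumption. */
#define SKETCH_TIMER_NONE (-1)

/* Hypothetical stand-in for tick_do_timer_cpu. */
atomic_int sketch_do_timer_cpu = SKETCH_TIMER_NONE;

/* Counts how often the simulated timekeeping update ran. */
atomic_int sketch_updates;

/*
 * Called from the tick of each CPU. Returns true if this CPU did the
 * timekeeping update, false if it skipped it.
 */
bool sketch_tick(int cpu)
{
	int expected = SKETCH_TIMER_NONE;

	/* Function 2: if nobody owns the duty, take it over. */
	atomic_compare_exchange_strong(&sketch_do_timer_cpu, &expected, cpu);

	/* Function 1: only the owner goes near the timekeeping update. */
	if (atomic_load(&sketch_do_timer_cpu) != cpu)
		return false;

	atomic_fetch_add(&sketch_updates, 1);
	return true;
}

/* Called when a CPU stops its tick (NOHZ idle) or goes offline. */
void sketch_drop_duty(int cpu)
{
	int expected = cpu;

	/* Release the duty only if this CPU currently owns it. */
	atomic_compare_exchange_strong(&sketch_do_timer_cpu, &expected,
				       SKETCH_TIMER_NONE);
}
```

In this sketch, sketch_tick(0) on a fresh system claims the duty and
every other CPU skips the update until CPU 0 calls
sketch_drop_duty(0), after which the next CPU to tick takes over.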

(Ab)Using it for anything else outside the timers core code is just
broken.

It works for Christoph's use case, as his setup will not change the
assignment away from the boot CPU, but that's really not a brilliant
design to start with.

The vmstat accounting is not the only thing which we want to delegate
to dedicated core(s) for the full NOHZ mode.

So instead of playing broken games with explicitly not exposed core
code variables, we should implement a core code facility which is
aware of the NOHZ details and provides a sane way to delegate stuff to
a certain subset of CPUs.
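
As a rough illustration only of what such a facility might offer to
callers like vmstat: the sketch below is userspace C, all sketch_*
names and the 64-CPU bitmask representation are assumptions made for
the example, and none of it is an existing kernel interface.

```c
#include <assert.h>
#include <stdint.h>

/* Bit n set means CPU n; limited to 64 CPUs for the sketch. */
typedef uint64_t sketch_cpumask;

/* Hypothetical state which the core code would own and keep current. */
struct sketch_housekeeping {
	sketch_cpumask online;		/* CPUs currently online */
	sketch_cpumask nohz_full;	/* CPUs which should be left alone */
};

/* CPUs which may be asked to do work on behalf of others. */
sketch_cpumask sketch_housekeeping_mask(const struct sketch_housekeeping *hk)
{
	return hk->online & ~hk->nohz_full;
}

/*
 * Pick the lowest numbered housekeeping CPU as the target for
 * delegated work. Returns -1 if no such CPU exists.
 */
int sketch_housekeeping_target(const struct sketch_housekeeping *hk)
{
	sketch_cpumask mask = sketch_housekeeping_mask(hk);

	for (int cpu = 0; cpu < 64; cpu++) {
		if (mask & (UINT64_C(1) << cpu))
			return cpu;
	}
	return -1;
}
```

The point of the sketch is that a caller asks the core code where to
delegate and never looks at timer internals itself, so the answer
stays correct when the set of online or NOHZ CPUs changes.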

Thanks,

tglx