Re: [QUERY]: Is using CPU hotplug right for isolating CPUs?

From: Frederic Weisbecker
Date: Thu Jan 23 2014 - 09:58:54 EST


On Tue, Jan 21, 2014 at 04:03:53PM +0530, Viresh Kumar wrote:
> On 20 January 2014 21:21, Frederic Weisbecker <fweisbec@xxxxxxxxx> wrote:
> > I fear you can't. If you schedule a timer in 4 seconds away and your clockdevice
> > can only count up to 2 seconds, you can't help much the interrupt in the middle to
> > cope with the overflow.
> >
> > So you need to act on the source of the timer:
> >
> > * identify what cause this timer
> > * try to turn that feature off
> > * if you can't then move the timer to the housekeeping CPU
>
> So, the main problem in my case was caused by this:
>
> <...>-2147 [001] d..2 302.573881: hrtimer_start:
> hrtimer=c172aa50 function=tick_sched_timer expires=602075000000
> softexpires=602075000000
>
> I have mentioned this earlier when I sent you attachments. I think
> this is somehow
> tied with the NO_HZ_FULL stuff? As the timer is queued for 300 seconds after
> current time.
>
> How to get this out?

So it's scheduled to expire 300 seconds later. It might be a pending timer_list timer.
Enabling the timer tracepoints may give you some clues.
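
To double check the 300 second figure, the trace line quoted above can be decoded by hand.
The small sketch below only does that arithmetic. It assumes the trace timestamp and the
"expires" value (in nanoseconds) share the same time base, which is not guaranteed for every
trace clock, and the helper name is made up for illustration.

```python
import re

# Trace line quoted above, rejoined onto one line (mail wrapping removed).
TRACE = ("<...>-2147 [001] d..2 302.573881: hrtimer_start: "
         "hrtimer=c172aa50 function=tick_sched_timer "
         "expires=602075000000 softexpires=602075000000")

def expiry_distance_seconds(line):
    """Return how far ahead of the trace timestamp the hrtimer expires.

    Assumption: the timestamp (seconds) and expires (nanoseconds) use the
    same time base, so they can be subtracted directly.
    """
    m = re.search(r"\s(\d+\.\d+): hrtimer_start:.*?\bexpires=(\d+)", line)
    if m is None:
        raise ValueError("not an hrtimer_start trace line")
    now_s = float(m.group(1))          # trace timestamp, in seconds
    expires_s = int(m.group(2)) / 1e9  # expiry, converted from ns to seconds
    return expires_s - now_s

delta = expiry_distance_seconds(TRACE)
print(f"tick_sched_timer expires {delta:.1f}s after the trace timestamp")
```

Under that assumption the tick is programmed roughly 299.5 seconds ahead, which matches the
"300 seconds" reading and suggests whatever timer is pending at that point is what to look for.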

>
> > I'll have a look into the latter point to affine global timers to the
> > housekeeping CPU. Per cpu timers need more inspection though. Either we rework
> > them to be possibly handled by remote/housekeeping CPUs, or we let the associate feature
> > to be turned off. All in one it's a case by case work.
>
> Which CPUs are housekeeping CPUs? How do we declare them?

It's not yet implemented, but it's an idea (partly from Thomas) of something we can do to
define some general policy on various periodic/async work affinity to enforce isolation.

The basic idea is to define the CPU handling the timekeeping duty to be the housekeeping
CPU. Given that this CPU must keep a periodic tick anyway, let's move all the unbound timers
and workqueues there, and also try to move some CPU-affine work as well. For example,
we could handle the scheduler tick of the full dynticks CPUs on that housekeeping
CPU, at a low frequency. This way we could remove the 1 second maximum scheduler tick
deferment per CPU. It may be overkill, though, to run all the scheduler ticks on a single CPU,
so there may be other ways to cope with that.

And I would like to keep that housekeeping notion flexible enough to be extended to more
than one CPU, as I heard that some people plan to reserve one CPU per node on big
NUMA machines for such a purpose. So that could be a cpumask, backed by some infrastructure.
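
As a rough illustration of the cpumask idea, here is a toy model of the policy. Nothing like
this exists in the kernel yet: the function names, the set-based "cpumask" and the example
topology are all hypothetical, and the tie-breaking rules (lowest CPU id) are arbitrary choices
made only to keep the sketch short.

```python
# Hypothetical model, not kernel code: cpumasks are plain Python sets.

def housekeeping_mask(node_cpus, timekeeper):
    """Pick one housekeeping CPU per NUMA node.

    node_cpus: dict mapping node id -> list of CPU ids (assumed topology).
    timekeeper: CPU that carries the timekeeping duty and keeps its tick.
    """
    mask = set()
    for cpus in node_cpus.values():
        # Prefer the timekeeping CPU on its own node, else the lowest CPU id.
        mask.add(timekeeper if timekeeper in cpus else min(cpus))
    return mask

def target_cpu(requesting_cpu, node_of, node_cpus, mask):
    """Choose where unbound work queued from requesting_cpu should run."""
    # Prefer a housekeeping CPU on the requester's node, else any of them.
    local = mask & set(node_cpus[node_of[requesting_cpu]])
    return min(local) if local else min(mask)

# Assumed two-node machine with four CPUs per node.
topology = {0: [0, 1, 2, 3], 1: [4, 5, 6, 7]}
node_of = {cpu: node for node, cpus in topology.items() for cpu in cpus}

hk = housekeeping_mask(topology, timekeeper=0)
print(sorted(hk))
print(target_cpu(6, node_of, topology, hk))
```

With this topology, CPUs 0 and 4 end up as housekeepers, and an unbound timer queued from
CPU 6 would be steered to CPU 4, leaving CPUs 5-7 undisturbed.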

Of course, if some people help by contributing in this area, things may eventually move forward
on the support of CPU isolation. I can't do it all alone, at least not quickly, given all the
things already pending in my queue (fix the buggy nohz iowait accounting, support RCU full sysidle
detection, apply the AMD range breakpoints patches, further clean up posix cpu timers, etc.).

Thanks.