Re: [RFC PATCH 00/15] Nohz task support

From: Frederic Weisbecker
Date: Mon Dec 20 2010 - 18:34:03 EST


On Mon, Dec 20, 2010 at 10:44:46AM -0500, Steven Rostedt wrote:
> On Mon, 2010-12-20 at 16:24 +0100, Frederic Weisbecker wrote:
> > The timer interrupt handles several things like preemption,
> > timekeeping, rcu, etc...
> >
> > However it appears that sometimes it is simply useless like
> > when a task runs alone and even more when it is in userspace
> > as RCU doesn't need it at all in such case.
> >
> > It appears that HPC workloads would benefit from such timer
> > deactivation, and perhaps also the Real Time world, as this
> > shortens the critical sections because there are far fewer
> > interrupts to handle.
> >
> > It works through the procfs interface:
> >
> > echo 1 > /proc/self/nohz
>
> I wonder if we could just have this happen automatically.

But this would add some global overhead, especially in the syscall
path as we need to take the slow path to hook userspace resume/exit.

> > - This must be written in /proc/self only; however, allowing
> > it to be set from another task should be possible in the
> > future.
> >
> > You need to migrate irqs manually from userspace, same
> > for tasks. If a non-nohz task is running on the same cpu
> > as a nohz task, the tick can't be stopped.
>
> So interrupts must not be set to this CPU?

No, it's just that the point is to minimize interrupts. If you want
irqs on that cpu you can still use a nohz task, but you still have to
migrate irqs to another CPU if you want to truly minimize
the interrupts on a nohz task.
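
For illustration only, the manual migration could look roughly like the
sketch below. It is not part of the patchset. The helper names and the
optional directory argument are made up for the example, and the mask
arithmetic assumes at most 32 CPUs (larger systems use comma-separated
32-bit groups in smp_affinity). It relies only on the standard
/proc/irq/<N>/smp_affinity files.

```shell
#!/bin/sh
# Sketch: steer IRQs away from the CPU that will host the nohz task.
# Helper names are hypothetical; assumes at most 32 CPUs.

# cpu_mask_without CPU NCPUS
# Print the hex affinity mask covering CPUs 0..NCPUS-1 except CPU.
cpu_mask_without() {
    cpu=$1
    ncpus=$2
    all=$(( (1 << $ncpus) - 1 ))
    printf '%x\n' $(( $all & ~(1 << $cpu) ))
}

# move_irqs_away CPU NCPUS [IRQ_DIR]
# Write that mask to every smp_affinity file under IRQ_DIR
# (default /proc/irq). Some IRQs cannot be moved; failed writes
# are silently skipped.
move_irqs_away() {
    mask=$(cpu_mask_without "$1" "$2")
    dir=${3:-/proc/irq}
    for f in "$dir"/*/smp_affinity; do
        [ -e "$f" ] || continue
        { echo "$mask" > "$f"; } 2>/dev/null || true
    done
}
```

As root, with CPU 3 of 4 as the quiet CPU, one might then run
"move_irqs_away 3 4" and start the workload with "taskset -c 3" before
it writes 1 to /proc/self/nohz. Other tasks still have to be kept off
that CPU separately.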

> >
> > I can provide you the tools I'm using to test it if you
> > want.
> >
> > Note this depends on the rcu spurious softirq fixes in Paul's
> > queue for .38
> >
> > I'm also using a hack to make init affine to the first CPU
> > on boot so that all userspace tasks end up on the first CPU,
> > except kernel threads and tasks that change their affinity
> > explicitly (this is not sched isolation). This prevents any
> > task from setting up timers on random CPUs on which we'll later
> > want to run a nohz task. But this can probably be fixed
> > another way, like unbinding these timers or so. This
> > probably requires a detailed audit.
>
> Have you looked at "tuna"?

No, I'm only discovering it now; I'll have a look. I'm not sure it
can fix the randomly bound timer issue though.


> > Any comments are welcome.
>
> Now as I was saying. If only a single running task is on a given CPU,
> and it is affined there. If no timers are set for wakeups on that CPU.
> Could we possibly set this to be NOHZ automatically?
>
> Just a thought.

Even so, we would still need the syscall slow path hooks.