Re: [RFC PATCH 0/4] timers: framework for migration between CPU

From: Ingo Molnar
Date: Fri Feb 20 2009 - 08:22:40 EST



* Arun R Bharadwaj <arun@xxxxxxxxxxxxxxxxxx> wrote:

> Hi,
>
>
> In an SMP system, tasks are scheduled on different CPUs by the
> scheduler, interrupts are managed by irqbalancer daemon, but
> timers are still stuck to the CPUs that they have been
> initialised. Timers queued by tasks gets re-queued on the CPU
> where the task gets to run next, but timers from IRQ context
> like the ones in device drivers are still stuck on the CPU
> they were initialised. This framework will help move all
> 'movable timers' from one CPU to any other CPU of choice using
> a sysfs interface.

hm, the intention is good, the concept of migrating timers to
their target CPU is good as well. We already do some of that for
regular timers.

But the whole sysfs interface you implemented here is not
particularly clean nor is it efficient.

The main problem is that timers are really fast-moving entities,
and so are the tasks they are related to.

Your implementation completely ties the direction of migration
(the timer scheduling) to a clumsy sysfs interface:

+	if (sscanf(buf, "%d", &target_cpu) && cpu_online(target_cpu)) {
+		ret = count;
+		per_cpu(enable_timer_migration, cpu->sysdev.id) = target_cpu;
+	}

That doesn't really scale and i doubt it works in practice. We
should not schedule timers via sysfs, we should let the kernel
do it automatically. [*]

So what i'd suggest instead is to extend the scheduler power-saving
code, which already identifies a 'load balancer CPU', to also
attract all attractable sources of timers - automatically. See
the 'load_balancer' CPU logic in kernel/sched.c.

Does that sound OK to you? I think the end result might even
give better numbers - and out of the box.

I'd also suggest dropping that rather ugly
enable_timer_migration per-cpu variable and simply reusing the
existing nohz.load_balancer as the target CPU.
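
A rough user-space model of that idea, not actual kernel code, might
look like the sketch below. The names struct nohz_model, cpu_is_idle[]
and pick_timer_cpu() are hypothetical, and the policy shown (only
timers armed on an idle CPU are moved, pinned timers never are) is an
assumption made for illustration, not something this thread specifies.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS		4
#define NO_BALANCER	(-1)

/*
 * Hypothetical stand-in for the scheduler's nohz state: the CPU
 * currently nominated as load balancer, or NO_BALANCER if none is.
 */
struct nohz_model {
	int load_balancer;
};

/* Hypothetical per-CPU idle flags, all false (busy) by default. */
bool cpu_is_idle[NR_CPUS];

/*
 * Pick the CPU a timer should be queued on.
 *
 * Assumed policy: pinned timers and timers armed on a busy CPU stay
 * local. A movable timer armed on an idle CPU is attracted to the
 * load balancer CPU, if one is nominated.
 */
int pick_timer_cpu(const struct nohz_model *nohz, int this_cpu, bool pinned)
{
	assert(this_cpu >= 0 && this_cpu < NR_CPUS);

	if (pinned || !cpu_is_idle[this_cpu])
		return this_cpu;
	if (nohz->load_balancer == NO_BALANCER)
		return this_cpu;
	return nohz->load_balancer;
}
```

The point of the sketch is that the target CPU is derived from state
the scheduler already maintains, so no per-cpu target variable and no
user-space write are needed to steer the timer.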

Also, please base your patches on the latest timer tree (which
already modified some of this code in this cycle):

http://people.redhat.com/mingo/tip.git/README

Btw., could you please also fix your mailer to not do this to
us:

Mail-Followup-To: linux-kernel@xxxxxxxxxxxxxxx,
linux-pm@xxxxxxxxxxxxxxxxxxxxxxxxxx, a.p.zijlstra@xxxxxxxxx,
ego@xxxxxxxxxx, tglx@xxxxxxxxxxxxx, mingo@xxxxxxx,
andi@xxxxxxxxxxxxxx, venkatesh.pallipadi@xxxxxxxxx,
vatsa@xxxxxxxxxxxxxxxxxx, arjan@xxxxxxxxxxxxx

it messes up the replies.

Ingo

[*] IRQ migration (where you possibly got the sysfs idea from)
is a special case where 'slow scheduling' via a user-space
daemon is possible: IRQs are an external source of events
and they are concentrators of work. The same concept does
not apply to timers, most of which are inherently
task-generated.
