Re: [RFC PATCH 0/4] timers: framework for migration between CPU
From: Vaidyanathan Srinivasan
Date: Mon Feb 23 2009 - 05:37:09 EST
* Balbir Singh <balbir@xxxxxxxxxxxxxxxxxx> [2009-02-23 15:18:50]:
> * Ingo Molnar <mingo@xxxxxxx> [2009-02-23 10:11:58]:
>
> >
> > * Balbir Singh <balbir@xxxxxxxxxxxxxxxxxx> wrote:
> >
> > > * Ingo Molnar <mingo@xxxxxxx> [2009-02-20 22:53:18]:
> > >
> > > >
> > > > * Arjan van de Ven <arjan@xxxxxxxxxxxxx> wrote:
> > > >
> > > > > On Fri, 20 Feb 2009 17:07:37 +0100
> > > > > Ingo Molnar <mingo@xxxxxxx> wrote:
> > > > >
> > > > > >
> > > > > > * Vaidyanathan Srinivasan <svaidy@xxxxxxxxxxxxxxxxxx> wrote:
> > > > > >
> > > > > > > > I'd also suggest to not do that rather ugly
> > > > > > > > enable_timer_migration per-cpu variable, but simply reuse
> > > > > > > > the existing nohz.load_balancer as a target CPU.
> > > > > > >
> > > > > > > This is a good idea to automatically bias the timers. But
> > > > > > > this nohz.load_balancer is a very fast moving target and we
> > > > > > > will need some heuristics to estimate overall system idleness
> > > > > > > before moving the timers.
> > > > > > >
> > > > > > > I would agree that the power saving load balancer has a good
> > > > > > > view of the system and can potentially guide the timer biasing
> > > > > > > framework.
> > > > > >
> > > > > > Yeah, it's a fast moving target, but it already concentrates
> > > > > > the load somewhat.
> > > > > >
> > > > >
> > > > > I wonder if the real answer for this isn't to have timers be
> > > > > considered schedulable-entities and have the regular scheduler
> > > > > decide where they actually run.
> > > >
> > > > hm, not sure - it's a bit heavy for that.
> > > >
> > >
> > > I think the basic timer migration policy should exist in user
> > > space.
> >
> > I disagree.
> >
>
> See below
>
> > > One of the ways of looking at it is, as we begin to
> > > consolidate, using range timers and migrating all timers to
> > > a smaller number of CPUs would make a whole lot of sense.
> > >
> > > As far as the scheduler making those decisions is concerned,
> > > my concern is that the load balancing is a continuous process
> > > and timers don't necessarily work that way. I'd put my neck
> > > out and say that irqbalance, range timers and timer migration
> > > should all belong to user space. irqbalance and range timers
> > > do, so should timer migration.
> >
> > As I said in my first reply, IRQ migration is special because
> > they are not kernel-internal objects, they come externally so
> > there's a lot of user-space enumeration, policy and other steps
> > involved. Furthermore, IRQs are migrated in a 'slow' fashion.
> >
> > Timers on the other hand are fast entities tied to _tasks_
> > primarily, not external entities.
>
> Timers are also queued due to external events like interrupts (device
> drivers tend to set off timers all the time). I am not fully against
> what you've said. At some semantic level, what you are suggesting is
> that at a higher level of power saving, when the scheduler balances
> timers, it is doing a form of soft CPU hotplug on the system by
> migrating timers and tasks away from idle CPUs when the load can be
> handled by other CPUs. See below as well.
>
> > Hence they should migrate
> > according to the CPU where the activities of the system
> > concentrates - i.e. where tasks are running.
> >
> > Another thing: do you argue for the existing timer-migration
> > code we have in mod_timer() to move to user-space too? It isn't a
> > consistent argument to push 'some' of it to user-space and keep
> > some of it in kernel-space.
> >
>
> No, mod_timer() is correct where it is.
>
> Consider the powertop usage scenario today
>
> 1. Powertop displays a list of timers and common causes of wakeup
> 2. It recommends policies in user space that can affect power savings
> a. usb autosuspend
> b. wireless link management
> c. disable HAL polling
>
> My argument is, why can't we add
>
> d. Use range timers
> e. Consolidate timers
>
> In the future.
>
> Even sched_mc=n is set by user space, so really the
> policy is in user space.
Hi Balbir,

I agree that the policy should live in user space. What Ingo is
suggesting, though, is that the actual choice of destination CPU for
consolidation should come from the scheduler's existing power-save
balancer code.

My understanding is that we will certainly have a sysfs tunable to
'enable' timer migration or consolidation, similar to the sched_mc=2
policy. But the actual set of CPUs to evacuate, and the right set of
target CPUs to consolidate onto, should come from the scheduler and not
necessarily from user space.

The scheduler should be able to figure out the following:

* Identify the set of idle CPUs (CPU package) from which timers can
  be removed
* Identify a semi-idle or idle CPU package to which the timers can
  be moved
* Decide when to start moving timers, once the system has a large
  number of idle CPUs
* Decide when to stop migrating, as the system becomes less idle and
  utilisation increases

Guiding all of the above decisions from user space may not be fast
enough.
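
To make the four decisions above concrete, here is a minimal user-space
C sketch of how they could fit together. It is not kernel code and not
taken from the patch series: struct pkg_state, the function names, and
the 75%/50% start/stop thresholds are all hypothetical, chosen only to
show the start/stop hysteresis and the source/target package selection.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-package snapshot; not a kernel structure. */
struct pkg_state {
	int nr_cpus;	/* CPUs in this package */
	int nr_idle;	/* CPUs in this package that are currently idle */
};

/* Assumed thresholds, in percent of all CPUs that are idle. */
#define START_IDLE_PCT	75	/* begin migrating at or above this */
#define STOP_IDLE_PCT	50	/* stop migrating below this */

/* Decide whether timer migration should be on, with hysteresis. */
static bool timer_migration_active(const struct pkg_state *pkgs, size_t n,
				   bool was_active)
{
	int cpus = 0, idle = 0;

	assert(n > 0);
	for (size_t i = 0; i < n; i++) {
		cpus += pkgs[i].nr_cpus;
		idle += pkgs[i].nr_idle;
	}
	assert(cpus > 0);

	int idle_pct = idle * 100 / cpus;

	/* Once active, keep going until idleness drops below the stop mark. */
	if (was_active)
		return idle_pct >= STOP_IDLE_PCT;
	return idle_pct >= START_IDLE_PCT;
}

/*
 * Pick the package to consolidate timers onto: the busiest package that
 * still has at least one idle CPU (semi-idle preferred over fully idle).
 * Returns the package index, or -1 if every package is fully busy.
 */
static int pick_target_pkg(const struct pkg_state *pkgs, size_t n)
{
	int best = -1, best_busy = -1;

	for (size_t i = 0; i < n; i++) {
		int busy = pkgs[i].nr_cpus - pkgs[i].nr_idle;

		if (pkgs[i].nr_idle == 0)
			continue;	/* assumption: skip fully busy packages */
		if (busy > best_busy) {
			best = (int)i;
			best_busy = busy;
		}
	}
	return best;
}

/* A package is evacuated only if it is fully idle and is not the target. */
static bool should_evacuate(const struct pkg_state *pkgs, size_t i, int target)
{
	return (int)i != target && pkgs[i].nr_idle == pkgs[i].nr_cpus;
}
```

The gap between the two thresholds is there so that the decision does
not flip back and forth when idleness hovers around a single cut-off,
which matters given how fast the scheduler's view of the system moves.
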
--Vaidy