Re: [RFC PATCH 0/4] timers: framework for migration between CPU

From: Balbir Singh
Date: Mon Feb 23 2009 - 06:25:20 EST


* Ingo Molnar <mingo@xxxxxxx> [2009-02-23 11:22:34]:

>
> * Balbir Singh <balbir@xxxxxxxxxxxxxxxxxx> wrote:
>
> > * Ingo Molnar <mingo@xxxxxxx> [2009-02-23 10:11:58]:
> >
> > >
> > > * Balbir Singh <balbir@xxxxxxxxxxxxxxxxxx> wrote:
> > >
> > > > * Ingo Molnar <mingo@xxxxxxx> [2009-02-20 22:53:18]:
> > > >
> > > > >
> > > > > * Arjan van de Ven <arjan@xxxxxxxxxxxxx> wrote:
> > > > >
> > > > > > On Fri, 20 Feb 2009 17:07:37 +0100
> > > > > > Ingo Molnar <mingo@xxxxxxx> wrote:
> > > > > >
> > > > > > >
> > > > > > > * Vaidyanathan Srinivasan <svaidy@xxxxxxxxxxxxxxxxxx> wrote:
> > > > > > >
> > > > > > > > > I'd also suggest not using that rather ugly
> > > > > > > > > enable_timer_migration per-cpu variable, but simply reusing
> > > > > > > > > the existing nohz.load_balancer as the target CPU.
> > > > > > > >
> > > > > > > > This is a good idea to automatically bias the timers. But
> > > > > > > > this nohz.load_balancer is a very fast moving target and we
> > > > > > > > will need some heuristics to estimate overall system idleness
> > > > > > > > before moving the timers.
> > > > > > > >
> > > > > > > > I would agree that the power saving load balancer has a good
> > > > > > > > view of the system and can potentially guide the timer biasing
> > > > > > > > framework.
> > > > > > >
> > > > > > > Yeah, it's a fast moving target, but it already concentrates
> > > > > > > the load somewhat.
> > > > > > >
> > > > > >
> > > > > > I wonder if the real answer here isn't to have timers be
> > > > > > considered schedulable entities and have the regular scheduler
> > > > > > decide where they actually run.
> > > > >
> > > > > hm, not sure - it's a bit heavy for that.
> > > > >
> > > >
> > > > I think the basic timer migration policy should exist in user
> > > > space.
> > >
> > > I disagree.
> > >
> >
> > See below
> >
> > > > One of the ways of looking at it is, as we begin to
> > > > consolidate, using range timers and migrating all timers to
> > > > lesser number of CPUs would make a whole lot of sense.
> > > >
> > > > As far as the scheduler making those decisions is concerned,
> > > > my concern is that the load balancing is a continuous process
> > > > and timers don't necessarily work that way. I'd put my neck
> > > > out and say that irqbalance, range timers and timer migration
> > > > should all belong to user space. irqbalance and range timers
> > > > do, so should timer migration.
> > >
> > > As I said in my first reply, IRQ migration is special because
> > > IRQs are not kernel-internal objects; they come from outside,
> > > so there's a lot of user-space enumeration, policy and other
> > > steps involved. Furthermore, IRQs are migrated in a 'slow' fashion.
> > >
> > > Timers on the other hand are fast entities tied to _tasks_
> > > primarily, not external entities.
> >
> > Timers are also queued due to external events like interrupts
> > (device drivers tend to set off timers all the time). [...]
>
> That is a silly argument. Tasks are created due to 'external
> events' as well, such as the user hitting a key.
>

The point I was trying to make was that not all timers are due to
tasks; some are due to interrupts, hence the focus on getting
irqbalance and timer migration to work together.
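
To make that concrete, here is a minimal sketch (illustrative names,
not from the patchset) of a driver re-arming a watchdog-style timeout
from its interrupt handler; the timer's origin is a device interrupt,
not any task:

	#include <linux/timer.h>
	#include <linux/interrupt.h>
	#include <linux/jiffies.h>

	static void foo_timeout(unsigned long data)
	{
		/* no interrupt arrived within the timeout window */
	}

	static DEFINE_TIMER(foo_timer, foo_timeout, 0, 0);

	static irqreturn_t foo_irq(int irq, void *dev_id)
	{
		/* re-arm the timeout on every device interrupt */
		mod_timer(&foo_timer, jiffies + HZ / 10);
		return IRQ_HANDLED;
	}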

> What matters, and what my argument was about, is the distinction
> of whether the kernel _generates_ the event. For most IRQ events
> it does not; for the overwhelming majority of timer events it
> consciously generates them. Which makes them all the more
> different.
>

Yes, agreed.

> > [...] I am not fully against what you've said. At some
> > semantic level you are suggesting that, at a higher level of
> > power saving, a scheduler that balances timers is doing a form
> > of soft CPU hotplug: migrating timers and tasks away from idle
> > CPUs when the load can be handled by other CPUs. See below as
> > well.
> >
> > > Hence they should migrate
> > > according to the CPU where the activities of the system
> > > concentrates - i.e. where tasks are running.
> > >
> > > Another thing: do you argue for the existing timer-migration
> > > code we have in mod_timer() to move to user-space too? It isn't
> > > consistent to push some of it to user-space and keep some of it
> > > in kernel-space.
> > >
> >
> > No, mod_timer() is correct where it belongs.
>
> You did not reply to my statement that the argument is a double
> standard. Why do certain migrations in the kernel and not others?

Sorry, I am not sure I understand: which portions of mod_timer are
you recommending we move to user space?
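
For reference, the only migration I am aware of in mod_timer() is the
switch of a re-armed timer to the current CPU's base, roughly like
this (paraphrased from kernel/timer.c of this era, not verbatim):

	/* inside __mod_timer(), after locking the timer's old base */
	new_base = __get_cpu_var(tvec_bases);

	if (base != new_base) {
		/*
		 * The base cannot be switched while the timer's
		 * handler is running, or del_timer_sync() could
		 * miss the running handler.
		 */
		if (likely(base->running_timer != timer)) {
			timer_set_base(timer, NULL);
			spin_unlock(&base->lock);
			base = new_base;
			spin_lock(&base->lock);
			timer_set_base(timer, base);
		}
	}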

>
> > Consider the powertop usage scenario today
> >
> > 1. Powertop displays a list of timers and common causes of wakeup
> > 2. It recommends policies in user space that can affect power savings
> > a. usb autosuspend
> > b. wireless link management
> > c. disable HAL polling
>
> That's different - those are PowerTop timer-event _reduction_
> policies, not migration policies for existing timers.
>
> > My argument is: why can't we, in the future, add
> >
> > d. Use range timers
> > e. Consolidate timers
> >
> > Even sched_mc=n is set by user space, so really the
> > policy is in user space.
>
> That is different again. sched_mc is a broad switch, not a
> dynamic control like the sysfs migration interface introduced in
> this patchset, which is the patchset we are discussing.
>

The timer migration patchset. We are discussing sched_mc=n, since I
expect sched_mc=3 or so to enable timer migration.

I guess we could try to select the target CPU for consolidation from
within the scheduler, but my concerns are

1. Not all timers are due to tasks.
2. The effect of automatically migrating a timer from the scheduler
can vary, since we don't know the load associated with a timer.

But having said that, some experimentation with Ingo's suggestion of
automatically selecting the target CPU would be nice.
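
Something along these lines is what I have in mind, as a purely
illustrative sketch: all names are assumed, it would have to live in
kernel/sched.c where the nohz struct and sched_mc_power_savings are
visible, and the sched_mc >= 2 threshold is a guess:

	/*
	 * Illustrative only: pick a timer migration target by
	 * reusing the no-hz idle load balancer as Ingo suggests,
	 * but only when deeper power savings are requested.
	 */
	static int timer_migration_target(void)
	{
		int cpu = smp_processor_id();
		int target;

		if (sched_mc_power_savings < 2)
			return cpu;	/* no consolidation requested */

		target = atomic_read(&nohz.load_balancer);
		if (target < 0 || !cpu_online(target))
			return cpu;	/* no balancer owner; stay put */

		return target;
	}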



--
Balbir