RE: [PATCH 0/6] x86/cpu hotplug: Wake up offline CPU via mwait or nmi
From: Thomas Gleixner
Date: Tue Jun 05 2012 - 15:44:09 EST
On Tue, 5 Jun 2012, Peter Zijlstra wrote:
> On Tue, 2012-06-05 at 17:44 +0000, Luck, Tony wrote:
> > > Like what? Offline is nothing more than a C state on x86.
> >
> > Offline is a bigger hammer than idle.
> >
> > When a core is idle it may take an interrupt which wakes it up to use power.
> > The scheduler may assign a process to run on it, which will wake it up to use power.
> >
> > When a core is offline we take extra steps (re-routing interrupts, telling the
> > scheduler it is not available for work) to make sure it STAYS in that low
> > power state.
>
> You also wreck cpusets, cpu affinity and you need some userspace crap to
> poll state trying to figure out when to wake up again.
>
> (And yes, I've heard stories about userspace hotplug daemons that cause
> machine wakeups themselves and were a main source of power usage at some
> point).
>
> All the timer/interrupt nonsense needs to be fixed anyhow, the HPC and
> RT people want isolation anyway.
>
> So shouldn't we all start by fixing the entire
> load-balancer/timer/interrupt madness before we start swinging stupid
> big hammers around that break half the interfaces we have?

My idea of the stateful hotplug is to have a state which just gets rid
of the interrupts, timers and some other crap (mostly IPIs) but allows
an ad hoc resurrection of the cpu.

Ideally the state transition would be driven by the load-balancer.

I know that the current load balancer is too stupid to do that, but
that's a different problem. Right now we can't fix the load balancer
because we have no mechanisms to solve the other issues and the other
issues are not solved because the stupid load balancer is in the way.

So we have to start somewhere.

IMNSHO providing a stateful hotplug mechanism which allows us to solve
the issues outside of the load balancer in a simple and robust way is
a proper approach. Once we have that we can tackle the load balancer
to control the whole thing.

Vs. the interrupt/timer/other crap madness:

 - We really don't want to have an interrupt balancer in the kernel
   again, but we need a mechanism to prevent the user space balancer
   trainwreck from ruining the power saving party.

 - The timer issue is mostly solved by the existing nohz stuff
   (plus/minus the few bugs in there).

 - The other details (silly IPIs and cross CPU timer arming) are way
   easier to solve by a proper prohibitive state than by chasing that
   nonsense all over the tree forever.
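
To make the "prohibitive state" idea concrete, here is a minimal userspace
model of it, not kernel code. Every name in it (cpu_model, CPU_ST_QUIESCED,
cpu_quiesce, cpu_resurrect, arm_timer_on) is hypothetical and invented for
illustration; the real mechanism would have to hook into the interrupt,
timer and IPI code. The assumption is a third state between online and
offline that moves interrupts and timers away and then refuses new
cross-CPU work in one central check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-CPU states; none of these exist in the kernel. */
enum cpu_pstate {
	CPU_ST_ONLINE,   /* takes interrupts, timers, IPIs and tasks */
	CPU_ST_QUIESCED, /* prohibitive state: nothing may target this CPU */
	CPU_ST_OFFLINE,  /* classic full hotplug offline, shown for contrast */
};

/* Toy model of one CPU: just counters, no real hardware state. */
struct cpu_model {
	enum cpu_pstate state;
	unsigned int nr_irqs;   /* interrupts currently routed here */
	unsigned int nr_timers; /* timers currently armed here */
};

/* Single place that decides whether other CPUs may push work here. */
bool cpu_accepts_remote_work(const struct cpu_model *cpu)
{
	return cpu->state == CPU_ST_ONLINE;
}

/* Move interrupts and timers to @target, then enter the prohibitive state. */
int cpu_quiesce(struct cpu_model *cpu, struct cpu_model *target)
{
	assert(cpu && target);

	if (cpu == target || cpu->state != CPU_ST_ONLINE ||
	    target->state != CPU_ST_ONLINE)
		return -1;

	target->nr_irqs += cpu->nr_irqs;
	target->nr_timers += cpu->nr_timers;
	cpu->nr_irqs = 0;
	cpu->nr_timers = 0;
	cpu->state = CPU_ST_QUIESCED;
	return 0;
}

/* Ad hoc resurrection: nothing was torn down, so this is a state flip. */
int cpu_resurrect(struct cpu_model *cpu)
{
	if (cpu->state != CPU_ST_QUIESCED)
		return -1;
	cpu->state = CPU_ST_ONLINE;
	return 0;
}

/* Cross-CPU timer arming is rejected centrally, not at each call site. */
int arm_timer_on(struct cpu_model *cpu)
{
	if (!cpu_accepts_remote_work(cpu))
		return -1;
	cpu->nr_timers++;
	return 0;
}
```

In this model a quiesced CPU rejects arm_timer_on() until cpu_resurrect()
flips it back, so the policy lives in one check rather than being chased
through every caller.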

Thoughts ?

	tglx