Re: [PATCH 3/7] idle, thermal, acpi: Remove home grown idle implementations

From: Arjan van de Ven
Date: Thu Nov 21 2013 - 14:45:26 EST


On 11/21/2013 11:19 AM, Paul E. McKenney wrote:
> On Thu, Nov 21, 2013 at 08:21:03AM -0800, Arjan van de Ven wrote:
>> On 11/21/2013 8:07 AM, Paul E. McKenney wrote:
>>> As long as RCU has some reliable way to identify an idle task, I am
>>> good. But I have to ask -- why can't idle injection coordinate with
>>> the existing idle tasks rather than temporarily making alternative
>>> idle tasks?
>>
>> It's not a real idle. That's the whole problem of the situation.
>> To the rest of the OS, this is being BUSY (busy saving power using
>> a CPU instruction, but it might as well have been an mdelay() operation),
>> and it's also what end users expect; they want to be able to see
>> where their performance (read: CPU time in "top") is going.
>
> My concern is keeping RCU's books straight. Suppose that there is a need
> to call for idle in the middle of a preemptible RCU read-side critical
> section. Now, if that call for idle involves a context switch, all is
> well -- RCU will see the task as still being in its RCU read-side critical
> section, which means that it is OK for RCU to see the CPU as idle.
>
> However, if there is no context switch and RCU sees the CPU as idle,
> preemptible RCU could prematurely end the grace period. If there is no
> context switch and RCU sees the CPU as non-idle for too long, we start
> getting RCU CPU stall warning splats.
>
> Another approach would be to only inject idle when the CPU is not
> doing anything that could possibly be in an RCU read-side critical
> section. But things might get a bit hot in the case of an overly
> long RCU read-side critical section.
>
> One approach that might work would be to hook into RCU's context-switch
> code going in and coming out, then telling RCU that the CPU is idle,
> even though top and friends see it as non-idle. This last is in fact
> similar to how RCU handles userspace execution for NO_HZ_FULL.


So powerclamp and the like are not "idle".
They are "busy" from the point of view of everything except the lowest level of the CPU hardware.
Once you start thinking of them as idle, all hell breaks loose in terms of implications
(including sysadmin visibility, etc.), hence some of the explosions in this thread
as well.

But it's not "idle".

It's "put the CPU in a low-power state for a specified amount of time". Sure, it uses the same
instruction the idle loop uses to do so.

(Now, to make it messy, the current driver does a bunch of things similar to the idle loop,
which is a mess and a fair thing to complain about.)

