Re: [PATCH] 2.4.18 scheduler bugs

From: Ingo Molnar (mingo@elte.hu)
Date: Sat Mar 16 2002 - 04:23:21 EST


On Fri, 15 Mar 2002, Joe Korty wrote:

> >> It is an idle cpu that is spending those 200 cycles.
> >
> > wrong. When it's woken up it's *not* an idle CPU anymore, and it's the
> > freshly woken up task that is going to execute 200 cycles later...
>
> I have to disagree. It is the woken up task *running on the otherwise
> idle CPU* that burns up 200 cycles at the tail.

what do you disagree with? It's a fact that any overhead added to the
idle-wakeup path is not 'idle time' but adds latency (overhead) to the
freshly woken up task's runtime.

> A cpu is wasting, say, 5,000,000 cycles (1GHz/100/2, or 1/2 tick) in hlt
> when it could have been doing work. Why worry about an alternative
> wakeup path that burns up 200-400 cycles of that on the otherwise idling
> cpu, even if it is at the tail.

it's *not* idle time; it's naive to think that "it's in the idle task, so
it must be idle time". Latency added to the idle-wakeup path shows up as
direct overhead in the woken up task. Let's look at an example: CPU0 is
waking up bdflush, which will run on CPU1, and CPU1 is currently idle:

         CPU0                           CPU1
         [wakeup bdflush]
         [send IPI]
                    [... IPI delivery latency ...]
                                        [IRQ entry/exit]
                                        [idle thread context switches]
                                        [bdflush runs on CPU1]

contrasted with the idle=poll situation:

         CPU0                           CPU1
         [wakeup bdflush]
         [set need_resched]
                                        [idle thread context switches]
                                        [bdflush runs on CPU1]

as you can see, the overhead of 'send IPI', 'IPI delivery' and 'IRQ
entry/exit' delays bdflush. Even assuming that sending and receiving an
IPI is as fast as setting & detecting need_resched [which it theoretically
can be], the IPI variant still has the cost of IRQ entry (and exit), which
is 200 cycles only optimistically; it's more like thousands of cycles on a
GHz box.
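
[ a rough sketch of the two timelines above as plain arithmetic, in Python.
The cycle counts are made-up placeholders, not measurements, and the names
WakeupCosts, ipi_wakeup_latency and poll_wakeup_latency are hypothetical.
It takes the generous assumption from the paragraph above: sending plus
delivering the IPI costs the same as setting & detecting need_resched, so
the whole difference between the two paths is IRQ entry/exit. ]

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WakeupCosts:
    """Hypothetical per-step costs in CPU cycles (illustrative, not measured)."""
    send_ipi: int          # CPU0: send the IPI
    ipi_delivery: int      # IPI in flight between the CPUs
    irq_entry_exit: int    # CPU1: enter and leave the interrupt handler
    set_need_resched: int  # CPU0 sets the flag, polling CPU1 detects it
    context_switch: int    # CPU1: idle thread switches to the woken task


def ipi_wakeup_latency(c: WakeupCosts) -> int:
    """Default idle: the woken task waits for every step of the IPI path."""
    return c.send_ipi + c.ipi_delivery + c.irq_entry_exit + c.context_switch


def poll_wakeup_latency(c: WakeupCosts) -> int:
    """idle=poll: the woken task waits only for the flag and the switch."""
    return c.set_need_resched + c.context_switch


# Assumed numbers: send + delivery (50 + 50) equals set & detect (100).
costs = WakeupCosts(send_ipi=50, ipi_delivery=50, irq_entry_exit=2000,
                    set_need_resched=100, context_switch=1000)

extra = ipi_wakeup_latency(costs) - poll_wakeup_latency(costs)
print(extra)  # → 2000, i.e. exactly irq_entry_exit
```

with any other placeholder values the same holds: whatever the IPI path
spends beyond the need_resched path is latency seen by bdflush, not idle
time saved.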

[ as mentioned before, the default idle method has power saving advantages
(even if it's not HLT, some of the better methods do save a considerable
amount of power), but idle=poll is clearly an option for truly
performance-sensitive applications. ]

        Ingo



