Re: [RFC patch] Use IPI_shortcut for lapic timer broadcast

From: Ingo Molnar
Date: Mon Jun 29 2009 - 04:31:04 EST



* Luming Yu <luming.yu@xxxxxxxxx> wrote:

> On Mon, Jun 29, 2009 at 4:16 PM, Ingo Molnar<mingo@xxxxxxx> wrote:
> >
> > * Luming Yu <luming.yu@xxxxxxxxx> wrote:
> >
> >> On Mon, Jun 29, 2009 at 3:20 PM, Ingo Molnar<mingo@xxxxxxx> wrote:
> >> >
> >> > * Luming Yu <luming.yu@xxxxxxxxx> wrote:
> >> >
> >> >> Hello,
> >> >>
> >> >> We need to use the IPI shortcut to send the lapic timer broadcast
> >> >> to avoid the latency of sending IPIs one by one on systems with many
> >> >> logical processors when NO_HZ is disabled.
> >> >> Without this patch, I have seen an upstream kernel with the RHEL 5
> >> >> kernel config hang at boot.
> >> >
> >> > hm, that might be a valid optimization - but why does the lack of
> >> > this optimization result in a hang?
> >>
> >> The hang is caused by the kernel code that works around the
> >> lapic-timer-stop issue. With HZ=1000 and a lot of CPUs (e.g. 64
> >> logical CPUs), CPU 0 is kept busy sending TIMER IPIs instead of
> >> making progress in boot (right after a deep C-state has been used).
> >
> > that's a bit weird. With HZ=1000 we have 1000 usecs between each
> > timer tick. Assuming a CPU sends to a lot of CPUs (64 logical CPUs)
> > that means that each IPI takes more than ~15 microseconds to
> > process. On what hardware/platform can this happen realistically?
>
> https://bugzilla.redhat.com/show_bug.cgi?id=499271
>
> Someone has measured a latency of 50-100us to send one
> IPI

Ugh. What platform is it that takes this much time to pass an IPI?

IPIs are the lifeline of process messaging under Linux. TLB flushes
in threaded apps rely on it (heavily), the scheduler relies on it
for wakeups (heavily) and a lot of other code relies on IPIs as
well.

Even a Pentium-5 100 MHz dual box was able to do cross-CPU IPIs
within 10-20 microseconds more than a decade ago - so 50-100 usecs
latency on a modern platform is totally out of this planet and will
hurt Linux performance big time. And the worst thing about it is
that none of the usual performance metrics will really show _why_
performance is tanking ...
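
To make the arithmetic in this thread concrete, here is a minimal
sketch in plain userspace C (not kernel code). The function names are
made up for illustration, and it assumes the IPIs are sent strictly
one after another to 64 targets, as in the estimate quoted above:

```c
#include <assert.h>

/* Length of one timer tick in microseconds for a given HZ value. */
static unsigned long tick_period_us(unsigned long hz)
{
	assert(hz > 0);
	return 1000000UL / hz;
}

/*
 * Largest per-IPI cost (in usecs, rounded down) at which sending to
 * ntargets CPUs one by one still takes no longer than a single tick.
 */
static unsigned long per_ipi_budget_us(unsigned long hz, unsigned long ntargets)
{
	assert(ntargets > 0);
	return tick_period_us(hz) / ntargets;
}

/* Total time to send one IPI to each target, strictly serially. */
static unsigned long serial_broadcast_us(unsigned long ntargets,
					 unsigned long ipi_cost_us)
{
	return ntargets * ipi_cost_us;
}

/* Nonzero if a serial broadcast finishes before the next tick is due. */
static int broadcast_fits_in_tick(unsigned long hz, unsigned long ntargets,
				  unsigned long ipi_cost_us)
{
	return serial_broadcast_us(ntargets, ipi_cost_us) < tick_period_us(hz);
}
```

With HZ=1000 and 64 targets, per_ipi_budget_us() gives 15 usecs, which
is the ~15 microsecond figure above. At the reported 50 usecs per IPI,
serial_broadcast_us() gives 3200 usecs, more than three tick periods,
so under these assumptions the sending CPU can never catch up.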

Ingo