[patch] severe softirq handling performance bug, fix, 2.4.5

From: Ingo Molnar (mingo@elte.hu)
Date: Sat May 26 2001 - 12:59:28 EST


i've been seeing really bad average TCP latencies on certain gigabit cards
(~300-400 microseconds instead of the expected 100-200 microseconds), ever
since softnet went into the main kernel, and never found a real
explanation for it, until today.

the problem always went away when i tried to use tcpdump or strace, so the
bug remained hidden and was hard to prove that it actually existed. (apart
from the bad lat_tcp numbers.) We found many related bugs, but this
problem remained. tcpdumps done on the network did not show any fault of
the TCP stack. The lat_tcp latencies fluctuated alot, but for certain
cards the latencies were stable, so i suspected some sort of hw problem.
The loopback networking device never showed these problems, which added to
the mystery.

the problem turned out to be a severe softirq handling bug in the x86
code.

background: soft interrupts were introduced as a generic kernel framework
around January 2000, as part of the softnet networking-rewrite, that
predated the final scalability rewrite of the Linux TCP/IP networking
code. Soft interrupts have unique semantics, they can be best described as
'IRQ-triggered atomic system calls'. (unlike bottom halves, soft-IRQs do
not preempt kernel code.)

soft-IRQs, like their name suggest, are used from device interrupts ('hard
interrupts') to trigger 'background' work related to interrupts. Soft-IRQs
are triggered per-CPU, and they are supposed to execute whenever nothing
else is done by the kernel on that particular CPU. Softirqs are executed
with interrupts enabled, so hard interrupts can re-enable them while they
are executing. do_softirq() is a kernel function that returns with IRQs
disabled and at this point it's guaranteed that there are no more pending
softirqs for this CPU.

this mechanizm was the intention, but not the reality. In two important
and frequently used code paths it was possible for an active soft-IRQ to
"go unnoticed": i measured as long as 140 milliseconds (!!!) latency
between softirq activation and softirq execution in certain cases. This is
obviously bad behavior.

the two error cases are:

 #1 hard-IRQ interrupts user-space code, activates softirq, and returns to
    user-space code

 #2 hard-IRQ interrupts the idle task, activates softirq and returns to
    the idle task.

category #1 is easy to fix, in entry.S we have to check active softirqs
not only the exception and ret-from-syscall cases, but also in the
IRQ-ret-to-userspace case.

category #2 is subtle, because the idle process is kernel code, so
returning to it we do not execute active softirqs. The main two types of
idle handlers both had a window do 'miss' softirq execution:

- the HLT-based default handler could be called after schedule()'s check
  for softirqs, but after enabling IRQs. In this case an interrupt handler
  has a window to activate a softirq and neither the IRQ return code, nor
  the idle loop would execute it immediately. The fix is to do a softirq
  check right before the safe_halt call.

- the idle-poll handler does not check for softirqs either, it now does
  this in every iteration.

with the attached softirq-2.4.5-A0 patch applied to vanilla 2.4.5, i see
picture-perfect lat_tcp latencies of 109 microseconds over real gigabit
network. I see very stable (and very good) TUX latencies as well. TCP
bandwidth got better as well, probably due to the caching-locality bonus
when executing softirqs right after hardirqs.

[I'd like to ask everyone who had TCP latency problems (or other
networking performance problems) to test 2.4.5 with this patch applied -
thanks!]

impact of the bug: all softirq-using code is affected, mostly networking.
The loopback net driver was not affected because it's not interrupt-based.
The bug went away due to strace or tcpdump because those two utilities
pumped system-calls into the system which 'fixed' the softirq handling
bug.

(other softirq-based code is the tasklet code, and the keyboard code is
using tasklets, so the keyboard code might be affected as well.)

        Ingo



-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Thu May 31 2001 - 21:00:29 EST