Re: ksoftirqd uses 99% CPU triggered by network traffic (maybeRLT-8139 related)

From: Pasi Sjoholm
Date: Mon Jul 26 2004 - 19:18:07 EST


On Tue, 27 Jul 2004, Robert Olsson wrote:

>>> In summary: High softirq loads can totally kill userland. The reason is that
>>> do_softirq() is run from many places hard interrupts, local_bh_enable etc
>>> and bypasses the ksoftirqd protection. It just been discussed at OLS with
>>> Andrea and Dipankar and others. Current RCU suffers from this problem as well.
>> Ok, this explanation makes sense and my point of view I think this is
>> quite critical problem if you can "crash" linux kernel just sending enough
>> packets to network interface for an example.
>
> Well the packets also has to create hard softirq loads in practice this means route
> lookup or something else for normal traffic the RX_SOFIRQ is very well behaved
> and schedules itself to give other softirq's a chance to run also I'll guess you
> have softirq's not only from the network. If you decrease your load a bit you come
> to more nomal operation?

The system will be more responsive but won't ever come back to normal
operation and if I use my computer just for some chatting/writing/browsing
the ksoftirqd-problem won't ever happen.

It is also very hard to reproduce with only high network load. I ran some
test with 100Mbps load (RX and TX both) for 24hours without any problems
but soon as I started to compile some stuff or doing something
cpu-intensive the system would do crash-boom-bang.

>> I would be more than glad to help you in testing if you want to publish
>> some patches.
> Well maybe we should start to verify that this problem. Alexey did a litte program
> doing gettimeofday to run to see how long user mode could be starved. Also note we
> add new source of softirq's.

First run:

--
timestamp diff = 1, maxlat = 50963
timestamp diff = 0, maxlat = 50963
timestamp diff = 0, maxlat = 50963
timestamp diff = 1, maxlat = 50963
timestamp diff = 0, maxlat = 50963
timestamp diff = 0, maxlat = 50963
new maxlat = 124428
new maxlat = 511817
timestamp diff = 1, maxlat = 511817
new maxlat = 749913
new maxlat = 775142
**4581159
new maxlat = 4581159
**1704065
**1464153
timestamp diff = 1, maxlat = 4581159
**1448939
**1464013
**1021339
**1568588
**1861657
timestamp diff = 0, maxlat = 4581159
--

Second run:

--
timestamp diff = 0, maxlat = 832003
timestamp diff = 1, maxlat = 832003
timestamp diff = 0, maxlat = 832003
timestamp diff = 0, maxlat = 832003
timestamp diff = 0, maxlat = 832003
timestamp diff = 1, maxlat = 832003
**1793951
new maxlat = 1793951
timestamp diff = 0, maxlat = 1793951
**2579893
new maxlat = 2579893
timestamp diff = 0, maxlat = 2579893
timestamp diff = 0, maxlat = 2579893
timestamp diff = 0, maxlat = 2579893
**1004449
**3199625
new maxlat = 3199625
timestamp diff = 0, maxlat = 3199625
timestamp diff = 1, maxlat = 3199625
--

When the system has gone to this state you will have to wait quite a long
time to receive a new line.

--
Pasi Sjöholm

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/