Re: ksoftirqd uses 99% CPU triggered by network traffic (maybeRLT-8139 related)

From: Pasi Sjoholm
Date: Fri Jul 30 2004 - 13:42:46 EST


>> I don't remember if I have said this but when the ksoftirqd has started to
>> take all the cpu-time there is no way to stop it excluding booting
>> computer. You can kill or stop all the processes which are taking your
>> cpu-time (ie. source compiling) but network wont start to work or neither
>> there is no free cpu-time for use because ksoftirqd won't stop eating it.
> No you didn't then it seems you are hit by some bug. Have the bug happen.
> Kill userland processes like as you did. Try find out what's running. I would
> look in /proc/interrupt /proc/net/softnet_stat /proc/net/dev or maybe best to
> take a profile if possible or Magic SysRq to find out what's looping. Also try
> ifconfig down.

Ok, now I know how to go back in the normal state of kernel operation. I
just have to do "ifconfig eth0 down" and ksoftirqd stop taking all the
cpu-time, after that I can do "ifconfig eth0 up" and everything is working
fine.

So I guess it is 8139too-driver which has a bug.

First "SysRq : Show Regs" when the kernel is ok:

--
Pid: 0, comm: swapper
EIP: 0060:[<c0101ed3>] CPU: 0
EIP is at default_idle+0x23/0x30
EFLAGS: 00000246 Not tainted (2.6.7-mm7)
EAX: 00000000 EBX: c03cc000 ECX: 00000000 EDX: 0006a546
ESI: 00099100 EDI: c0427120 EBP: 0046e007 DS: 007b ES: 007b
CR0: 8005003b CR2: 08101000 CR3: 1e9b7000 CR4: 000006d0
[<c0101f3c>] cpu_idle+0x2c/0x40
[<c03ce832>] start_kernel+0x162/0x180
[<c03ce460>] unknown_bootoption+0x0/0x160
--

Second "SysRq : Show Regs" when the ksoftirqd has started to take all the
cpu-time:

--
Pid: 2, comm: ksoftirqd/0
EIP: 0060:[<e0871224>] CPU: 0
EIP is at rtl8139_poll+0xb4/0x100 [8139too]
EFLAGS: 00000247 Not tainted (2.6.7-mm7)
EAX: ffffe000 EBX: 00000040 ECX: dfa654f8 EDX: c0441978
ESI: dfa65400 EDI: e0868000 EBP: dff85f84 DS: 007b ES: 007b
CR0: 8005003b CR2: 080e9024 CR3: 1e9b7000 CR4: 000006d0
[<c01194b0>] ksoftirqd+0x0/0xc0
[<c02c5e3a>] net_rx_action+0x6a/0x110
[<c011917d>] __do_softirq+0x7d/0x80
[<c01191a6>] do_softirq+0x26/0x30
[<c0119518>] ksoftirqd+0x68/0xc0
[<c01276e5>] kthread+0xa5/0xb0
[<c0127640>] kthread+0x0/0xb0
[<c0102111>] kernel_thread_helper+0x5/0x14
--

Third one after I have done ifconfig eth0 down/up.

--
Pid: 0, comm: swapper
EIP: 0060:[<c0101ed3>] CPU: 0
EIP is at default_idle+0x23/0x30
EFLAGS: 00000246 Not tainted (2.6.7-mm7)
EAX: 00000000 EBX: c03cc000 ECX: 00000000 EDX: 000cf65e
ESI: 00099100 EDI: c0427120 EBP: 0046e007 DS: 007b ES: 007b
CR0: 8005003b CR2: b7c5a000 CR3: 1e9b7000 CR4: 000006d0
[<c0101f3c>] cpu_idle+0x2c/0x40
[<c03ce832>] start_kernel+0x162/0x180
[<c03ce460>] unknown_bootoption+0x0/0x160
--

I also compiled 8139too-driver with debub-options on and the logfile is
included as attachment.

Everything is going ok until end of the 20:41:29, then it happens:

--
20:41:29 kernel: rtl8139_rx: eth0: In rtl8139_rx(), current 004c BufAddr
0c80, free to 003c, Cmd 0c.
20:41:29 kernel: rtl8139_interrupt: eth0: exiting interrupt, intr_status=0x0001.
20:41:29 kernel: rtl8139_rx: eth0: In rtl8139_rx(), current 11bc BufAddr
22e0, free to 11ac, Cmd 0c.
20:41:29 kernel: rtl8139_interrupt: eth0: exiting interrupt, intr_status=0x0000.
20:41:29 last message repeated 4 times
20:41:29 kernel: rtl8139_rx: eth0: In rtl8139_rx(), current 2438 BufAddr
242c, free to 2428, Cmd 0d.
20:41:29 kernel: rtl8139_interrupt: eth0: exiting interrupt, intr_status=0x0010.
20:41:29 kernel: rtl8139_rx: eth0: In rtl8139_rx(), current 2438 BufAddr
242c, free to 2428, Cmd 0d.
20:41:29 kernel: rtl8139_interrupt: eth0: exiting interrupt, intr_status=0x0010.
20:41:29 kernel: rtl8139_rx: eth0: In rtl8139_rx(), current 2438 BufAddr
242c, free to 2428, Cmd 0d.
20:41:29 kernel: rtl8139_interrupt: eth0: exiting interrupt, intr_status=0x0010.
20:41:29 kernel: rtl8139_rx: eth0: In rtl8139_rx(), current 2438 BufAddr
242c, free to 2428, Cmd 0d.
--

The hardest part is to tell where the problem is but I think that
rtl8139_poll-function would be good place to start looking for the bug?

readprofile didn't tell much.. Where to go next? I'm not so good
programmer that I could find the right place to fix..

--
Pasi Sjöholm

Attachment: rtl8139-debug.gz
Description: GNU Zip compressed data