Re: ksoftirqd uses 99% CPU triggered by network traffic (maybe RTL-8139 related)

From: Pasi Sjoholm
Date: Fri Jul 30 2004 - 07:59:04 EST


On Fri, 30 Jul 2004, Robert Olsson wrote:

> You should monitor the user app (gettimeofday()) for starvation;
> this is the most important measure and what we are trying to improve.

I can do that after a couple of days. I have to get married tomorrow and
spend some time with my wife. =)
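A minimal sketch of the kind of userland starvation monitor discussed above. It is an assumption about what such a monitor could look like, not the tool Robert has in mind: it sleeps in short intervals and records how late each wakeup is. It uses Python's monotonic clock instead of gettimeofday() so that clock adjustments do not show up as stalls; the function name and the default thresholds are hypothetical.

```python
import time


def measure_stalls(duration_s=1.0, interval_s=0.01, threshold_s=0.1):
    """Sleep in short intervals and measure how late each wakeup is.

    Returns (max_delay, stalls): max_delay is the worst lateness in seconds,
    stalls is a list of (wall_clock_time, delay) for wakeups that were more
    than threshold_s late.
    """
    stalls = []
    max_delay = 0.0
    start = time.monotonic()
    prev = start
    while prev - start < duration_s:
        time.sleep(interval_s)
        now = time.monotonic()
        # Lateness beyond the requested sleep; clamp tiny negatives to zero.
        delay = max(0.0, (now - prev) - interval_s)
        max_delay = max(max_delay, delay)
        if delay > threshold_s:
            stalls.append((time.time(), delay))
        prev = now
    return max_delay, stalls


if __name__ == "__main__":
    worst, stalls = measure_stalls(duration_s=2.0)
    print(f"worst wakeup delay: {worst * 1000:.1f} ms, "
          f"stalls over threshold: {len(stalls)}")
```

Running this while the network load is applied, and again on an idle box, should give a rough picture of how badly user processes are being starved.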

> We can hardly expect softirqs alone to give us the balance of load we wish.
> At overload something has to get fewer resources. Even if we defer all softirqs
> to scheduler context there is no way of distinguishing between them
> unless we run them in separate processes, i.e. one for RX_SOFTIRQ, one for
> TX_SOFTIRQ, etc. This could solve a problem I just discussed with Jamal where
> the RX softirq overruns the TX softirq and causes drops at egress (qdisc) when
> bus BW is saturated. Running softirqs under scheduler context can cause other
> delays and other problems.

OK, I understand that you can't do 110% if you only have 100%, so something
has to wait. It would not matter if network speed dropped to 1/100 of the
maximum, as long as it still worked somehow and did not crash the whole
userspace.

I don't remember if I have said this, but once ksoftirqd has started to
take all the CPU time there is no way to stop it other than rebooting the
computer. You can kill or stop all the processes that are using CPU time
(e.g. a source compile), but the network won't start working again and no
CPU time is freed up, because ksoftirqd won't stop eating it.

Actually, for now I would not care how much the kernel slowed down, but
we have to get some stability. Restarting your computer every time this
happens is not a solution.

> So mostly ksoftirqd runs most softirqs, which is good. Without this you would
> not be able to type any commands at all. Also we see some effects from the
> path. Can you monitor userland starvation here too?

Sure..

> > - When the ksoftirqd starts to eat cpu-time time_squeeze-value (3rd
> > column) starts growing (in both cases it's same thing).
> This is OK as we have to throttle.

Sure, but we should not crash the whole userspace. Why does the kernel
suddenly decide not to give any time to the softirqs? Or maybe it does,
because I can still type commands etc., but the network won't work at all.
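For watching the counters mentioned here, a small sketch that reads them and prints how much they changed over one second. It assumes the values come from /proc/net/softnet_stat (the path is not named in this thread), where each row holds hexadecimal counters for one CPU and the first three columns are total packets processed, dropped, and time_squeeze. The function names are hypothetical.

```python
import time

SOFTNET_STAT = "/proc/net/softnet_stat"  # assumed source of the counters


def parse_softnet_stat(text):
    """Parse softnet_stat text into one dict per row (one row per CPU)."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        total, dropped, time_squeeze = (int(f, 16) for f in fields[:3])
        rows.append({"row": len(rows), "total": total,
                     "dropped": dropped, "time_squeeze": time_squeeze})
    return rows


def counter_deltas(before, after):
    """Per-row difference of the three counters between two snapshots."""
    keys = ("total", "dropped", "time_squeeze")
    return [{k: a[k] - b[k] for k in keys} for b, a in zip(before, after)]


if __name__ == "__main__":
    try:
        with open(SOFTNET_STAT) as f:
            first = parse_softnet_stat(f.read())
        time.sleep(1.0)
        with open(SOFTNET_STAT) as f:
            second = parse_softnet_stat(f.read())
    except OSError as exc:
        print(f"cannot read {SOFTNET_STAT}: {exc}")
    else:
        for i, d in enumerate(counter_deltas(first, second)):
            print(f"row {i}: {d}")
```

If the total delta stays at zero while the time_squeeze delta keeps growing, that would match the behaviour described above.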

> > - Total-column's value stops growing although network file transfers
> > are still on. (1st column)
> Well, ksoftirqd now runs the RX softirq and competes heavily with other
> processes for your CPU; you may have to adjust priorities to get your desired
> balance. Can you experiment a bit?

There is nothing to prioritize after I have killed or quit the jobs which were
taking all the CPU time. Nothing is left but ksoftirqd, and it will eat
all the CPU time.

Of course I could try something like "nice make -j3" and see if it will
still do the same shit.
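A rough sketch of that experiment as a script, in case it needs repeating with different nice values. The helper name run_niced is hypothetical, and the child command in the demo is only a stand-in for the real build job; it just reports its own niceness. This works on Unix-like systems only.

```python
import os
import subprocess
import sys


def run_niced(cmd, increment=10, **kwargs):
    """Run cmd with its niceness raised by `increment` (lower CPU priority).

    Extra keyword arguments are passed through to subprocess.run().
    """
    return subprocess.run(
        cmd,
        preexec_fn=lambda: os.nice(increment),  # runs in the child before exec
        **kwargs,
    )


if __name__ == "__main__":
    # Stand-in for a build job: the child prints its own niceness.
    child = [sys.executable, "-c", "import os; print(os.nice(0))"]
    result = run_niced(child, increment=10, capture_output=True, text=True)
    print("parent niceness:", os.nice(0))
    print("child niceness:", result.stdout.strip())
```

For the real test the command would be something like ["make", "-j3"] instead of the stand-in child.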

--
Pasi Sjöholm
