Re: [ANNOUNCE] 3.0.14-rt31 - ksoftirq running wild - FEC ethernet driver to blame? Yep

From: Tim Sander
Date: Wed Jan 18 2012 - 06:17:43 EST


Hi Mike and others

Thanks for your reply Mike.

On Tuesday, 17 January 2012, 18:40:11, Mike Galbraith wrote:
> I have a patchlet lying about that will show the likely culprit, but if
> ksoftirqd is eating CPU, someone has to be raising softirqs at a frightful
> rate, and the culprit it shows would almost certainly be ksoftirqd. I
> mean, what else is running during boot that is RT other than kernel
> threads. Nada.
Well, thanks for your patch. It didn't apply cleanly due to some moved lines,
but nothing too serious. I now have a machine where top just shows me the
culprit:
sirq-net-tx/0
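
For anyone who wants to cross-check this without the patch: /proc/softirqs
keeps per-CPU counters for each softirq, so sampling it twice shows which one
is running at a high rate. A minimal sketch, assuming Python is available on
the target; the function names are mine and not taken from any existing tool:

```python
import time


def parse_softirqs(text):
    """Parse /proc/softirqs content into {name: count summed over all CPUs}."""
    counts = {}
    for line in text.splitlines()[1:]:  # the first line is the CPU header
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip anything that is not a "NAME: n n ..." row
        counts[name.strip()] = sum(int(v) for v in rest.split())
    return counts


def softirq_rates(before, after, interval):
    """Return events per second for each softirq between two snapshots."""
    return {name: (after[name] - before.get(name, 0)) / interval
            for name in after}


def read_softirqs(path="/proc/softirqs"):
    """Read and parse the current counters."""
    with open(path) as f:
        return parse_softirqs(f.read())


if __name__ == "__main__":
    interval = 1.0  # seconds between the two samples (arbitrary choice)
    first = read_softirqs()
    time.sleep(interval)
    second = read_softirqs()
    rates = softirq_rates(first, second, interval)
    # Busiest softirq first
    for name, rate in sorted(rates.items(), key=lambda kv: -kv[1]):
        print("%-10s %10.0f/s" % (name, rate))
```

If NET_TX dominates the list while the box is in the bad state, that would
match what top shows.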

It seems to be triggered less often than with the mainline rt kernel, though.
But after some starts and stops of "connmand" and "ifconfig eth0 down" I got
this erroneous behaviour back. The only question is what to do next. I also
have some more observations which might help to nail down this bug:
* ifconfig does not return when sirq-net-tx/0 eats all cpu
* sometimes sirq-net-tx/0 sits on the cpu for a couple of seconds and goes
away, sometimes it just stays there when "ifconfig eth0 up" is issued.
* There are suspicious "FEC: MDIO read timeout" kernel log messages from the
ethernet driver.
* The ethernet phy uses polling since I do not know how to set the phy irq in
the board definition. I tried using "phy_register_fixup_for_uid" and then
setting phy_dev->irq in the fixup routine, but that seems to be too late: the
interrupt is deregistered when the network device is shut down although it
was never registered.
I also didn't find an example in the source, and the phy.txt documentation
does not mention it either. So input on how to set the phy irq in the
board config of the pcm043 would be really nice.

> You can find out easily enough, just edit kernel/softirq.c, comment
> out ksoftirqd_set_sched_params() in run_ksoftirqd(). If the throttle
> doesn't kick in (because ksoftirqd is now not RT), box boots but
> ksoftirqd still chewing up a CPU, you have the same info the throttle
> hacklet would show.
>
> If that's it, you can apply the below, do the same edit, and see which
> thread is grinding away. From there, I'd set a trap. Let sirq threads
> detect that they are being awakened too fast (hey, I can't go to sleep,
> the sirq I just processed is busy again, N times in a row) and leave a
> note for wakeup_softirqd(). There, WARN_ON(ksoftirqd[i].help_me) or
> such, to see who is flogging which softirq mercilessly.
I didn't use these tricks, since top was already doing its job well enough :-).
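
For the record, the trap Mike describes above could be modelled roughly like
this. It is only a userspace sketch in Python of the logic as I read it, not
kernel code: the class, the method names and the threshold are made up for
illustration, and the real thing would live in kernel/softirq.c.

```python
class SoftirqTrap:
    """Userspace model of the 'awakened too fast' trap (hypothetical names)."""

    def __init__(self, threshold):
        self.threshold = threshold  # the "N times in a row" from the mail
        self.busy_streak = 0        # consecutive runs with the softirq pending again
        self.help_me = False        # the note left for the wakeup path

    def after_processing(self, pending_again):
        """Model of the sirq thread having just handled its softirq."""
        if pending_again:
            self.busy_streak += 1
        else:
            self.busy_streak = 0    # it could go to sleep, so reset the streak
        if self.busy_streak >= self.threshold:
            self.help_me = True

    def on_wakeup(self, raiser):
        """Model of the wakeup path: report the raiser if a note was left."""
        if not self.help_me:
            return None
        self.help_me = False        # report once, then re-arm the trap
        self.busy_streak = 0
        return "softirq flogged by %s" % raiser


if __name__ == "__main__":
    trap = SoftirqTrap(threshold=3)
    for _ in range(3):
        trap.after_processing(pending_again=True)
    print(trap.on_wakeup("some_caller"))
```

In the kernel the report would be the WARN_ON() backtrace rather than a
string, which is what would show who is raising the softirq.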

Best regards
Tim
