Re: [ANNOUNCE] 3.0.14-rt31 - ksoftirq running wild - FEC ethernet driver to blame? Yep

From: Tim Sander
Date: Tue Jan 24 2012 - 05:53:13 EST


Hi

I have some more info on the error report, please see below.
> Well thanks for your patch. It didn't apply cleanly due to some moved
> lines, but nothing too serious. I now have a machine where top just shows
> me the culprit:
> sirq-net-tx/0
>
> It seems to be triggered less often than with the mainline rt kernel, though.
> But after some starts and stops of "connmand" and "ifconfig eth0 down" I got
> back this erroneous behaviour. The only question is what next? Still, I
> have some more observations which might help to nail down this bug:
> * ifconfig does not return when sirq-net-tx/0 eats all cpu
> * sometimes sirq-net-tx/0 sits on the cpu for a couple of seconds and goes
> away, sometimes it just stays there when "ifconfig eth0 up" is issued.
> * There are suspicious "FEC: MDIO read timeout" kernel log messages from
> the ethernet driver.
> * The ethernet phy uses polling since I do not know how to set the phy irq
> in the board definition. I tried using "phy_register_fixup_for_uid" and
> then setting phy_dev->irq in the fixup routine, but that seems to be
> too late: the interrupt is deregistered, without ever having been
> registered, when the network device is shut down.
> I also didn't find an example in the source, and there is no word about
> it in the phy.txt documentation. So input on how to set the phy irq in
> the board config of the pcm043 would be really nice.
>
I would like to point out that the softirq running wild seems to be fixed by
commit 9ec14c04ec6be93ff397adf250bc91ee77742bfb of the stable git tree.
At least I could not reproduce the ksoftirq error with 3.0.17-rt33. With the
above commit reverted I could reproduce the error. As this runaway
ksoftirq did not occur regularly, there is a small chance that the error just
didn't show up with the new kernel, but I doubt it.

There is still the case that during network configuration the system load from
ksoftirq goes suspiciously high, but at least it returns to normal levels
afterwards.
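
One way to put a number on that remaining load spike would be to compare two
snapshots of /proc/softirqs taken around the "ifconfig eth0 up" call. The
following is only a sketch: the function names are made up for illustration,
and it assumes the usual /proc/softirqs layout (a CPU header line followed by
"NAME: count ..." rows), which has not been checked on the -rt kernels
discussed here.

```python
import time


def parse_softirqs(text):
    """Parse /proc/softirqs-style text into {name: [count per CPU]}."""
    counts = {}
    for line in text.splitlines()[1:]:  # skip the "CPU0 CPU1 ..." header
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "NAME: counts" row
        counts[name.strip()] = [int(field) for field in rest.split()]
    return counts


def softirq_delta(before, after, name="NET_TX"):
    """Per-CPU increase of one softirq counter between two snapshots."""
    old = parse_softirqs(before)[name]
    new = parse_softirqs(after)[name]
    return [n - o for o, n in zip(old, new)]


def sample_net_tx(interval=1.0, path="/proc/softirqs"):
    """Hypothetical helper: NET_TX softirqs raised per CPU during `interval` seconds."""
    with open(path) as f:
        before = f.read()
    time.sleep(interval)
    with open(path) as f:
        after = f.read()
    return softirq_delta(before, after)
```

Calling sample_net_tx() in a loop while the interface is brought up and down
would show whether the NET_TX count keeps climbing or settles again.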

Best regards
Tim

PS: I have added the cc list of that commit to this report. If you have
some insight on this that you could share, that would be great.