On Tue, 2012-01-17 at 23:15 +0000, Tina Yang wrote:Netdump does call it with interrupt disabled and hang because ofOn 1/17/2012 1:51 PM, Konrad Rzeszutek Wilk wrote:I didn't think start_xmit could be called with interrupts disabled orOn Tue, Jan 17, 2012 at 01:42:22PM -0800, Tina Yang wrote:Yes. It has nothing to do with net drivers but same causeOn 1/13/2012 3:06 AM, Ian Campbell wrote:Did you find the culprit of it? Was there a patch for that in the
Although netdump is now obsolete, I think it's always a good practice
to preserve caller's irq status as we had a very bad experience
chasing a similar problem caused by such a irq change in RDS
upstream kernel?
elsewhere in the kernel.
from interrupt context but perhaps I am wrong about that or perhaps
netconsole changes things?
Agreed. It's not safe to assume it unless the API documentation states that it can
Right, Documentation/networking/netdevices.txt states that start_xmit
can be called with interrupts disabled by netconsole and therefore using
the irqsave/restore locking in this function is, AFAICT, correct.
The question is not so much what may or may not be a problem in theShouldn't be the case now, but don't know about the future.in the not too long ago past.OK, it sounds like it was issues in the past but might not be the
case anymore.
Could please re-test it without that spinlock irqsave patch using
the upstream kernel (or just UEK2 since it is an 3.0 type kernel).
The fact is as long as there is a new caller that has the expectation
of preserved irq status, it would be a problem.
future but what the requirements of this function are, in particular
those imposed by the network stack for the start_xmit function.
I did look at the start_xmit of most of the net drivers myself when I hit theAs Ian said, some net drivers have been cautious in this regard already byI was talking about the interrupt/poll handler here since I hadn't yet
saving/restoring the status, but apparently not everyone.
noticed that the locking change was also in start_xmit and not just the
poll/interrupt paths (which was actually just code motion and not a
locking change in any case).
Ian.