Re: [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts

From: Michal Soltys
Date: Sun Aug 23 2009 - 13:44:18 EST


Jarek Poplawski wrote:
David Dillow wrote, On 08/22/2009 10:43 PM:

On Sat, 2009-08-22 at 05:07 -0700, Eric W. Biederman wrote:
ebiederm@xxxxxxxxxxxx (Eric W. Biederman) writes:

David Dillow <dave@xxxxxxxxxxxxxx> writes:

Re-looking at the code, I'd guess that some IRQ status line is getting
stuck high, but I don't see why -- we should acknowledge all outstanding
interrupts each time through the loop, whether we care about them or
not.

Could you reproduce the problem with the following patch applied, and send the
full dmesg, please?
Here is what I get.

r8169 screaming irq status 00000085 mask 0000ffff event 0000803f napi 0000001d
And now that the machine has come out of it, that was followed by:
Looks like the soft lockup did not manage to trigger in this case.
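
For readers following the numbers, here is a small sketch of how the register values in the log line above can be decoded. The bit names are recalled from the InterruptStatusBits enum in the r8169 driver of that period and should be treated as an assumption. The helpers `decode_irq_status` and `unhandled_bits` are hypothetical names made up for this illustration; they are not part of the driver.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed bit layout of the r8169 interrupt status register
 * (recalled from the driver source, not confirmed by this thread). */
struct irq_bit {
    uint16_t mask;
    const char *name;
};

static const struct irq_bit irq_bits[] = {
    { 0x0001, "RxOK" },
    { 0x0002, "RxErr" },
    { 0x0004, "TxOK" },
    { 0x0008, "TxErr" },
    { 0x0010, "RxOverflow" },
    { 0x0020, "LinkChg" },
    { 0x0040, "RxFIFOOver" },
    { 0x0080, "TxDescUnavail" },
    { 0x0100, "SWInt" },
    { 0x4000, "PCSTimeout" },
    { 0x8000, "SYSErr" },
};

/* Hypothetical helper: status bits covered by neither the event mask
 * nor the NAPI mask. */
static uint16_t unhandled_bits(uint16_t status, uint16_t event, uint16_t napi)
{
    return (uint16_t)(status & ~(event | napi));
}

/* Hypothetical helper: write the names of the set bits into buf,
 * separated by spaces. Returns the number of bits named. */
static int decode_irq_status(uint16_t status, char *buf, size_t len)
{
    size_t used = 0;
    int count = 0;

    if (len == 0)
        return 0;
    buf[0] = '\0';

    for (size_t i = 0; i < sizeof(irq_bits) / sizeof(irq_bits[0]); i++) {
        if (!(status & irq_bits[i].mask))
            continue;

        int n = snprintf(buf + used, len - used, "%s%s",
                         count ? " " : "", irq_bits[i].name);
        if (n < 0 || (size_t)n >= len - used) {
            buf[used] = '\0';   /* out of space: drop the partial name */
            break;
        }
        used += (size_t)n;
        count++;
    }
    return count;
}
```

Called with the values from the log line (status 0x0085, event 0x803f, napi 0x001d), `unhandled_bits` returns 0x0080, the one status bit that neither mask covers.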

I need some more context, please. What is the network load through this
NIC when you have the issues? Light, heavy? Can you give me more details
about the machine? A full dmesg from boot until this happens would help
quite a bit. At a minimum it would help answer which version of the chip
we're dealing with and what the machine it is in looks like.

Can you reproduce this with pci=nomsi? I'm assuming the chip is running
in MSI mode.

Also, can you reproduce it when booting UP (or maxcpus=1)? I'm thinking
about a race between rtl8169_interrupt() and rtl8169_poll(), but it
isn't jumping out at me.

Also, I'm having connectivity troubles this weekend, so my response may
be spotty. :(



BTW, FYI, it seems Michal stopped tracking this problem, but he
found this commit problematic as well.

From: Michal Soltys <soltys@xxxxxxxx>
Subject: Re: r8169 (+others ?) and note_interrupt performance hit on 2.6.30.x
Date: Wed, 05 Aug 2009 20:54:47 +0200
http://marc.info/?l=linux-netdev&m=124949848110710&w=2


Well, not really stopped, but I'm not sure what to look at before that particular commit (CPU load for the tests I've done also increased rather significantly before that commit and after 2.6.29, but that doesn't seem to be related to the driver). And I was away for over a week...

As for the changes that commit introduced, here is a link to the mail with the oprofile results I collected back then:

http://www.spinics.net/lists/netdev/msg102709.html

I'm happy to assist any way I can.