Re: Kernel crash after using new Intel NIC (igb)
From: Arun Sharma
Date: Thu May 26 2011 - 15:29:50 EST
On 5/24/11 11:35 PM, Eric Dumazet wrote:
>> Another possibility is to do the list_empty() check twice. Once without
>> taking the lock and again with the spinlock held.
>
> Why ?
Part of the problem is that I don't have a precise understanding of the
race condition that's causing the list to become corrupted.

All I know is that doing it under the lock fixes it. If taking the lock
every time slows things down, we can first do the check outside the lock
(since it's cheap), and because that answer may be wrong, verify it again
under the lock.
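A minimal userspace sketch of the check-twice idea, assuming a pthread mutex stands in for the spinlock and a hand-rolled struct list_head stands in for <linux/list.h>; unlink_from_unused() is a hypothetical name, not a kernel function. Note that the fast path can only be trusted in one direction: a stale "non-empty" answer is caught by the recheck under the lock, but a stale "empty" answer skips the lock entirely, so the pattern is only safe if skipping in that case is harmless.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the kernel's circular list;
   it mirrors the logic of <linux/list.h> but is not that header. */
struct list_head {
    struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h; h->prev = h; }
static bool list_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail(struct list_head *e, struct list_head *h)
{
    e->prev = h->prev;
    e->next = h;
    h->prev->next = e;
    h->prev = e;
}

/* Unlink e and leave it pointing at itself. */
static void list_del_init(struct list_head *e)
{
    e->prev->next = e->next;
    e->next->prev = e->prev;
    INIT_LIST_HEAD(e);
}

/* Stands in for the spinlock guarding the unused list (assumption). */
static pthread_mutex_t unused_lock = PTHREAD_MUTEX_INITIALIZER;

/* Remove node from the unused list if it is on it; returns true if it
   was removed. */
static bool unlink_from_unused(struct list_head *node)
{
    /* Cheap unlocked peek, used only as a hint to skip the lock. In real
       concurrent code this read needs READ_ONCE() (kernel) or an atomic
       load (portable C) to avoid a data race. */
    if (list_empty(node))
        return false;

    pthread_mutex_lock(&unused_lock);
    bool removed = false;
    if (!list_empty(node)) {   /* recheck: the answer is now authoritative */
        list_del_init(node);
        removed = true;
    }
    pthread_mutex_unlock(&unused_lock);
    return removed;
}
```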
> list_del_init(&p->unused); (done under lock of course) is safe, you can
> call it twice, no problem.
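Why a second list_del_init() is harmless can be seen from a hypothetical userspace re-implementation of the helpers (it mirrors, but does not include, <linux/list.h>): after the first call the node points at itself, so a second call only rewrites its own self-pointers and leaves the rest of the list untouched. By contrast, the kernel's plain list_del() sets next/prev to LIST_POISON1/LIST_POISON2, so calling it twice on the same node would fault.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical copy of the kernel's circular doubly linked list. */
struct list_head {
    struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h; h->prev = h; }
static bool list_empty(const struct list_head *h) { return h->next == h; }

/* Insert e right after h. */
static void list_add(struct list_head *e, struct list_head *h)
{
    e->next = h->next;
    e->prev = h;
    h->next->prev = e;
    h->next = e;
}

/* Unlink e, then point it back at itself. On an already self-looped
   node both assignments below write e's own fields with the values they
   already hold, so repeating the call is a no-op. */
static void list_del_init(struct list_head *e)
{
    e->prev->next = e->next;
    e->next->prev = e->prev;
    INIT_LIST_HEAD(e);
}
```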
Calling it twice is not a problem. But calling it when we shouldn't be
could be the problem.
The list modifications under unused_peers.lock look generally safe. But
the control flow (based on refcnt) done outside the lock might have races.

E.g., inet_putpeer() might see the refcnt drop to zero, but before it
adds the node to the unused list, another thread may run inet_getpeer()
and set refcnt back to 1. The result is a node that is potentially in
use but sits on the unused list.
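One way to close that window is to make the refcount transition and the list move a single critical section. The sketch below assumes a hypothetical struct peer whose refcnt is a plain int guarded by the same lock as the unused list (a pthread mutex standing in for unused_peers.lock); peer_get() and peer_put() are illustrative stand-ins for inet_getpeer() and inet_putpeer(), not the actual inetpeer code or any fix that was merged. It gives up the lock-free refcount fast path in exchange for correctness.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace copy of the kernel list helpers. */
struct list_head {
    struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h; h->prev = h; }
static bool list_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail(struct list_head *e, struct list_head *h)
{
    e->prev = h->prev;
    e->next = h;
    h->prev->next = e;
    h->prev = e;
}

static void list_del_init(struct list_head *e)
{
    e->prev->next = e->next;
    e->next->prev = e->prev;
    INIT_LIST_HEAD(e);
}

/* Hypothetical peer: refcnt is protected by unused_lock, not atomic. */
struct peer {
    int refcnt;
    struct list_head unused;
};

static struct list_head unused_list = { &unused_list, &unused_list };
static pthread_mutex_t unused_lock = PTHREAD_MUTEX_INITIALIZER;

static void peer_get(struct peer *p)
{
    pthread_mutex_lock(&unused_lock);
    if (p->refcnt++ == 0)
        list_del_init(&p->unused);              /* 0 -> 1: back in use */
    pthread_mutex_unlock(&unused_lock);
}

static void peer_put(struct peer *p)
{
    pthread_mutex_lock(&unused_lock);
    assert(p->refcnt > 0);
    if (--p->refcnt == 0)
        list_add_tail(&p->unused, &unused_list); /* 1 -> 0: now unused */
    pthread_mutex_unlock(&unused_lock);
}
```

Because no other thread can observe refcnt between the decrement and the list_add_tail() (or between the increment and the list_del_init()), the invariant "on the unused list if and only if refcnt == 0" holds whenever the lock is released, which rules out the interleaving described above.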
-Arun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/