Re: Linux-2.6.21 hangs during post boot initialization phase

From: Peter Williams
Date: Fri Apr 27 2007 - 10:29:04 EST


Neil Horman wrote:
On Fri, Apr 27, 2007 at 04:05:11PM +1000, Peter Williams wrote:
Linus Torvalds wrote:
On Fri, 27 Apr 2007, Peter Williams wrote:
The 2.6.21 kernel is hanging during the post boot phase where various daemons
are being started (not always the same daemon unfortunately).

This problem was not present in 2.6.21-rc7 and there is no oops or other
unusual output in the system log at the time the hang occurs.
Can you use "git bisect" to narrow it down a bit more? It's only 125 commits, so bisecting even just three or four kernels will narrow it down to a handful.
As the changes became, smaller the builds became quicker :-) and after 7 iterations we have:


author Neil Horman <nhorman@xxxxxxxxxxxxx>
Fri, 20 Apr 2007 13:54:58 +0000 (09:54 -0400)
committer Jeff Garzik <jeff@xxxxxxxxxx>
Tue, 24 Apr 2007 16:43:07 +0000 (12:43 -0400)
commit b748d9e3b80dc7e6ce6bf7399f57964b99a4104c
tree 887909e1f735bb444ef0e3e370f34401fa6eee02 tree | snapshot
parent d91c088b39e3c66d309938de858775bb90fd1ead commit | diff
sis900: Allocate rx replacement buffer before rx operation

The sis900 driver appears to have a bug in which the receive routine
passes the skbuff holding the received frame to the network stack before
refilling the buffer in the rx ring. If a new skbuff cannot be allocated, the
driver simply leaves a hole in the rx ring, which causes the driver to stop
receiving frames and become non-recoverable without an rmmod/insmod according to
reporters. This patch reverses that order, attempting to allocate a replacement
buffer first, and receiving the new frame only if one can be allocated. If no
skbuff can be allocated, the current skbuf in the rx ring is recycled, dropping
the current frame, but keeping the NIC operational.

Signed-off-by: Neil Horman <nhorman@xxxxxxxxxxxxx>
Signed-off-by: Jeff Garzik <jeff@xxxxxxxxxx>

Peter
--
Peter Williams pwil3058@xxxxxxxxxxxxxx

"Learning, n. The kind of ignorance distinguishing the studious."
-- Ambrose Bierce

This was reported to me last night, and I've posted a patch to fix it, its
available here:
http://marc.info/?l=linux-netdev&m=117761259222165&w=2

It applies on top of the previous patch, and should fix your problem.

Here's a copy of the patch

Thanks & Regards
Neil


diff --git a/drivers/net/sis900.c b/drivers/net/sis900.c
index a6a0f09..7e44939 100644
--- a/drivers/net/sis900.c
+++ b/drivers/net/sis900.c
@@ -1754,6 +1754,7 @@ static int sis900_rx(struct net_device *net_dev)
sis_priv->rx_ring[entry].cmdsts = RX_BUF_SIZE;
} else {
struct sk_buff * skb;
+ struct sk_buff * rx_skb;
pci_unmap_single(sis_priv->pci_dev,
sis_priv->rx_ring[entry].bufptr, RX_BUF_SIZE,
@@ -1787,10 +1788,10 @@ static int sis900_rx(struct net_device *net_dev)
}
/* give the socket buffer to upper layers */
- skb = sis_priv->rx_skbuff[entry];
- skb_put(skb, rx_size);
- skb->protocol = eth_type_trans(skb, net_dev);
- netif_rx(skb);
+ rx_skb = sis_priv->rx_skbuff[entry];
+ skb_put(rx_skb, rx_size);
+ skb->protocol = eth_type_trans(rx_skb, net_dev);
+ netif_rx(rx_skb);
/* some network statistics */
if ((rx_status & BCAST) == MCAST)

This patch fixes the problem for me.

Peter
--
Peter Williams pwil3058@xxxxxxxxxxxxxx

"Learning, n. The kind of ignorance distinguishing the studious."
-- Ambrose Bierce
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/