Re: [bug?] r8169: hangs under heavy load

From: Eric Dumazet
Date: Fri Nov 25 2011 - 18:06:05 EST


On Friday 25 November 2011 at 23:22 +0100, Francois Romieu wrote:
> Eric Dumazet <eric.dumazet@xxxxxxxxx> :
> [...]
> > rtl8169_rx_interrupt(..., budget) can return budget + 1 sometimes
> > because of :
> >
> > /* Work around for AMD plateform. */
> > if ((desc->opts2 & cpu_to_le32(0xfffe000)) &&
> > (tp->mac_version == RTL_GIGA_MAC_VER_05)) {
> > desc->opts2 = 0;
> > cur_rx++;
> > }
>
> It needs fixing but RTL_GIGA_MAC_VER_05 is an old PCI 8169sc while
> debian's bug #642911 is about a 8168c (aka RTL_GIGA_MAC_VER_{19 .. 22}).
>
> This path is not used.
>

OK, then we receive an RxFIFOOver indication while the NAPI handler is
running (quite possible if the machine is under network load).

This (hard) interrupt calls rtl8169_tx_timeout()
-> rtl8169_hw_reset()
-> rtl_hw_reset()
-> rtl8169_init_ring_indexes()

tp->dirty_tx = tp->dirty_rx = tp->cur_tx = tp->cur_rx = 0;

When control returns to the softirq handler (rtl8169_rx_interrupt()),
it can then see tp->cur_rx being 0 instead of the value it had at the
start of the handler.

count = cur_rx - tp->cur_rx; // too big
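
To make the arithmetic concrete, here is a small user-space sketch of the
race, not the driver code itself. struct ring_state, reset_ring_indexes(),
rx_poll() and the reset_midway flag are hypothetical names invented for
illustration, and the loop is assumed to consume exactly `budget`
descriptors. Only the final subtraction mirrors the line quoted above.

```c
#include <stdint.h>

/* Hypothetical, stripped-down stand-in for the driver's private state. */
struct ring_state {
    uint32_t cur_rx;
};

/* Models the RX part of what the reset path does to the ring indexes. */
static void reset_ring_indexes(struct ring_state *tp)
{
    tp->cur_rx = 0;
}

/*
 * Models only the accounting of an RX poll (assumption: the loop consumes
 * exactly `budget` descriptors). When reset_midway is non-zero, the reset
 * is injected after the loop, as if the hard interrupt had fired while the
 * softirq handler was still running.
 */
static uint32_t rx_poll(struct ring_state *tp, uint32_t budget, int reset_midway)
{
    uint32_t cur_rx = tp->cur_rx;   /* snapshot taken at handler start */
    uint32_t left;

    for (left = budget; left > 0; left--)
        cur_rx++;                   /* packet processing elided */

    if (reset_midway)
        reset_ring_indexes(tp);     /* tp->cur_rx is now 0 */

    /* Same subtraction as in the mail: uses the *current* tp->cur_rx. */
    uint32_t count = cur_rx - tp->cur_rx;
    tp->cur_rx = cur_rx;
    return count;
}
```

With tp->cur_rx starting at 1000 and a budget of 64, the undisturbed poll
returns 64, while the poll that gets reset underneath it returns 1064,
far more than the budget the caller handed in.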


Really, calling rtl8169_init_ring_indexes() from hardirq is killing us.


