Re: Kernel Panic with bonding + IPoIB on 3.2.9

From: Jay Vosburgh
Date: Tue Mar 20 2012 - 00:32:24 EST


Joseph Glanville <joseph.glanville@xxxxxxxxxxxxxx> wrote:

>On 20 March 2012 06:05, Roland Dreier <roland@xxxxxxxxxxxxxxx> wrote:
>> On Sun, Mar 18, 2012 at 1:21 PM, Joseph Glanville
>> <joseph.glanville@xxxxxxxxxxxxxx> wrote:
>>> [  422.047024] kernel BUG at net/core/dev.c:1896!
>>
>> So this line is
>>
>>         BUG_ON(offset >= skb_headlen(skb));
>>
>> right?  No particular idea how we hit this, though...
>
>Yep... I have looked through most of /drivers/net/bonding and I can't
>really see why it should be blowing up there.. it really should cause
>the BUG_ON under normal IPoIB if the MTU was the cause - yet I have
>not experienced this.
>The bonding code doesn't seem to do anything special with the MTU
>other than propagating changes to the slaves.

For IPoIB, though, there is some extra initialization stuff in
bond_setup_by_slave(), and the hard_header_len will end up being set to
something different from the usual Ethernet value.
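
To make that concrete, here is a minimal userspace sketch of the
kind of copying bond_setup_by_slave() does, as I understand it. It is
not the driver code: struct model_netdev, model_setup_by_slave() and the
MODEL_* constants are made-up stand-ins for the handful of net_device
fields involved, and the slave's hard_header_len is simply whatever the
caller passes in.

```c
#include <string.h>

/* Illustrative constants only; named MODEL_* so they are not mistaken
 * for the kernel's own definitions. */
#define MODEL_ARPHRD_ETHER       1   /* Ethernet hardware type */
#define MODEL_ARPHRD_INFINIBAND 32   /* InfiniBand hardware type */
#define MODEL_ETH_HLEN          14   /* usual Ethernet header length */
#define MODEL_ETH_ALEN           6   /* Ethernet address length */
#define MODEL_INFINIBAND_ALEN   20   /* IPoIB link-layer address length */
#define MODEL_MAX_ADDR_LEN      32

/* Hypothetical stand-in for the few struct net_device fields that matter. */
struct model_netdev {
    unsigned short type;
    unsigned short hard_header_len;
    unsigned char  addr_len;
    unsigned char  broadcast[MODEL_MAX_ADDR_LEN];
};

/* Make the bond master take on the slave's link-layer parameters, so the
 * bond no longer looks like an Ethernet device to the stack. */
void model_setup_by_slave(struct model_netdev *bond,
                          const struct model_netdev *slave)
{
    bond->type            = slave->type;
    bond->hard_header_len = slave->hard_header_len;
    bond->addr_len        = slave->addr_len;
    memcpy(bond->broadcast, slave->broadcast, slave->addr_len);
}
```

Starting from a bond initialised with the Ethernet values and
enslaving a device of type MODEL_ARPHRD_INFINIBAND, the bond ends up
with the slave's hard_header_len rather than MODEL_ETH_HLEN.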

In looking at ipoib_setup, I see that hard_header_len appears to
be set to 4 (IPOIB_ENCAP_LEN). My recollection was that the IPoIB
hard_header_len was quite a bit larger than that; it looks like it
changed very recently from IPOIB_ENCAP_LEN + INFINIBAND_ALEN to what it
is now:

commit afd87adacb5de00768b2e54f0bd851278f2e6179
Author: Roland Dreier <roland@xxxxxxxxxxxxxxx>
Date:   Tue Feb 7 14:51:21 2012 +0000

    IPoIB: Stop lying about hard_header_len and use skb->cb to stash LL addresses

    [ Upstream commit 936d7de3d736e0737542641269436f4b5968e9ef ]

    Commit a0417fa3a18a ("net: Make qdisc_skb_cb upper size bound
    explicit.") made it possible for a netdev driver to use skb->cb
    between its header_ops.create method and its .ndo_start_xmit
    method. Use this in ipoib_hard_header() to stash away the LL address
    (GID + QPN), instead of the "ipoib_pseudoheader" hack. This allows
    IPoIB to stop lying about its hard_header_len, which will let us fix
    the L2 check for GRO.


I don't know whether this change is causing the problem (it
appears to be new in 3.2.9), but hard_header_len is one of the few
places in the bonding TX path where IPoIB ends up being different from
regular Ethernet.
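
For reference, a small userspace sketch of the arithmetic involved.
It models the BUG_ON condition quoted earlier in the thread, assuming
(from my reading of skbuff.h, not verified against 3.2.9) that
skb_headlen() is len minus data_len and that the offset is csum_start
minus the headroom. struct model_skb and the model_* helpers are
made-up names, and the packet sizes used with it are arbitrary; this
does not show that the commit above is the cause.

```c
#include <stdbool.h>

/* Illustrative values matching the names mentioned above. */
#define MODEL_IPOIB_ENCAP_LEN  4
#define MODEL_INFINIBAND_ALEN 20

/* hard_header_len before and after the commit quoted above. */
#define MODEL_OLD_HARD_HEADER_LEN (MODEL_IPOIB_ENCAP_LEN + MODEL_INFINIBAND_ALEN)
#define MODEL_NEW_HARD_HEADER_LEN (MODEL_IPOIB_ENCAP_LEN)

/* Hypothetical stand-in for the sk_buff fields the check depends on. */
struct model_skb {
    unsigned int len;        /* total bytes in the packet */
    unsigned int data_len;   /* bytes held in paged fragments */
    unsigned int headroom;   /* bytes between buffer head and data */
    unsigned int csum_start; /* checksum start, measured from buffer head */
};

/* Bytes available in the linear (non-paged) part of the buffer. */
unsigned int model_headlen(const struct model_skb *skb)
{
    return skb->len - skb->data_len;
}

/* Checksum start, measured from the start of the packet data. */
unsigned int model_csum_start_offset(const struct model_skb *skb)
{
    return skb->csum_start - skb->headroom;
}

/* True when the modelled BUG_ON(offset >= skb_headlen(skb)) would fire. */
bool model_would_bug(const struct model_skb *skb)
{
    return model_csum_start_offset(skb) >= model_headlen(skb);
}
```

With a fully linear 64-byte packet and a checksum offset of 44 the
check passes; if only 40 of those bytes are in the linear area, the
same offset trips it.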

-J

---
-Jay Vosburgh, IBM Linux Technology Center, fubar@xxxxxxxxxx
