Re: Bug related to bonding

From: Willy Tarreau
Date: Tue May 16 2006 - 00:53:04 EST


Hi,

On Mon, May 15, 2006 at 11:35:28AM +0200, Heiko Gerstung wrote:
> Hi!
>
> I am at a total loss with this one. My vanilla 2.4.32 crashes when I
> try to use bonding together with my rtl8150 based (USB-Ethernet) NICs.
> If this is a known error, I apologize for bothering the list and would
> appreciate any pointers to a working solution/workaround.
>
> Reproduce:
> # modprobe bonding mode=1 miimon=100 maxbonds=4
> # ifconfig bond0 172.16.10.111 netmask 255.255.255.0 up
> # ifenslave bond0 eth1 eth2
> Ethernet Channel Bonding Driver: v2.6.0 (January 14, 2004)
> bonding: MII link monitoring set to 100 ms
> 00:60:6E:30:07:Scheduling in interrupt
> kernel BUG at sched.c:564!
> invalid operand: 0000
> CPU: 0
> EIP: 0010:[<c011461d>] Not tainted
> EFLAGS: 00010282
> ...(following the CPU registers and Call Trace)....

It looks like what causes trouble is the link state monitoring. Please
try disabling 'miimon' to check whether the crash goes away. If it
does, a quick workaround would be to use another network driver/card,
but for a proper fix we need you to send us the complete oops and its
decoding through ksymoops.
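
For reference, a rough sketch of that test. It reuses the options from
your report with miimon set to 0 explicitly; the System.map path and the
oops.txt file name are only placeholders for whatever you have on your
system:

```shell
# After a fresh boot, load bonding with MII link monitoring disabled.
modprobe bonding mode=1 miimon=0 maxbonds=4
ifconfig bond0 172.16.10.111 netmask 255.255.255.0 up
ifenslave bond0 eth1 eth2

# Decode the oops text you copied down from the crashing run.
# Both file names below are assumptions; adjust them to your setup.
ksymoops -m /boot/System.map-2.4.32 < oops.txt
```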

In the meantime, if you absolutely need to use this card and link
monitoring, you could use ARP monitoring instead. Just point it at
a valid IP on the same segment so that it checks reachability through
your NICs.
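
Something along these lines should do, as a sketch only. The
arp_interval and arp_ip_target parameters are described in
Documentation/networking/bonding.txt; the 1000 ms interval and the
target address 172.16.10.1 are made-up values, so pick a host on your
segment that always answers ARP:

```shell
# ARP monitoring instead of MII monitoring (no miimon option given).
# 172.16.10.1 is a hypothetical always-up host, e.g. your gateway.
modprobe bonding mode=1 arp_interval=1000 arp_ip_target=172.16.10.1 maxbonds=4
ifconfig bond0 172.16.10.111 netmask 255.255.255.0 up
ifenslave bond0 eth1 eth2
```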

> Please let me know which details I have to provide from the bug message
> (I have to type it in manually, no copy'n'paste possible:-)).

A serial console will help if you can set one up. If you don't have
enough time to copy the oops by hand, boot with 'panic=180' to get 3
minutes before the automatic reboot. It is very important to get the
other registers, and particularly the stack dump, to know what function
called schedule().
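
As an illustration, assuming the first serial port at 115200 bauds and a
null-modem cable to a second machine (both assumptions, as is the
oops.txt file name), the setup could look like the following. The
kernel needs serial console support built in for this to work:

```shell
# On the crashing machine, append to the kernel command line
# (for example through the boot loader's "append" option):
#
#   console=ttyS0,115200n8 console=tty0 panic=180
#
# On the second machine, set the port speed and log what arrives:
stty -F /dev/ttyS0 115200 raw -echo
cat /dev/ttyS0 | tee oops.txt
```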

> It is not clear to me whether this is a bug in the bonding module, in
> the network driver or in the kernel itself.

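Possibly a combination of the first two. The bonding driver checks the
link status regularly through MII ioctls, and it's possible that the
network driver does something it should not during this call, such as
sleeping while in interrupt context, which would match the "Scheduling
in interrupt" message.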

> All 2.6.x kernels I tried worked fine, but I am currently bound to a
> 2.4.x kernel and all 2.4.x kernels I tried (2.4.20, 2.4.29) showed
> similar problems when activating bonding.

That's interesting, I'll try to diff the bonding driver between 2.4 and
2.6. For what it's worth, I have multiple production machines running
bonding on 2.4 with e1000 and tg3 drivers which have never had a single
problem during years of uptime.

> Thank you in advance,
> kind regards,
> Heiko

Regards,
Willy
