On Tue, Dec 12, 2017 at 5:21 AM, Qing Huang <qing.huang@xxxxxxxxxx> wrote:
Hi,
[..]
We found an issue with the bonding driver when testing Mellanox devices.
The following test commands will stall the whole system sometimes, with the
serial console flooded with log messages from the bond_miimon_inspect()
function. Setting mtu size to be 1500 seems okay but very rarely it may hit
the same problem too.
ip address flush dev ens3f0
ip link set dev ens3f0 down
ip address flush dev ens3f1
ip link set dev ens3f1 down
[root@ca-hcl629 etc]# modprobe bonding mode=0 miimon=250 use_carrier=1 updelay=500 downdelay=500
[root@ca-hcl629 etc]# ifconfig bond0 up
[root@ca-hcl629 etc]# ifenslave bond0 ens3f0 ens3f1
[root@ca-hcl629 etc]# ip link set bond0 mtu 4500 up
Serial console output:
** 4 printk messages dropped ** [ 3717.743761] bond0: link status down for interface ens3f0, disabling it in 500 ms
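
To put a number on the flood, the "link status down" lines can be counted per slave in a saved console or dmesg capture. This is only a rough sketch: the function name count_link_down and the idea of saving the log to a file first are assumptions for illustration, not something from the original report.

```shell
#!/bin/sh
# count_link_down LOGFILE IFACE   (hypothetical helper name)
# Prints how many lines in LOGFILE report
# "link status down for interface IFACE".
count_link_down() {
    # -c prints the number of matching lines, -F treats the pattern as a
    # fixed string so interface names are never read as regex syntax.
    grep -cF -- "link status down for interface $2" "$1"
}
```

For example, after `dmesg > /tmp/dmesg.txt`, running `count_link_down /tmp/dmesg.txt ens3f0` prints the count for ens3f0. Because the console reports dropped printk messages, the result should be read as a lower bound.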
It seems that when setting a large mtu size on an RoCE interface, the RTNL
mutex may be held too long by the slave interface, causing bond_mii_monitor()
to be called repeatedly at an interval of 1 tick (1K HZ kernel configuration)
and kernel to become unresponsive.

Did you try (or manage) to reproduce that with other NIC drivers as well?
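
When comparing runs across NICs, it helps to record which kernel driver backs each slave. A minimal sketch, assuming sysfs is mounted at /sys and the slaves are physical devices with a device/driver link; the function name slave_driver and the optional sysfs-root argument are hypothetical, added only to make the helper easy to try out.

```shell
#!/bin/sh
# slave_driver IFACE [SYSFS_ROOT]   (hypothetical helper name)
# Prints the name of the kernel driver bound to IFACE, or "unknown" if the
# interface has no device/driver link (e.g. a virtual device such as bond0).
slave_driver() {
    link="${2:-/sys}/class/net/$1/device/driver"
    if [ -L "$link" ]; then
        # The symlink target ends in the driver name.
        basename "$(readlink "$link")"
    else
        echo unknown
    fi
}
```

For the setup above this would be called as `slave_driver ens3f0` and `slave_driver ens3f1`; `ethtool -i ens3f0` reports the same driver name along with version details.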
Or.