Jarod Wilson <jarod@xxxxxxxxxx> wrote:
>Jarod Wilson wrote:
>...
>>As Andy already stated, I'm not a fan of such workarounds either, but it's
>>necessary sometimes, so if this is going to be actually considered then a
>>few things need to be fixed. Please make this a proper bonding option
>>which can be changed at runtime and not only via a module parameter.
>
>Is there any particular userspace tool that would need some updating, or
>is adding the sysfs knobs sufficient here? I think I've got all the sysfs
>stuff thrown together now, but still need to test.

Most (all?) bonding options should be configurable via iproute
(netlink) now.
>>Now, I saw that you've only tested with 500 ms, can't this be fixed by
>>using a different interval? This seems like a very specific problem to
>>have a whole new option for.
>
>Okay, so I believe the "only tested with 500ms" was in reference to
>testing with Uwe's initial patch. I do have supporting evidence in a
>bugzilla report that shows intervals upwards of 5000ms still experience
>the problem here.
>...
>I'll wait until we've heard confirmation from Uwe that intervals
>other than 500ms don't fix things.
I set up some switches and attempted to reproduce this
yesterday; I daisy-chained three switches (two Cisco and an HP) together
and connected the bonded interfaces to the "end" switches. I tried
various ARP targets (the switch itself, hosts at various points on the
switches) and varying arp_intervals, and was unable to reproduce the problem.
As I understand it, the working theory is something like this:
- host with two bonded interfaces, A and B. For active-backup
mode, the interfaces have been assigned the same MAC address.
- switch has MAC for B in its forwarding table
- bonding goes from down to up, and thinks all its slaves are
down, and starts the "curr_arp_slave" search for an active
arp_ip_target. In this case, it starts with A, and sends an ARP from A.
As an aside, I'm not 100% clear on what exactly is going on in
the "bonding goes from down to up" transition; this seems to be key in
reproducing the issue.
- switch sees source mac coming from port A, starts to update
its forwarding table
- meanwhile, switch forwards ARP request, and receives ARP
reply, which it forwards to port B. Bonding drops this, as the slave is
inactive.
- switch finishes updating forwarding table, MAC is now assigned
to port A.
- bonding now tries sending on port B, and the cycle repeats.
If this is what's taking place, then the arp_interval itself is
irrelevant; the race is between the switch's forwarding table update and
the generation of the ARP reply.
Also, presuming the above is what's going on, we could modify
the ARP "curr_arp_slave" logic a bit to resolve this without requiring
any magic knobs.
For example, we could change the "drop on inactive" logic to
recognise the "curr_arp_slave" search and accept the unicast ARP reply,
and perhaps make that receiving slave the next curr_arp_slave
automatically.
I also wonder if the fail_over_mac option would affect this
behavior, as it would cause the slaves to keep their MAC address for the
duration, so the switch would not see the MAC move from port to port.
Another thought would be to have the curr_arp_slave cycle
through the slaves in random order, but that could create
non-deterministic results even when things are working correctly.