On 8/16/22 09:11, Jay Vosburgh wrote:
Jonathan Toppins <jtoppins@xxxxxxxxxx> wrote:
This is caused by the global variable ad_ticks_per_sec being zero as
demonstrated by the reproducer script discussed below. This causes
all timer values in __ad_timer_to_ticks to be zero, resulting
in the periodic timer to never fire.
To reproduce:
Run the script in
`tools/testing/selftests/drivers/net/bonding/bond-break-lacpdu-tx.sh` which
puts bonding into a state where it never transmits LACPDUs.
line 44: ip link add fbond type bond mode 4 miimon 200 \
xmit_hash_policy 1 ad_actor_sys_prio 65535 lacp_rate fast
setting bond param: ad_actor_sys_prio
given:
params.ad_actor_system = 0
call stack:
bond_option_ad_actor_sys_prio()
-> bond_3ad_update_ad_actor_settings()
-> set ad.system.sys_priority = bond->params.ad_actor_sys_prio
-> ad.system.sys_mac_addr = bond->dev->dev_addr; because
params.ad_actor_system == 0
results:
ad.system.sys_mac_addr = bond->dev->dev_addr
line 48: ip link set fbond address 52:54:00:3B:7C:A6
setting bond MAC addr
call stack:
bond->dev->dev_addr = new_mac
line 52: ip link set fbond type bond ad_actor_sys_prio 65535
setting bond param: ad_actor_sys_prio
given:
params.ad_actor_system = 0
call stack:
bond_option_ad_actor_sys_prio()
-> bond_3ad_update_ad_actor_settings()
-> set ad.system.sys_priority = bond->params.ad_actor_sys_prio
-> ad.system.sys_mac_addr = bond->dev->dev_addr; because
params.ad_actor_system == 0
results:
ad.system.sys_mac_addr = bond->dev->dev_addr
line 60: ip link set veth1-bond down master fbond
given:
params.ad_actor_system = 0
params.mode = BOND_MODE_8023AD
ad.system.sys_mac_addr == bond->dev->dev_addr
call stack:
bond_enslave
-> bond_3ad_initialize(); because first slave
-> if ad.system.sys_mac_addr != bond->dev->dev_addr
return
results:
Nothing is run in bond_3ad_initialize() because dev_add equals
sys_mac_addr leaving the global ad_ticks_per_sec zero as it is
never initialized anywhere else.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Jonathan Toppins <jtoppins@xxxxxxxxxx>
---
Notes:
v2:
* split this fix from the reproducer
v3:
* rebased to latest net/master
drivers/net/bonding/bond_3ad.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index d7fb33c078e8..957d30db6f95 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -84,7 +84,8 @@ enum ad_link_speed_type {
static const u8 null_mac_addr[ETH_ALEN + 2] __long_aligned = {
0, 0, 0, 0, 0, 0
};
-static u16 ad_ticks_per_sec;
+
+static u16 ad_ticks_per_sec = 1000 / AD_TIMER_INTERVAL;
static const int ad_delta_in_ticks = (AD_TIMER_INTERVAL * HZ) / 1000;
I still feel like this is kind of a hack, as it's not really
fixing bond_3ad_initialize to actually work (which is the real problem
as I understand it). If it's ok to skip all that for this case, then
why do we ever need to call bond_3ad_initialize?
The way it is currently written you still need to call bond_3ad_initialize() just not for setting the tick resolution. The issue here is ad_ticks_per_sec is used in several places to calculate timer periods, __ad_timer_to_ticks(), for various timers in the 802.3ad protocol. And if this variable, ad_ticks_per_sec, is left uninitialized all of these timer periods go to zero. Since the value passed in bond_3ad_initialize() is an immediate value I simply moved it off of the call stack and set the static global variable instead.
To fix bond_3ad_initialize(), probably something like the below is needed, but I do not understand why the guard if check was placed in bond_3ad_initialize().