RE: [PATCH] bridge: Fix a deadlock when enabling multicast snooping
From: Huang, Joseph
Date: Thu Dec 03 2020 - 17:21:27 EST
> From: Nikolay Aleksandrov <nikolay@xxxxxxxxxx>
> Sent: Thursday, December 3, 2020 3:47 PM
> To: Jakub Kicinski <kuba@xxxxxxxxxx>; Huang, Joseph
> <Joseph.Huang@xxxxxxxxxx>
> Cc: Roopa Prabhu <roopa@xxxxxxxxxx>; David S. Miller
> <davem@xxxxxxxxxxxxx>; bridge@xxxxxxxxxxxxxxxxxxxxxxxxxx;
> netdev@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; Linus Lüssing
> <linus.luessing@xxxxxxxxx>
> Subject: Re: [PATCH] bridge: Fix a deadlock when enabling multicast
> snooping
>
> On 03/12/2020 20:28, Jakub Kicinski wrote:
> > On Tue, 1 Dec 2020 16:40:47 -0500 Joseph Huang wrote:
> >> When enabling multicast snooping, bridge module deadlocks on
> >> multicast_lock if 1) IPv6 is enabled, and 2) there is an existing
> >> querier on the same L2 network.
> >>
> >> The deadlock was caused by the following sequence: While holding the
> >> lock, br_multicast_open calls br_multicast_join_snoopers, which
> >> eventually causes IP stack to (attempt to) send out a Listener Report (in
> igmp6_join_group).
> >> Since the destination Ethernet address is a multicast address,
> >> br_dev_xmit feeds the packet back to the bridge via br_multicast_rcv,
> >> which in turn calls br_multicast_add_group, which then deadlocks on
> multicast_lock.
> >>
> >> The fix is to move the call br_multicast_join_snoopers outside of the
> >> critical section. This works since br_multicast_join_snoopers only
> >> deals with IP and does not modify any multicast data structures of
> >> the bridge, so there's no need to hold the lock.
> >>
> >> Fixes: 4effd28c1245 ("bridge: join all-snoopers multicast address")
> >>
> >> Signed-off-by: Joseph Huang <Joseph.Huang@xxxxxxxxxx>
> >
> > Nik, Linus - how does this one look?
> >
>
> Hi,
> Thanks, somehow I missed this one too. Need to check my email config. :) I
> believe I see how it can happen, although it's not straight-forward to follow.
> A selftest for this case would be great, and any traces (e.g. hung task) would
> help a lot as well.
> Correct me if I'm wrong but the sequence is something like:
> br_multicast_join_snoopers -> ipv6_dev_mc_inc -> __ipv6_dev_mc_inc ->
> igmp6_group_added
> -> MLDv1 (mode) igmp6_join_group() -> Again MLDv1 mode
> -> igmp6_join_group() -> igmp6_join_group
> -> igmp6_send() on the bridge device -> br_dev_xmit and onto the bridge
> -> mcast processing code
> which uses the multicast_lock spinlock. Right?
That is correct.
Here's a stack trace from a typical run:
echo -n 1 > /sys/devices/virtual/net/gmn0/bridge/multicast_snooping
[ 936.146754] rcu: INFO: rcu_preempt self-detected stall on CPU
[ 936.152534] rcu: 0-....: (5594 ticks this GP) idle=75a/1/0x4000000000000002 softirq=2787/2789 fqs=2625
[ 936.162026] (t=5253 jiffies g=4205 q=12)
[ 936.166041] Task dump for CPU 0:
[ 936.169272] sh R running task 0 1315 1295 0x00000002
[ 936.176332] Call trace:
[ 936.178797] dump_backtrace+0x0/0x140
[ 936.182469] show_stack+0x14/0x20
[ 936.185793] sched_show_task+0x108/0x138
[ 936.189727] dump_cpu_task+0x40/0x50
[ 936.193313] rcu_dump_cpu_stacks+0x94/0xd0
[ 936.197420] rcu_sched_clock_irq+0x75c/0x9c0
[ 936.201698] update_process_times+0x2c/0x68
[ 936.205893] tick_sched_handle.isra.0+0x30/0x50
[ 936.210432] tick_sched_timer+0x48/0x98
[ 936.214272] __hrtimer_run_queues+0x110/0x1b0
[ 936.218635] hrtimer_interrupt+0xe4/0x240
[ 936.222656] arch_timer_handler_phys+0x30/0x40
[ 936.227106] handle_percpu_devid_irq+0x80/0x140
[ 936.231654] generic_handle_irq+0x24/0x38
[ 936.235669] __handle_domain_irq+0x60/0xb8
[ 936.239774] gic_handle_irq+0x5c/0x148
[ 936.243535] el1_irq+0xb8/0x180
[ 936.246689] queued_spin_lock_slowpath+0x118/0x3b0
[ 936.251495] _raw_spin_lock+0x5c/0x68
[ 936.255221] br_multicast_add_group+0x40/0x170 [bridge]
[ 936.260491] br_multicast_rcv+0x7ac/0xe30 [bridge]
[ 936.265322] br_dev_xmit+0x140/0x368 [bridge]
[ 936.269689] dev_hard_start_xmit+0x94/0x158
[ 936.273876] __dev_queue_xmit+0x5ac/0x7f8
[ 936.277890] dev_queue_xmit+0x10/0x18
[ 936.281563] neigh_resolve_output+0xec/0x198
[ 936.285845] ip6_finish_output2+0x240/0x710
[ 936.290039] __ip6_finish_output+0x130/0x170
[ 936.294318] ip6_output+0x6c/0x1c8
[ 936.297731] NF_HOOK.constprop.0+0xd8/0xe8
[ 936.301834] igmp6_send+0x358/0x558
[ 936.305326] igmp6_join_group.part.0+0x30/0xf0
[ 936.309774] igmp6_group_added+0xfc/0x110
[ 936.313787] __ipv6_dev_mc_inc+0x1a4/0x290
[ 936.317885] ipv6_dev_mc_inc+0x10/0x18
[ 936.321677] br_multicast_open+0xbc/0x110 [bridge]
[ 936.326506] br_multicast_toggle+0xec/0x140 [bridge]
>
> One question - shouldn't leaving have the same problem? I.e.
> br_multicast_toggle -> br_multicast_leave_snoopers
> -> br_ip6_multicast_leave_snoopers -> ipv6_dev_mc_dec ->
> -> igmp6_group_dropped -> igmp6_leave_group ->
> MLDv1 mode && last reporter -> igmp6_send() ?
>
> I think it was saved by the fact that !br_opt_get(br,
> BROPT_MULTICAST_ENABLED) would be true and the multicast lock won't be
> acquired in the br_dev_xmit path? If so, I'd appreciate a comment about that
> because it's not really trivial to find out. :)
That's a really good point. Leave should have deadlocked as well, but when I
tested the patch, I was able to turn on/off multicast snooping multiple times
without any problem.
Is it because this line in igmp6_leave_group?
if (ma->mca_flags & MAF_LAST_REPORTER)
igmp6_send(&ma->mca_addr, ma->idev->dev,
ICMPV6_MGM_REDUCTION);
Perhaps MAF_LAST_REPORTER was not set, so igmp6_send was not called?
>
> Anyhow, the patch is fine as-is too:
> Acked-by: Nikolay Aleksandrov <nikolay@xxxxxxxxxx>
>
> Thanks,
> Nik
Thanks,
Joseph
________________________________
CONFIDENTIALITY NOTICE: This email and any attachments are for the sole use of the intended recipient(s) and contain information that may be Garmin confidential and/or Garmin legally privileged. If you have received this email in error, please notify the sender by reply email and delete the message. Any disclosure, copying, distribution or use of this communication (including attachments) by someone other than the intended recipient is prohibited. Thank you.