Re: net: deadlock between ip_expire/sch_direct_xmit

From: Eric Dumazet
Date: Mon Mar 20 2017 - 08:43:28 EST


On Mon, 2017-03-20 at 10:59 +0100, Dmitry Vyukov wrote:
> On Tue, Mar 14, 2017 at 5:41 PM, Cong Wang <xiyou.wangcong@xxxxxxxxx> wrote:
> > On Tue, Mar 14, 2017 at 7:56 AM, Eric Dumazet <edumazet@xxxxxxxxxx> wrote:
> >> On Tue, Mar 14, 2017 at 7:46 AM, Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote:
> >>
> >>> I am confused. Lockdep has observed both of these stacks:
> >>>
> >>> CPU0 CPU1
> >>> ---- ----
> >>> lock(&(&q->lock)->rlock);
> >>> lock(_xmit_ETHER#2);
> >>> lock(&(&q->lock)->rlock);
> >>> lock(_xmit_ETHER#2);
> >>>
> >>>
> >>> So it somehow happened. Or what do you mean?
> >>>
> >>
> >> Lockdep said " possible circular locking dependency detected " .
> >> It is not an actual deadlock, but lockdep machinery firing.
> >>
> >> For a deadlock to happen, this would require that the ICMP message
> >> sent by ip_expire() is itself fragmented and reassembled.
> >> This cannot be, because ICMP messages are not candidates for
> >> fragmentation, but lockdep cannot know that of course...
> >
> > It doesn't have to be ICMP: as long as we get the same hash for
> > the inet_frag_queue, we will need to take the same lock and
> > a deadlock can happen.
> >
> > hash = ipqhashfn(iph->id, iph->saddr, iph->daddr, iph->protocol);
> >
> > So it is really up to this hash function.
>
>
>
> Is the following the same issue?
> It mentions dev->qdisc_tx_busylock, but I am not sure if it's relevant
> if there is already a cycle between _xmit_ETHER#2 -->
> &(&q->lock)->rlock#2.


False positive again.

veth needs to use netdev_lockdep_set_classes(), assuming you use veth?

I will provide a patch, thanks.

cf515802043cccecfe9ab75065f8fc71e6ec9bab missed a few drivers.
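Cong's earlier point about ipqhashfn is essentially the pigeonhole principle: the reassembly hash maps (id, saddr, daddr, protocol) into a fixed number of buckets, so unrelated flows can land on the same bucket and hence contend on the same per-queue lock. A toy model sketches this (plain Python; toy_ipqhashfn and NUM_BUCKETS are made up for illustration, this is NOT the kernel's jhash-based ipqhashfn):

```python
# Toy model of a fixed-size fragment hash table. With a fixed bucket
# count, distinct (id, saddr, daddr, protocol) tuples must eventually
# collide by the pigeonhole principle, which is why lockdep cannot rule
# out two unrelated flows taking "the same" frag-queue lock.

NUM_BUCKETS = 64  # the real table is larger, but still fixed-size

def toy_ipqhashfn(ip_id, saddr, daddr, protocol):
    # Cheap multiplicative mix of the fields (illustrative only).
    h = ip_id
    for word in (saddr, daddr, protocol):
        h = (h * 0x9e3779b1 + word) & 0xffffffff
    return h % NUM_BUCKETS

def find_collision():
    # With 64 buckets, at most 65 distinct flows are needed before two
    # of them share a bucket (and thus, in the kernel model, a lock).
    seen = {}
    for ip_id in range(NUM_BUCKETS + 1):
        # Fake flow 10.0.0.1 -> 10.0.0.2, UDP, varying only the IP id.
        flow = (ip_id, 0x0a000001, 0x0a000002, 17)
        bucket = toy_ipqhashfn(*flow)
        if bucket in seen:
            return seen[bucket], flow, bucket
        seen[bucket] = flow
    return None
```

Lockdep additionally collapses all per-bucket locks into one class, so a dependency observed on any bucket's lock is reported against the whole class; that is also why the "ICMP is never fragmented" argument above is invisible to it.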