Re: [PATCH net] net: sysfs: Fix deadlock situation in sysfs accesses
From: Eric Dumazet
Date: Tue Dec 17 2024 - 07:04:51 EST
On Tue, Dec 17, 2024 at 12:56 PM Christophe Leroy
<christophe.leroy@xxxxxxxxxx> wrote:
>
>
>
> Le 17/12/2024 à 10:52, Eric Dumazet a écrit :
> > On Tue, Dec 17, 2024 at 10:41 AM Christophe Leroy
> > <christophe.leroy@xxxxxxxxxx> wrote:
> >>
> >>
> >>
> >> Le 17/12/2024 à 10:20, Eric Dumazet a écrit :
> >>> On Tue, Dec 17, 2024 at 9:59 AM Christophe Leroy
> >>> <christophe.leroy@xxxxxxxxxx> wrote:
> >>>>
> >>>>
> >>>>
> >>>> Le 17/12/2024 à 09:16, Eric Dumazet a écrit :
> >>>>> On Tue, Dec 17, 2024 at 8:18 AM Christophe Leroy
> >>>>> <christophe.leroy@xxxxxxxxxx> wrote:
> >>>>>>
> >>>>>> The following problem is encountered on kernel built with
> >>>>>> CONFIG_PREEMPT. An snmp daemon running with normal priority is
> >>>>>> regularly calling ioctl(SIOCGMIIPHY). Another process running with
> >>>>>> SCHED_FIFO policy is regularly reading /sys/class/net/eth0/carrier.
> >>>>>>
> >>>>>> After some random time, the snmp daemon gets preempted while holding
> >>>>>> the RTNL mutex then the high priority process is busy looping into
> >>>>>> carrier_show which bails out early due to a non-successfull
> >>>>>> rtnl_trylock() which implies restart_syscall(). Because the snmp
> >>>>>> daemon has a lower priority, it never gets the chances to release
> >>>>>> the RTNL mutex and the high-priority task continues to loop forever.
> >>>>>>
> >>>>>> Replace the trylock by lock_interruptible. This will increase the
> >>>>>> priority of the task holding the lock so that it can release it and
> >>>>>> allow the reader of /sys/class/net/eth0/carrier to actually perform
> >>>>>> its read.
> >>>>>>
> >>>>
> >>>> ...
> >>>>
> >>>>>>
> >>>>>> Fixes: 336ca57c3b4e ("net-sysfs: Use rtnl_trylock in sysfs methods.")
> >>>>>> Signed-off-by: Christophe Leroy <christophe.leroy@xxxxxxxxxx>
> >>>>>> ---
> >>>>>
> >>>>> At a first glance, this might resurface the deadlock issue Eric W. Biederman
> >>>>> was trying to fix in 336ca57c3b4e ("net-sysfs: Use rtnl_trylock in
> >>>>> sysfs methods.")
> >>>>
> >>>> Are you talking about the deadlock fixed (incompletely) by 5a5990d3090b
> >>>> ("net: Avoid race between network down and sysfs"), or the complement
> >>>> provided by 336ca57c3b4e ?
> >>>>
> >>>> My understanding is that mutex_lock() will return EINTR only if a signal
> >>>> is pending so there is no need to set signal_pending like it was when
> >>>> using mutex_trylock() which does nothing when the mutex is already locked.
> >>>>
> >>>> And an EINTR return is expected and documented for a read() or a
> >>>> write(), I can't see why we want ERESTARTNOINTR instead of ERSTARTSYS.
> >>>> Isn't it the responsibility of the user app to call again read or write
> >>>> if it has decided to not install the necessary sigaction for an
> >>>> automatic restart ?
> >>>>
> >>>> Do you think I should instead use rtnl_lock_killable() and return
> >>>> ERESTARTNOINTR in case of failure ? In that case, is it still possible
> >>>> to interrupt a blocked 'cat /sys/class/net/eth0/carrier' which CTRL+C ?
> >>>
> >>> Issue is when no signal is pending, we have a typical deadlock situation :
> >>>
> >>> One process A is :
> >>>
> >>> Holding sysfs lock, then attempts to grab rtnl.
> >>>
> >>> Another one (B) is :
> >>>
> >>> Holding rtnl, then attempts to grab sysfs lock.
> >>
> >> Ok, I see.
> >>
> >> But then what can be the solution to avoid busy looping with
> >> mutex_trylock , not giving any chance to the task holding the rtnl to
> >> run and unlock it ?
> >
> > One idea would be to add a usleep(500, 1000) if the sysfs read/write handler in
> > returns -ERESTARTNOINTR;
> >
> > Totally untested idea :
> >
> > diff --git a/fs/seq_file.c b/fs/seq_file.c
> > index 8bbb1ad46335c3b8f50dd35d552f86767e62ead1..276c6d594129a18a7a4c2b1df447b34993398ab4
> > 100644
> > --- a/fs/seq_file.c
> > +++ b/fs/seq_file.c
> > @@ -290,6 +290,8 @@ ssize_t seq_read_iter(struct kiocb *iocb, struct
> > iov_iter *iter)
> > m->read_pos += copied;
> > }
> > mutex_unlock(&m->lock);
> > + if (copied == -ERESTARTNOINTR)
> > + usleep_range(500, 1000);
> > return copied;
> > Enomem:
> > err = -ENOMEM;
>
> Ok, that may solve the issue, but it looks more like a hack than a real
> solution, doesn't it ?
> It doesn't guarantee that the task holding the RTNL lock will be given
> the floor to run and free the lock.
I am sure all for a real solution, what do you suggest ?
>
> The real issue is the nest between sysfs lock and RTNL lock. Can't we
> ensure that they are always held in the same order ?
Problem : adding/removing netdevices may add/remove sysfs files.
Adding/removing netdevices is done under rtnl.