Re: WARNING in ib_umad_kill_port

From: Dmitry Vyukov
Date: Tue Apr 07 2020 - 08:40:03 EST


On Tue, Apr 7, 2020 at 1:55 PM Jason Gunthorpe <jgg@xxxxxxxx> wrote:
>
> On Tue, Apr 07, 2020 at 11:56:30AM +0200, Dmitry Vyukov wrote:
> > > I'm not sure what could be done wrong here to elicit this:
> > >
> > > sysfs group 'power' not found for kobject 'umad1'
> > >
> > > ??
> > >
> > > I've seen another similar sysfs related trigger that we couldn't
> > > figure out.
> > >
> > > Hard to investigate without a reproducer.
> >
> > Based on all of the sysfs-related bugs I've seen, my bet would be on
> > some races. E.g. one thread registers devices, while another
> > unregisters these.
>
> I did check that the naming is ordered right, at least we won't be
> concurrently creating and destroying umadX sysfs of the same names.
>
> I'm also fairly sure we can't be destroying the parent at the same
> time as this child.
>
> Do you see the above commonly? Could it be some driver core thing? Or
> is it more likely something wrong in umad?

Mmmm... I can't say, I am looking at some bugs very briefly. I've
noticed that sysfs comes up periodically (or was it some other similar
fs?). General observation is that code frequently assumes only the
happy scenario and only, say, a single administrator doing one thing
at a time, slowly and carefully, and it is not really hardened against
armies of monkeys.
But I did not look at code abstractions, bug patterns, contracts, etc.

Greg KH may know better. Greg, as far as I remember you commented on
some of these reports along the lines of, for example, "the warning is
in sysfs code, but the bug is in the callers".