Re: [PATCH 0/8] can: usb: remove all usb_set_intfdata(intf, NULL) in drivers' disconnect()

From: Vincent MAILHOL
Date: Thu Dec 08 2022 - 10:45:22 EST


On Thu. 8 Dec. 2022 at 20:04, Oliver Neukum <oneukum@xxxxxxxx> wrote:
> On 08.12.22 10:00, Vincent MAILHOL wrote:
> > On Mon. 5 Dec. 2022 at 17:39, Oliver Neukum <oneukum@xxxxxxxx> wrote:
> >> On 03.12.22 14:31, Vincent Mailhol wrote:
>
> Good Morning!

Good night! (different time zone :))

> > ACK, but I do not see the connection.
> Well, useless checks are bad. In particular, we should always
> make it clear whether a pointer may or may not be NULL.
> That is, I have no problem with what you were trying to do
> with your patch set. It is a good idea and possibly slightly
> overdue. The problem is the method.
>
> > I can see that cdc-acm sets acm->control and acm->data to NULL in his
> > disconnect(), but it doesn't set its own usb_interface to NULL.
>
> You don't have to, but you can. I was explaining the two patterns for doing so.
>
> >> which claim secondary interfaces disconnect() will be called a second time
> >> for.
> >
> > Are you saying that the disconnect() of those CAN USB drivers is being
> > called twice? I do not see this in the source code. The only caller of
> > usb_driver::disconnect() I can see is:
> >
> > https://elixir.bootlin.com/linux/v6.0/source/drivers/usb/core/driver.c#L458
>
> If they use usb_claim_interface(), yes it is called twice. Once per
> interface. That is in the case of ACM once for the originally probed
> interface and a second time for the claimed interface.
> But not necessarily in that order, as you can be kicked off an interface
> via sysfs. Yet you need to cease operations as soon as you are disconnected
> from any interface. That is annoying because it means you cannot use a
> refcount. From that stems the widespread use of intfdata as a flag.

Thank you for the details! I better understand this part now.

> >> In addition, a driver can use setting intfdata to NULL as a flag
> >> for disconnect() having proceeded to a point where certain things
> >> can no longer be safely done.
> >
> > Any reference that a driver can do that? This pattern seems racy.
>
> Technically that is exactly what drivers that use usb_claim_interface()
> do. You free everything at the first call and use intfdata as a flag
> to prevent a double free.
> The race is prevented by usbcore locking, which guarantees that probe()
> and disconnect() have mutual exclusion.
> If you use intfdata in sysfs, yes additional locking is needed.

ACK for the mutual exclusion. My question was about what you said in
your previous message:

| In addition, a driver can use setting intfdata to NULL as a flag
| for *disconnect() having proceeded to a point* where certain things
| can no longer be safely done.

How do you check that disconnect() has proceeded *to a given point*
using intf without being racy? You can check if it has already
completed once but not check how far it has proceeded, right?

> > What makes you assume that I didn't check this in the first place? Or
> > do you see something I missed?
>
> That you did not put it into the changelogs.
> That reads like the drivers are doing something obsolete or stupid.
> They do not. They copied something that is necessary only under
> some circumstances.
>
> And that you did not remove the checks.
>
> >> which is likely, then please also remove checks like this:
> >>
> >> struct ems_usb *dev = usb_get_intfdata(intf);
> >>
> >> usb_set_intfdata(intf, NULL);
> >>
> >> if (dev) {
>
> Here. If you have a driver that uses usb_claim_interface().
> You need this check or you unregister an already unregistered
> netdev.

Sorry, but with all my best intentions, I still do not get it. During
the second iteration, inft is NULL and:

/* equivalent to dev = intf->dev.data. Because intf is NULL,
* this is a NULL pointer dereference */
struct ems_usb *dev = usb_get_intfdata(intf);

/* OK, intf is already NULL */
usb_set_intfdata(intf, NULL);

/* follows a NULL pointer dereference so this is undefined
* behaviour */
if (dev) {

How is this a valid check that you entered the function for the second
time? If intf is the flag, you should check intf, not dev? Something
like this:

struct ems_usb *dev;

if (!intf)
return;

dev = usb_get_intfdata(intf);
/* ... */

I just can not see the connection between intf being NULL and the if
(dev) check. All I see is some undefined behaviour, sorry.

> The way this disconnect() method is coded is extremely defensive.
> Most drivers do not need this check. But it is never
> wrong in the strict sense.
>
> Hence doing a mass removal with a change log that does
> not say that this driver is using only a single interface
> hence the check can be dropped to reduce code size
> is not good.
>
> Regards
> Oliver