Re: [PATCH] usb: cdc-wdm: close race between read and workqueue

From: Oliver Neukum
Date: Mon Apr 15 2024 - 07:09:14 EST




On 15.04.24 12:53, Bjørn Mork wrote:
Oliver Neukum <oneukum@xxxxxxxx> writes:
On 15.04.24 11:26, Bjørn Mork wrote:
Oliver Neukum <oneukum@xxxxxxxx> writes:
On 15.04.24 08:47, Bjørn Mork wrote:

urb from service_outstanding_interrupt(). That's why it was added. See
the explanation Robert wrote when introducing it:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/usb/class/cdc-wdm.c?id=c1da59dad0ebd3f9bd238f3fff82b1f7ffda7829

Well, the explanation is correct in that we must read
data available. However, if the RESPONDING flag is set
and the URB submitted, we are already doing so.
Sounds reasonable. Except that the bug proves we didn't.

Why? I am afraid I do not get that part.

I don't get how it happens either. But that's the only thing changed by
the patch.

Now you have lost me. I agree that this is the only thing that the patch
changes, but how do you derive the consequences from that?

> If you are right that service_outstanding_interrupt can race
againts
itself (and I don't doubt that), then I guess this could also happen
between failure to submit the URB and clearing the flag?

Yes, it can. In fact in this case the behavior should not change.
I am afraid we have a misunderstanding. It seems to me that in the
unchanged driver the result of service_outstanding_interrupt()
is undefined.
Please explain.

Sorry, I am so lost here that I am probably only confusing things. I doresp_count
not understand why we unlock &desc->iuspin around the usb_submit_urb
call. And git tells me I wrote that.

Dropping iuspin there allowed you to call usb_submit_urb() with GFP_KERNEL.
clear_wdm_read_flag(), as it then existed, could not race with itself because
its only caller wdm_read() is holding a mutex.

That, however, is not very material to the question at hand. iuspin at that
time protected only resp_count. Even today the URB itself is protected by
WDM_RESPONDING. (Which is why I think that test_and_set_bit is required)

Now, if we say that service_outstanding_interrupt() is racing with itself,
we have to ask why this is helpful. Do we at least agree that the regression
Aleksander is seeing is due to the removal of a race or are we looking at a side effect?

Regards
Oliver