Re: [PATCH 2/5] usb: gadget: f_midi: added spinlock on transmit function

From: Felipe Ferreri Tonello
Date: Tue Mar 08 2016 - 08:45:14 EST


Hi Balbi,

On 08/03/16 07:37, Felipe Balbi wrote:
>
> Hi,
>
> Felipe Ferreri Tonello <eu@xxxxxxxxxxxxxxxxx> writes:
>>>>>> Since f_midi_transmit is called by both the ALSA and USB frameworks,
>>>>>> the two calls can race. This is bad because the way f_midi_transmit
>>>>>> is implemented can't handle concurrent calls: the usb request fifo
>>>>>> looks at the next element and enqueues the request only if it has
>>>>>> data to process, otherwise it re-uses it. If both frameworks (ALSA
>>>>>> and USB) call this function at the same time, kfifo_seek() will
>>>>>> return the same usb_request, which causes a race condition.
>>>>>>
>>>>>> To solve this problem a synchronization mechanism is necessary. A
>>>>>> spinlock is used in this case since f_midi_transmit is also called
>>>>>> by the usb_request->complete callback in interrupt context.
>>>>>>
>>>>>> In my benchmarks, spinlocks were more efficient than scheduling the
>>>>>> f_midi_transmit tasklet in process context and using a mutex to
>>>>>> synchronize. It also performs better than the previous
>>>>>> implementation, which allocated a usb_request for every new transmit.
>>>>>
>>>>> behaves better in what way ? Also, previous implementation would not
>>>>> suffer from this concurrency problem, right ?
>>>>
>>>> The spin lock is faster than allocating usb requests all the time,
>>>> even if the udc uses da for it.
>>>
>>> did you measure ? Is the extra speed really necessary ? How did you
>>> benchmark this ?
>>
>> Yes I did measure and it was not that significant. This is not about
>> speed. There was a bug in that approach that I already explained on
>
> you have very confusing statements. When I mentioned that previous code
> wouldn't have the need for the spinlock you replied that spinlock was
> faster.
>
> When I asked you about benchmarks you reply saying it's not about the
> speed.
>
> Make up your mind dude. What are you trying to achieve ?
>
>> that patch, which was approved and applied BTW.
>
> patches can be reverted if we realise we're better off without
> them. Don't get cocky, please.

Yes, I am aware of that, but I honestly think that is the wrong way of
dealing with this.

?? I don't get why I am giving this impression.

>
>> Any way, this spinlock should've been there since that patch but I
>> couldn't really trigger this problem without a stress test.
>
> which tells me you sent me patches without properly testing. How much
> time did it take to trigger this ? How did you trigger this situation ?

No, that is not true. The implementation I sent works properly for
any real-world usage.

The stress test I wrote to break the current implementation is *not* a
real use case. I wrote it to find out how fast the driver can *reliably*
send and read data, pushing it as far as possible. That is when I
noticed the bug.

So, to answer your question: triggering this bug is not a matter of
time. The following needs to happen:
1. The device sends a MIDI message that is *bigger* than the usb request
length. (This by itself is really unlikely to happen in real-world
usage.)
2. The host sends a MIDI message back at *exactly* the same time as the
device is processing the second part of the usb request from the same
message.
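
The sequence above can be sketched in plain userspace C. This is not the
f_midi code: fake_req, fake_fifo, fifo_peek and fifo_commit are
hypothetical names, and the "interleaving" is written out by hand instead
of using real threads. It only shows why two callers that both look at
the next request before either one enqueues it end up sharing a single
request, and why serializing "look + enqueue" (which is what holding a
lock around both steps enforces) avoids that.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NUM_REQS 4
#define REQ_LEN  8

/* Hypothetical stand-in for a pre-allocated usb_request. */
struct fake_req {
	unsigned char buf[REQ_LEN];
	size_t len;	/* bytes currently stored in buf */
	int queued;	/* 1 once handed to the pretend controller */
};

/* Hypothetical stand-in for the driver's request fifo. */
struct fake_fifo {
	struct fake_req reqs[NUM_REQS];
	size_t head;	/* index of the next request to use */
};

/* Step 1 of a transmit: look at the next request without removing it. */
static struct fake_req *fifo_peek(struct fake_fifo *f)
{
	return &f->reqs[f->head % NUM_REQS];
}

/* Step 2 of a transmit: fill the request, enqueue it, advance the fifo. */
static void fifo_commit(struct fake_fifo *f, struct fake_req *r,
			const unsigned char *data, size_t len)
{
	assert(len <= REQ_LEN);
	memcpy(r->buf, data, len);
	r->len = len;
	r->queued = 1;
	f->head++;
}

/*
 * Unlocked interleaving: both callers peek before either one commits.
 * Returns 1 if both callers ended up with the same request.
 */
static int interleaved_transmit(struct fake_fifo *f)
{
	const unsigned char a[] = "AAAA", b[] = "BBBB";
	struct fake_req *ra = fifo_peek(f);	/* caller A, e.g. the complete callback */
	struct fake_req *rb = fifo_peek(f);	/* caller B, e.g. the ALSA path */

	fifo_commit(f, ra, a, 4);
	fifo_commit(f, rb, b, 4);		/* overwrites caller A's data */
	return ra == rb;
}

/*
 * Serialized order: caller B cannot peek until caller A has committed.
 * Returns 1 if both callers ended up with the same request.
 */
static int serialized_transmit(struct fake_fifo *f)
{
	const unsigned char a[] = "AAAA", b[] = "BBBB";
	struct fake_req *ra, *rb;

	ra = fifo_peek(f);
	fifo_commit(f, ra, a, 4);
	rb = fifo_peek(f);
	fifo_commit(f, rb, b, 4);
	return ra == rb;
}
```

In the interleaved case the first request is enqueued twice and the
second one is skipped; in the serialized case each caller gets its own
request.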

I couldn't trigger this in any of the tests we've made. I only triggered
it when I was sending huge messages back and forth (device <-> host), as
mentioned.

In fact, we have thousands of devices out there without this patch (but
with my previous patch that introduced this bug).

I am not trying to say it wasn't a mistake. That patch unfortunately
introduces this bug, but it has real improvements over the previous
implementation. AFAIR the improvements are:
* Fixes a bug that was causing the DMA buffer to fill up, causing a
kernel panic.
* Pre-allocates IN usb requests so there is no allocation overhead while
sending data (the same behavior already existed for the OUT endpoint).
This ensures that DMA memory is not misused, which would affect the rest
of the system.
* It doesn't crash if the host doesn't send an ACK after IN data
packets and we have reached the limit of available memory. This is also
useful because it causes the ALSA layer to time out, which is the correct
userspace behavior.
* Continues to send data to the correct Jack (associated with each ALSA
substream) if that was interrupted somehow, for instance by the size
limit of a usb request.
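
The pre-allocation point in the list above can be sketched like this, in
plain userspace C. It is an illustration under assumptions, not the
driver code: req_pool, pool_get and pool_put are hypothetical names, and
malloc() stands in for whatever the UDC uses to allocate request buffers.
The idea is that all buffers are allocated once at setup, and when every
request is still in flight (for instance because the host has not ACKed
yet) the sender simply gets nothing back instead of allocating more
memory.

```c
#include <stddef.h>
#include <stdlib.h>

#define POOL_SIZE 4
#define REQ_LEN   64

/* Hypothetical stand-in for a pre-allocated IN usb_request. */
struct pool_req {
	unsigned char *buf;
	int in_flight;	/* 1 while queued and not yet completed */
};

struct req_pool {
	struct pool_req reqs[POOL_SIZE];
};

/* Allocate every buffer once, at setup time. Returns 0 or -1. */
static int pool_init(struct req_pool *p)
{
	for (size_t i = 0; i < POOL_SIZE; i++) {
		p->reqs[i].buf = malloc(REQ_LEN);
		p->reqs[i].in_flight = 0;
		if (!p->reqs[i].buf) {
			/* Undo the allocations made so far. */
			while (i-- > 0) {
				free(p->reqs[i].buf);
				p->reqs[i].buf = NULL;
			}
			return -1;
		}
	}
	return 0;
}

/*
 * Take a free request for transmission. Returns NULL when all requests
 * are still in flight; the caller has to wait, nothing is allocated.
 */
static struct pool_req *pool_get(struct req_pool *p)
{
	for (size_t i = 0; i < POOL_SIZE; i++) {
		if (!p->reqs[i].in_flight) {
			p->reqs[i].in_flight = 1;
			return &p->reqs[i];
		}
	}
	return NULL;
}

/* Called on completion: the request becomes available again. */
static void pool_put(struct pool_req *r)
{
	r->in_flight = 0;
}

/* Release all buffers at teardown. */
static void pool_free(struct req_pool *p)
{
	for (size_t i = 0; i < POOL_SIZE; i++) {
		free(p->reqs[i].buf);
		p->reqs[i].buf = NULL;
	}
}
```

With this shape, memory use is bounded by POOL_SIZE * REQ_LEN no matter
how long the other side stays silent.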


>
>> So, this patch fixes a bug in the current implementation.
>
> fixes a regression introduced by you, true. I'm trying to figure out if
> we're better off without the original patch; to make a good decision I
> need to know if the extra "speed" we get from not allocating requests on
> demand are really that important.
>
> So, how much faster did you get and is that extra "speed" really
> important ?

Speed is not relevant at all in this case. It was not the goal of
the patch; I only mentioned it because, obviously, code that does no
memory allocation executes faster.

I did measure the speed improvement at the time. It was real but not
relevant. I don't think we should be discussing this anyway.

--
Felipe
