Re: [PATCH 2/5] usb: gadget: f_midi: added spinlock on transmit function
From: Felipe Balbi
Date: Tue Mar 08 2016 - 09:02:42 EST
Hi,
Felipe Ferreri Tonello <eu@xxxxxxxxxxxxxxxxx> writes:
>>>>>>> Since f_midi_transmit is called by both ALSA and USB frameworks, it can
>>>>>>> potentially cause a race condition between both calls. This is bad because
>>>>>>> the way f_midi_transmit is implemented can't handle concurrent calls. This
>>>>>>> is due to the fact that the usb request fifo looks for the next element and
>>>>>>> only if it has data to process it enqueues the request, otherwise re-uses
>>>>>>> it. If both (ALSA and USB) frameworks call this function at the same time,
>>>>>>> the kfifo_seek() will return the same usb_request, which will cause a race
>>>>>>> condition.
>>>>>>>
>>>>>>> To solve this problem a synchronization mechanism is necessary. In this
>>>>>>> case a spinlock is used since f_midi_transmit is also called by the
>>>>>>> usb_request->complete callback in interrupt context.
>>>>>>>
>>>>>>> In my benchmarks, spinlocks were more efficient than scheduling the
>>>>>>> f_midi_transmit tasklet in process context and using a mutex to
>>>>>>> synchronize. It also performs better than the previous implementation,
>>>>>>> which allocated a usb_request for every new transmit made.
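To make the race concrete, here is a minimal userspace sketch (not the actual f_midi code) of peeking at the next request and queueing it only when there is data, with the whole sequence held under a lock. struct fake_req, struct fake_midi, NREQS, REQ_LEN and the C11 atomic_flag standing in for a kernel spinlock_t are all assumptions made for illustration; in the driver this would presumably be spin_lock_irqsave(), since the completion callback runs in interrupt context:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NREQS   4   /* hypothetical size of the pre-allocated request pool */
#define REQ_LEN 8   /* hypothetical request buffer length, in bytes */

/* Stand-in for struct usb_request. */
struct fake_req {
	unsigned char buf[REQ_LEN];
	size_t length;
	bool queued;            /* true while the "controller" owns the request */
};

/* Stand-in for struct f_midi. */
struct fake_midi {
	struct fake_req reqs[NREQS];
	size_t next;            /* index of the request a peek would return */
	size_t queued_count;
	atomic_flag lock;       /* stand-in for a kernel spinlock_t */
};

static void midi_lock(struct fake_midi *m)
{
	while (atomic_flag_test_and_set_explicit(&m->lock, memory_order_acquire))
		;               /* spin until the holder releases the lock */
}

static void midi_unlock(struct fake_midi *m)
{
	atomic_flag_clear_explicit(&m->lock, memory_order_release);
}

static void midi_init(struct fake_midi *m)
{
	memset(m, 0, sizeof(*m));
	atomic_flag_clear(&m->lock);
}

/* Returns how many bytes of data were placed into requests. */
static size_t midi_transmit(struct fake_midi *m, const unsigned char *data,
			    size_t len)
{
	size_t sent = 0;

	midi_lock(m);
	while (sent < len) {
		/* Peek only: unlocked, two callers could both get this request. */
		struct fake_req *req = &m->reqs[m->next];
		size_t chunk = len - sent < REQ_LEN ? len - sent : REQ_LEN;

		if (req->queued)
			break;  /* pool exhausted, wait for a completion */

		memcpy(req->buf, data + sent, chunk);
		req->length = chunk;
		req->queued = true;     /* "enqueue" and advance past it */
		m->next = (m->next + 1) % NREQS;
		m->queued_count++;
		sent += chunk;
	}
	midi_unlock(m);

	return sent;
}

/* Stand-in for the usb_request->complete callback: hand the request back. */
static void midi_complete(struct fake_midi *m, size_t idx)
{
	midi_lock(m);
	m->reqs[idx].queued = false;
	m->reqs[idx].length = 0;
	m->queued_count--;
	midi_unlock(m);
}
```

A message longer than REQ_LEN spans several requests here, which is the window in which a second caller could otherwise pick up the same request between the peek and the enqueue.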
>>>>>>
>>>>>> behaves better in what way ? Also, previous implementation would not
>>>>>> suffer from this concurrency problem, right ?
>>>>>
>>>>> The spin lock is faster than allocating usb requests all the time,
>>>>> even if the udc uses da for it.
>>>>
>>>> did you measure ? Is the extra speed really necessary ? How did you
>>>> benchmark this ?
>>>
>>> Yes I did measure and it was not that significant. This is not about
>>> speed. There was a bug in that approach that I already explained on
>>
>> you have very confusing statements. When I mentioned that previous code
>> wouldn't have the need for the spinlock you replied that spinlock was
>> faster.
>>
>> When I asked you about benchmarks you reply saying it's not about the
>> speed.
>>
>> Make up your mind dude. What are you trying to achieve ?
>>
>>> that patch, which was approved and applied BTW.
>>
>> patches can be reverted if we realise we're better off without
>> them. Don't get cocky, please.
>
> Yes, I am aware of that, but I honestly think that is the wrong way of
> dealing with this.
>
> ?? I don't get why I am giving this impression.
re-read your emails. The gist goes like this:
. Send patch
. Got comments
. Well, whatever, you can just ignore if you don't agree
>>> Any way, this spinlock should've been there since that patch but I
>>> couldn't really trigger this problem without a stress test.
>>
>> which tells me you sent me patches without properly testing. How much
>> time did it take to trigger this ? How did you trigger this situation ?
>
> No, that is not true. The implementation I sent is working properly for
> any real world usage.
>
> The stress test I made to break the current implementation is *not* a
> real use-case. I made it in order to push as far as possible how fast
> the driver can *reliably* run while sending and reading data. Then I
> noticed the bug.
>
> So, to answer your question. To trigger this bug is not a matter of
> time. The following needs to happen:
> 1. Device sends a MIDI message that is *bigger* than the usb request
> length. (just this by itself is really unlikely to happen in real world
> usage)
I wouldn't say it's unlikely. You just cannot trust the other side of
the wire. We've seen e.g. Xbox 360's SCSI layer sending messages of the
wrong size and we worked around them in g_mass_storage.
Broken implementations are a real thing ;-)
> 2. Host sends a MIDI message back *exactly* at the same time as the
> device is processing the second part of the usb request from the same
> message.
also not that unlikely to happen ;-) You can't assume the host will only
shift tokens on the wire at the time you're expecting it to.
> I couldn't trigger this in any of the tests we've made. I only triggered
> it when I was sending huge messages back and forth (device <-> host) as
> mentioned.
fair enough.
> In fact, we have thousands of devices out there without this patch (but
> with my previous patch that introduced this bug).
that's thousands of devices waiting to have a problem, right ? :-)
> I am not trying to say it wasn't a mistake. That patch unfortunately
> introduces this bug, but it has real improvements over the previous
> implementation. AFAIR the improvements are:
> * Fixes a bug that was causing the DMA buffer to fill up, causing a
> kernel panic.
this is a good point. Had forgotten about that detail. Thanks
> * Pre-allocates IN usb requests so there is no allocation overhead while
> sending data (same behavior already existed for the OUT endpoint). This
> ensures that the DMA memory is not misused, affecting the rest of the
> system.
also, arguably, a good idea. Recycling requests is a lot nicer and it's
what most gadget drivers do.
> * It doesn't crash if the host doesn't send an ACK after IN data
> packets and we have reached the limit of available memory. Also, this is
> useful because it causes the ALSA layer to timeout, which is the correct
> userspace behavior.
right
> * Continues to send data to the correct Jack (associated to each ALSA
> substream) if that was interrupted somehow, for instance by the size
> limit of a usb request.
ok.
>>> So, this patch fixes a bug in the current implementation.
>>
>> fixes a regression introduced by you, true. I'm trying to figure out if
>> we're better off without the original patch; to make a good decision I
>> need to know if the extra "speed" we get from not allocating requests on
>> demand is really that important.
>>
>> So, how much faster did you get and is that extra "speed" really
>> important ?
>
> The speed is not relevant at all in this case. It was not the goal of
> the patch, but I mentioned it because it is obvious that without memory
> allocation the code will execute faster.
>
> I did measure the speed improvements at that time, it was real but not
> relevant. I don't think we should be discussing this anyway.
fair enough. This was probably the first email from you which gave me
some peace of mind that you know what you're doing with this fix. Keep
in mind that we all receive hundreds of emails a day and it's difficult
to track things over time.
It's also a big PITA when someone sends fixes and cleanups on the same
series and/or with dependencies between them. The correct way is to send
*only* fixes first. They should be minimal patches that *only* fix the
problem. If the code looks messy or doesn't follow the coding style,
that's something you do in a completely separate patch and, usually, from
a clean topic branch starting at a tag from Linus (exceptions may arise,
of course).
So anyway, to finally finish this up. Can you send JUST the bare minimum
fix necessary to avoid the regression ? Also, add a proper Fixes: foobar
line on commit log (see commit e18b7975c885bc3a938b9a76daf32957ea0235fa
for an example).
Then we can get that merged. Keep in mind that you might have to Cc
stable (see same commit listed above).
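For illustration, the trailers on such a fix usually take roughly this shape. The hash and subject below are placeholders for the commit that introduced the regression, not real values:

```
Fixes: <first 12 characters of the offending commit's SHA-1> ("<subject line of that commit>")
Cc: <stable@vger.kernel.org>
Signed-off-by: <your name and address>
```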
After this is sorted out, then let's see how we can help you move your
product to libusbgx and check if there's anything missing in configfs
to cope with your use-case.
ps: can you point me to your devices shipping with f_midi ? Which
architecture are they using ? Which USB Peripheral Controller ? This
might be a good addition to my test farm depending on your answers above
:-p
cheers
--
balbi