Re: [PATCH] usb: gadget: storage_common: make FSG_NUM_BUFFERS variable size

From: Per Forlin
Date: Mon Aug 08 2011 - 16:45:31 EST


On 8 August 2011 22:23, Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> wrote:
> On Mon, 8 Aug 2011, Per Forlin wrote:
>
>> On 8 August 2011 20:45, Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> wrote:
>> > On Mon, 8 Aug 2011, Per Forlin wrote:
>> >
>> >> > Okay, 6% is a worthwhile improvement, though not huge.  Did you try 6
>> >> > or 8 buffers?  I bet going beyond 4 makes very little difference.
>> >> >
>> >> On my board 4 buffers are enough. More buffers will make no difference.
>> >>
>> >> Background study:
>> >> I started by running dd on the target side, simply to measure the
>> >> maximum bandwidth, which is 20 MiB/s on my board. Then I started the
>> >> gadget mass storage function on the device and ran the same test
>> >> from the PC host side, getting 18.7 MiB/s. I guessed this might be
>> >> due to serialized cache handling, so as a dummy test I removed the
>> >> dma_map call (replacing it with virt_to_dma) to see whether it was
>> >> causing the performance drop. Without dma_map I get 20 MiB/s, so the
>> >> loss appears to be due to the dma_map call. Note that dma_map only
>> >> adds latency for the first request.
>> >
>> > What exactly do you mean by "first request"?
>> When both buffers are empty. The first request is filled up by VFS
>> and then prepared. At that point there is no ongoing transfer over
>> USB, so the preparation costs more, since it can't run in parallel
>> with an ongoing transfer. Every time the two buffers run empty there
>> is an extra cost for the first request in the next series of
>> requests. Why VFS doesn't deliver the data in time, I don't know.
>
> Okay, that makes sense.  But what connection does this have with
> dma_map?  Don't you also have situations where both buffers are empty
> when you replace dma_map with virt_to_dma?
>
All I do here is remove the cost of dma_map in order to verify whether
dma_map is responsible for the delay. If the performance is good
without dma_map, I know the extra cost comes from dma_map running
serially instead of in parallel. If the performance were still bad
without dma_map, something else would be causing it. The change doesn't
affect how VFS feeds data into the buffers.
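
To make the idea concrete, here is a rough sketch of the kind of
diagnostic swap I mean. This is not the actual UDC driver code; the
function name and the to_device flag are made up for illustration, and
virt_to_dma() is the ARM-specific helper:

#include <linux/dma-mapping.h>
#include <linux/usb/gadget.h>
#include <asm/memory.h>         /* virt_to_dma() on ARM */

/*
 * Illustrative only: where the controller driver maps req->buf for DMA,
 * the dummy test replaces the cache-maintaining dma_map_single() call
 * with a plain address translation.
 */
static dma_addr_t map_req_for_dma(struct device *dev,
                                  struct usb_request *req, bool to_device)
{
#ifdef DUMMY_TEST_SKIP_DMA_MAP
        /* dummy test: translate the address only, no cache maintenance */
        return virt_to_dma(dev, req->buf);
#else
        /* normal path: dma_map_single() cleans/invalidates the cache */
        return dma_map_single(dev, req->buf, req->length,
                              to_device ? DMA_TO_DEVICE : DMA_FROM_DEVICE);
#endif
}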

>> Instead of being refilled smoothly from VFS, the refills come in
>> bursts. VFS fills up the two buffers, then USB manages to transmit
>> both buffers before VFS provides new data. Roughly 20% of the time
>> VFS fills in new data before USB has consumed it all; 80% of the time
>> USB manages to consume all the data before VFS refills the buffers.
>> The bursty behaviour is probably due to power save, which adds
>> latency to the system.
>
>> This is true. On my system the next buffer doesn't get filled up
>> before the previous one has finished. With 4 buffers it works fine.
>
> Adding buffers to smooth out the bursty VFS behavior is a reasonable
> thing to do.  But you should improve the patch description and the
> comments; the real reason for reduced throughput is VFS's behavior.
> Something like what you just wrote here would be fine.
I agree. The main reason is to compensate for bursty VFS behavior.
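
As a rough illustration of the numbers involved (assuming the default
16 KiB fsg buffer length and the ~20 MiB/s figure above): two buffers
hold 32 KiB, which the host drains in about 1.6 ms, so any VFS stall
longer than that empties the pipeline; four buffers double that slack
to roughly 3 ms.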

>
> And reduce the maximum number of buffers to 4.  When you've made those
> changes, I'll Ack the patch.
Will do.
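
For reference, a minimal sketch of the bounds check I have in mind
(names like fsg_num_buffers and the exact limits are still open until
the respin):

#include <linux/kernel.h>

/* Sketch only: the final patch may look different. */
#define FSG_MIN_NUM_BUFFERS     2
#define FSG_MAX_NUM_BUFFERS     4

static unsigned int fsg_num_buffers = 2;

/* Reject configurations outside the 2..4 range discussed above. */
static int fsg_num_buffers_validate(void)
{
        if (fsg_num_buffers >= FSG_MIN_NUM_BUFFERS &&
            fsg_num_buffers <= FSG_MAX_NUM_BUFFERS)
                return 0;
        pr_err("fsg_num_buffers %u is out of range (%d to %d)\n",
               fsg_num_buffers, FSG_MIN_NUM_BUFFERS, FSG_MAX_NUM_BUFFERS);
        return -EINVAL;
}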

Thanks for your time and patience,
Per