Re: [PATCH] usb: gadget: storage_common: make FSG_NUM_BUFFERSvariable size

From: Per Forlin
Date: Mon Aug 08 2011 - 14:11:43 EST


On 8 August 2011 16:34, Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> wrote:
> On Mon, 8 Aug 2011, Per Forlin wrote:
>
>> FSG_NUM_BUFFERS is set to 2 as default.
>> Usually 2 buffers are enough to establish a good
>> double buffering pipeline. But when dealing with expensive
>> request preparation (i.e. dma_map) there may be benefits of
>> increasing the number of buffers. There is an extra cost for
>> every first request, the others are prepared in parallell with
>> an ongoing transfer. Every time all buffers are consumed there is
>> an additional cost for the next first request. Increasing the number
>> of buffers decreases the risk of running out of buffers.
>
> The reasoning here is not clear.  Throughput is limited on the device
> side mostly by the time required to prepare the buffer, transfer the
> data, and copy the data to/from the backing storage.  Using more
> buffers won't affect this directly.  Putting this another way, if the
> time required to prepare a buffer and copy the data is longer than the
> time needed to transfer the data over the USB link, the throughput
> won't be optimal no matter how many buffers you use.
>
> On the other hand, adding buffers will smooth out irregularities and
> latencies caused by scheduling delays when other tasks are running at
> the same time.  Still, I would expect the benefit to be fairly small
> and to diminish rapidly as more buffers are added.
>
>> Test set up
>>  * Running dd on the host reading from the mass storage device
>>  * cmdline: dd if=/dev/sdb of=/dev/null bs=4k count=$((256*100))
>>  * Caches are dropped on the host and on the device before each run
>>
>> Measurements on a Snowball board with ondemand_govenor active.
>>
>> FSG_NUM_BUFFERS 2
>> 104857600 bytes (105 MB) copied, 5.62173 s, 18.7 MB/s
>> 104857600 bytes (105 MB) copied, 5.61811 s, 18.7 MB/s
>> 104857600 bytes (105 MB) copied, 5.57817 s, 18.8 MB/s
>> 104857600 bytes (105 MB) copied, 5.57769 s, 18.8 MB/s
>> 104857600 bytes (105 MB) copied, 5.59654 s, 18.7 MB/s
>> 104857600 bytes (105 MB) copied, 5.58948 s, 18.8 MB/s
>>
>> FSG_NUM_BUFFERS 4
>> 104857600 bytes (105 MB) copied, 5.26839 s, 19.9 MB/s
>> 104857600 bytes (105 MB) copied, 5.2691 s, 19.9 MB/s
>> 104857600 bytes (105 MB) copied, 5.2711 s, 19.9 MB/s
>> 104857600 bytes (105 MB) copied, 5.27174 s, 19.9 MB/s
>> 104857600 bytes (105 MB) copied, 5.27261 s, 19.9 MB/s
>> 104857600 bytes (105 MB) copied, 5.27135 s, 19.9 MB/s
>> 104857600 bytes (105 MB) copied, 5.27249 s, 19.9 MB/s
>
> Okay, 6% is a worthwhile improvement, though not huge.  Did you try 6
> or 8 buffers?  I bet going beyond 4 makes very little difference.
>
On my board 4 buffers are enough. More buffers will make no difference.

Background study
I started by running dd to measure performance on my target side.
Simply to measure what would be the maximum bandwidth, 20MiB/s on my
board. Then I started the gadget mass storage on the device and run
the sae test from the PC host side, 18.7 MiB/s. I guessed this might
be due to serialized cache handling. I tested to remove the dma_map
call (replaced it with virt_to_dma). This is just a dummy test to see
if this is causing the performance drop. Without dma_map I get
20MiB/s. It appears that the loss is due to dma_map call. The dma_map
only adds latency for the first request.

Studying the flow of buffers I can see, both buffers are filled with
data from vfs, then both are transmitted over USB, burst like
behaviour. More buffers will add "smoothness" and prevent the bursts
to affect performance negatively. With 4 buffers, there were very few
cases when all 4 buffers were empty during a transfer

>> There may not be one optimal number for all boards. That is the
>> reason for adding the number to Kconfig,
>>
>> Signed-off-by: Per Forlin <per.forlin@xxxxxxxxxx>
>> ---
>>  drivers/usb/gadget/Kconfig          |   15 +++++++++++++++
>>  drivers/usb/gadget/storage_common.c |    4 ++--
>>  2 files changed, 17 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/usb/gadget/Kconfig b/drivers/usb/gadget/Kconfig
>> index 029e288..bbd17f3 100644
>> --- a/drivers/usb/gadget/Kconfig
>> +++ b/drivers/usb/gadget/Kconfig
>> @@ -96,6 +96,21 @@ config USB_GADGET_VBUS_DRAW
>>          This value will be used except for system-specific gadget
>>          drivers that have more specific information.
>>
>> +config USB_GADGET_STORAGE_NUM_BUFFERS
>> +     int "Number of storage pipline buffers"
>> +     range 2 64
>
> 64 is definitely way into the overkill range.  I'd be very tempted to
> leave the maximum set at 4.  You might even consider setting it to 4
> always -- although each buffer requires 16 KB of contiguous kernel
> memory so minimizing the number of buffers is always desirable.
I agree, 64 is not a realistic number. Max 4 is fine with me.

>
> Alan Stern
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/