Re: [PATCH v10 0/6] Re-introduce TX FIFO resize for larger EP bursting

From: Wesley Cheng
Date: Thu Jun 17 2021 - 18:25:40 EST


Hi,

On 6/17/2021 2:55 PM, Ferry Toth wrote:
> Hi
>
> Op 17-06-2021 om 23:48 schreef Wesley Cheng:
>> Hi,
>>
>> On 6/17/2021 2:01 PM, Ferry Toth wrote:
>>> Hi
>>>
>>> Op 17-06-2021 om 11:58 schreef Wesley Cheng:
>>>> Changes in V10:
>>>>   - Fixed compilation errors in config where OF is not used (error due to
>>>>     unknown symbol for of_add_property()).  Add of_add_property() stub.
>>>>   - Fixed compilation warning for incorrect argument being passed to
>>>> dwc3_mdwidth
>>> This fixes the OOPS I had in V9. I do not see any change in performance
>>> on Merrifield though.
>> I see...thanks Ferry! With your testing, are you writing to the device's
>> internal storage (ie UFS, eMMC, etc...), or did you use a ramdisk as well?
>
> In this case I just tested the EEM path using iperf3.
>

Got it. I don't believe f_eem will use a high enough (if at all)
bMaxBurst value to change the TXFIFO size.

>> If not with a ramdisk, we might want to give that a try to avoid the
>> storage path being the bottleneck. You can use "dd" to create an empty
>> file, and then just use that as the LUN's backing file.
>>
>> echo ramdisk.img >
>> /sys/kernel/config/usb_gadget/g1/functions/mass_storage.0/lun.0/file
>
> Ah, why didn't I think of that. I have currently mass storage setup with
> eMMC but it seems that is indeed the bottleneck.
>

:), not a problem...I've been working on getting the ideal set up for
the performance profiling for awhile, so anything I can do to make sure
we get some good results.

> I'll try with a ramdisk and let you know.
>

Thanks again for the testing, Ferry.

Thanks
Wesley Cheng

>> Thanks
>> Wesley Cheng
>>
>>>> Changes in V9:
>>>>   - Fixed incorrect patch in series.  Removed changes in DTSI, as
>>>> dwc3-qcom will
>>>>     add the property by default from the kernel.
>>>>
>>>> Changes in V8:
>>>>   - Rebased to usb-testing
>>>>   - Using devm_kzalloc for adding txfifo property in dwc3-qcom
>>>>   - Removed DWC3 QCOM ACPI property for enabling the txfifo resize
>>>>
>>>> Changes in V7:
>>>>   - Added a new property tx-fifo-max-num for limiting how much fifo
>>>> space the
>>>>     resizing logic can allocate for endpoints with large burst
>>>> values.  This
>>>>     can differ across platforms, and tie in closely with overall
>>>> system latency.
>>>>   - Added recommended checks for DWC32.
>>>>   - Added changes to set the tx-fifo-resize property from dwc3-qcom by
>>>> default
>>>>     instead of modifying the current DTSI files.
>>>>   - Added comments on all APIs/variables introduced.
>>>>   - Updated the DWC3 YAML to include a better description of the
>>>> tx-fifo-resize
>>>>     property and added an entry for tx-fifo-max-num.
>>>>
>>>> Changes in V6:
>>>>   - Rebased patches to usb-testing.
>>>>   - Renamed to PATCH series instead of RFC.
>>>>   - Checking for fs_descriptors instead of ss_descriptors for
>>>> determining the
>>>>     endpoint count for a particular configuration.
>>>>   - Re-ordered patch series to fix patch dependencies.
>>>>
>>>> Changes in V5:
>>>>   - Added check_config() logic, which is used to communicate the
>>>> number of EPs
>>>>     used in a particular configuration.  Based on this, the DWC3
>>>> gadget driver
>>>>     has the ability to know the maximum number of eps utilized in all
>>>> configs.
>>>>     This helps reduce unnecessary allocation to unused eps, and will
>>>> catch fifo
>>>>     allocation issues at bind() time.
>>>>   - Fixed variable declaration to single line per variable, and
>>>> reverse xmas.
>>>>   - Created a helper for fifo clearing, which is used by ep0.c
>>>>
>>>> Changes in V4:
>>>>   - Removed struct dwc3* as an argument for dwc3_gadget_resize_tx_fifos()
>>>>   - Removed WARN_ON(1) in case we run out of fifo space
>>>>   Changes in V3:
>>>>   - Removed "Reviewed-by" tags
>>>>   - Renamed series back to RFC
>>>>   - Modified logic to ensure that fifo_size is reset if we pass the
>>>> minimum
>>>>     threshold.  Tested with binding multiple FDs requesting 6 FIFOs.
>>>>
>>>> Changes in V2:
>>>>   - Modified TXFIFO resizing logic to ensure that each EP is reserved a
>>>>     FIFO.
>>>>   - Removed dev_dbg() prints and fixed typos from patches
>>>>   - Added some more description on the dt-bindings commit message
>>>>
>>>> Currently, there is no functionality to allow for resizing the
>>>> TXFIFOs, and
>>>> relying on the HW default setting for the TXFIFO depth.  In most
>>>> cases, the
>>>> HW default is probably sufficient, but for USB compositions that contain
>>>> multiple functions that require EP bursting, the default settings
>>>> might not be enough.  Also to note, the current SW will assign an EP to a
>>>> function driver w/o checking to see if the TXFIFO size for that
>>>> particular
>>>> EP is large enough. (this is a problem if there are multiple HW defined
>>>> values for the TXFIFO size)
>>>>
>>>> It is mentioned in the SNPS databook that a minimum of TX FIFO depth = 3
>>>> is required for an EP that supports bursting.  Otherwise, there may be
>>>> frequent occurences of bursts ending.  For high bandwidth functions,
>>>> such as data tethering (protocols that support data aggregation), mass
>>>> storage, and media transfer protocol (over FFS), the bMaxBurst value
>>>> can be
>>>> large, and a bigger TXFIFO depth may prove to be beneficial in terms
>>>> of USB
>>>> throughput. (which can be associated to system access latency,
>>>> etc...)  It
>>>> allows for a more consistent burst of traffic, w/o any interruptions, as
>>>> data is readily available in the FIFO.
>>>>
>>>> With testing done using the mass storage function driver, the results
>>>> show
>>>> that with a larger TXFIFO depth, the bandwidth increased significantly.
>>>>
>>>> Test Parameters:
>>>>   - Platform: Qualcomm SM8150
>>>>   - bMaxBurst = 6
>>>>   - USB req size = 256kB
>>>>   - Num of USB reqs = 16
>>>>   - USB Speed = Super-Speed
>>>>   - Function Driver: Mass Storage (w/ ramdisk)
>>>>   - Test Application: CrystalDiskMark
>>>>
>>>> Results:
>>>>
>>>> TXFIFO Depth = 3 max packets
>>>>
>>>> Test Case | Data Size | AVG tput (in MB/s)
>>>> -------------------------------------------
>>>> Sequential|1 GB x     |
>>>> Read      |9 loops    | 193.60
>>>>       |           | 195.86
>>>>            |           | 184.77
>>>>            |           | 193.60
>>>> -------------------------------------------
>>>>
>>>> TXFIFO Depth = 6 max packets
>>>>
>>>> Test Case | Data Size | AVG tput (in MB/s)
>>>> -------------------------------------------
>>>> Sequential|1 GB x     |
>>>> Read      |9 loops    | 287.35
>>>>       |           | 304.94
>>>>            |           | 289.64
>>>>            |           | 293.61
>>>> -------------------------------------------
>>>>
>>>> Wesley Cheng (6):
>>>>    usb: gadget: udc: core: Introduce check_config to verify USB
>>>>      configuration
>>>>    usb: gadget: configfs: Check USB configuration before adding
>>>>    usb: dwc3: Resize TX FIFOs to meet EP bursting requirements
>>>>    of: Add stub for of_add_property()
>>>>    usb: dwc3: dwc3-qcom: Enable tx-fifo-resize property by default
>>>>    dt-bindings: usb: dwc3: Update dwc3 TX fifo properties
>>>>
>>>>   .../devicetree/bindings/usb/snps,dwc3.yaml         |  15 +-
>>>>   drivers/usb/dwc3/core.c                            |   9 +
>>>>   drivers/usb/dwc3/core.h                            |  15 ++
>>>>   drivers/usb/dwc3/dwc3-qcom.c                       |   9 +
>>>>   drivers/usb/dwc3/ep0.c                             |   2 +
>>>>   drivers/usb/dwc3/gadget.c                          | 212
>>>> +++++++++++++++++++++
>>>>   drivers/usb/gadget/configfs.c                      |  22 +++
>>>>   drivers/usb/gadget/udc/core.c                      |  25 +++
>>>>   include/linux/of.h                                 |   5 +
>>>>   include/linux/usb/gadget.h                         |   5 +
>>>>   10 files changed, 317 insertions(+), 2 deletions(-)
>>>>

--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project