Re: [PATCH v9 0/5] Re-introduce TX FIFO resize for larger EP bursting

From: Wesley Cheng
Date: Tue Jun 08 2021 - 01:45:35 EST


Hi Greg/Felipe,

On 6/4/2021 7:36 AM, Greg KH wrote:
> On Fri, Jun 04, 2021 at 05:18:16PM +0300, Felipe Balbi wrote:
>>
>> Hi,
>>
>> Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx> writes:
>>> On Wed, May 19, 2021 at 12:49:16AM -0700, Wesley Cheng wrote:
>>>> Changes in V9:
>>>> - Fixed incorrect patch in series. Removed changes in DTSI, as dwc3-qcom will
>>>> add the property by default from the kernel.
>>>
>>> This patch series has one build failure and one warning added:
>>>
>>> drivers/usb/dwc3/gadget.c: In function ‘dwc3_gadget_calc_tx_fifo_size’:
>>> drivers/usb/dwc3/gadget.c:653:45: warning: passing argument 1 of ‘dwc3_mdwidth’ makes pointer from integer without a cast [-Wint-conversion]
>>> 653 | mdwidth = dwc3_mdwidth(dwc->hwparams.hwparams0);
>>> | ~~~~~~~~~~~~~^~~~~~~~~~
>>> | |
>>> | u32 {aka unsigned int}
>>> In file included from drivers/usb/dwc3/debug.h:14,
>>> from drivers/usb/dwc3/gadget.c:25:
>>> drivers/usb/dwc3/core.h:1493:45: note: expected ‘struct dwc3 *’ but argument is of type ‘u32’ {aka ‘unsigned int’}
>>> 1493 | static inline u32 dwc3_mdwidth(struct dwc3 *dwc)
>>> | ~~~~~~~~~~~~~^~~
>>>
>>>
>>> drivers/usb/dwc3/dwc3-qcom.c: In function ‘dwc3_qcom_of_register_core’:
>>> drivers/usb/dwc3/dwc3-qcom.c:660:23: error: implicit declaration of function ‘of_add_property’; did you mean ‘of_get_property’? [-Werror=implicit-function-declaration]
>>> 660 | ret = of_add_property(dwc3_np, prop);
>>> | ^~~~~~~~~~~~~~~
>>> | of_get_property
>>>
>>>
>>> How did you test these?

I ran these changes on our internal branches, which were probably
missing some of the recent changes done to the DWC3 drivers. Will fix
the above compile errors and re-submit.

In regards to how much these changes have been tested, we've been
maintaining the TX FIFO resize logic downstream for a few years already,
so its being used in end products. We also verify this with our
internal testing, which has certain benchmarks we need to meet.

>>
>> to be honest, I don't think these should go in (apart from the build
>> failure) because it's likely to break instantiations of the core with
>> differing FIFO sizes. Some instantiations even have some endpoints with
>> dedicated functionality that requires the default FIFO size configured
>> during coreConsultant instantiation. I know of at OMAP5 and some Intel
>> implementations which have dedicated endpoints for processor tracing.
>>
>> With OMAP5, these endpoints are configured at the top of the available
>> endpoints, which means that if a gadget driver gets loaded and takes
>> over most of the FIFO space because of this resizing, processor tracing
>> will have a hard time running. That being said, processor tracing isn't
>> supported in upstream at this moment.
>>

I agree that the application of this logic may differ between vendors,
hence why I wanted to keep this controllable by the DT property, so that
for those which do not support this use case can leave it disabled. The
logic is there to ensure that for a given USB configuration, for each EP
it would have at least 1 TX FIFO. For USB configurations which don't
utilize all available IN EPs, it would allow re-allocation of internal
memory to EPs which will actually be in use.

>> I still think this may cause other places to break down. The promise the
>> databook makes is that increasing the FIFO size over 2x wMaxPacketSize
>> should bring little to no benefit, if we're not maintaining that, I
>> wonder if the problem is with some of the BUSCFG registers instead,
>> where we configure interconnect bursting and the like.
>

I've been referring mainly to the DWC3 programming guide for
recommendations on how to improve USB performance in:
Section 3.3.5 System Bus Features to Improve USB Performance

At least when I ran the initial profiling, adjusting the RX/TX
thresholds brought little to no benefits. Even in some of the examples,
they have diagrams showing a TXFIFO size of 6 max packets (Figure 3-5).
I think its difficult to say that the TX fifo resizing won't help in
systems with limited, or shared resources where the bus latencies would
be somewhat larger. By adjusting the TX FIFO size, the controller would
be able to fetch more data from system memory into the memory within the
controller, leading to less frequent end of bursts, etc... as data is
readily available.

In terms of adjusting the AXI/AHB bursting, I would think the bandwidth
increase would eventually be constrained based on your system's design.
We don't touch the GSBUSCFG registers, and leave them as is based off
the recommendations from the HW designers.

> Good points.
>
> Wesley, what kind of testing have you done on this on different devices?
>

As mentioned above, these changes are currently present on end user
devices for the past few years, so its been through a lot of testing :).

Thanks
Wesley Cheng


--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project