Re: [PATCH 1/2] usb: dwc3: core: Introduce dwc3_device_reinit()

From: Roger Quadros
Date: Mon Apr 11 2016 - 08:37:38 EST


On 04/04/16 11:10, Felipe Balbi wrote:
>
> Hi,
>
> Roger Quadros <rogerq@xxxxxx> writes:
>>>>>>>>>>>>> We will need this function for a workaround.
>>>>>>>>>>>>> The function issues a softreset only to the device
>>>>>>>>>>>>> controller and performs minimal re-initialization
>>>>>>>>>>>>> so that the device controller can be usable.
>>>>>>>>>>>>>
>>>>>>>>>>>>> As some code is similar to dwc3_core_init() take out
>>>>>>>>>>>>> common code into dwc3_get_gctl_quirks().
>>>>>>>>>>>>>
>>>>>>>>>>>>> We add a new member (prtcap_mode) to struct dwc3 to
>>>>>>>>>>>>> keep track of the current mode in the PRTCAPDIR register.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Signed-off-by: Roger Quadros <rogerq@xxxxxx>
>>>>>>>>>>>>
>>>>>>>>>>>> I must say, I don't like this at all :-p There's ONE known silicon which
>>>>>>>>>>>> needs this because of a poor silicon integration which took an IP with a
>>>>>>>>>>>> known erratum where it can't be made to work on lower speeds and STILL
>>>>>>>>>>>> was integrated without a superspeed PHY.
>>>>>>>>>>>>
>>>>>>>>>>>> There's a reason why I never tried to push this upstream myself ;-)
>>>>>>>>>>>>
>>>>>>>>>>>> I'm really thinking we might be better off adding a quirk flag to skip
>>>>>>>>>>>> the metastability workaround and allow this ONE silicon to set the
>>>>>>>>>>>> controller to lower speed.
>>>>>>>>>>>>
>>>>>>>>>>>> John, can you check with your colleagues if we would ever fall into
>>>>>>>>>>>> STAR#9000525659 if we set maximum speed to high speed during driver
>>>>>>>>>>>> probe and never touch it again ? I would assume we don't really fall
>>>>>>>>>>>> into the metastability workaround, right ? We're not doing any sort of
>>>>>>>>>>>> PM for dwc3...
>>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hi Felipe,
>>>>>>>>>
>>>>>>>>> Do you mean to keep DCFG.speed to SS and set dwc->maximum_speed to HS?
>>>>>>>>> I don't see an issue with that as long as we always ignore
>>>>>>>>> dwc->maximum_speed when programming DCFG.speed for all affected
>>>>>>>>> versions of the core. As long as the DCFG.speed = SS, you should not
>>>>>>>>> hit the STAR.
>>>>>>>>
>>>>>>>> I actually mean changing DCFG.speed during driver probe and never
>>>>>>>> touching it again. Would that still cause problems ?
>>>>>>>>
>>>>>>>
>>>>>>> In that case I'm not sure. The engineer who would know is off until
>>>>>>> next week so I'll get back to you as soon as I can talk to him about
>>>>>>> it.
>>>>>>
>>>>>
>>>>> So the engineers said that this issue can occur while set to HS and
>>>>> the run/stop bit is changed so it seems that won't work.
>>>>
>>>> Thanks John.
>>>>
>>>> Felipe,
>>>>
>>>> any suggestion how we can fix this upstream?
>>>
>>> no idea, I don't have a lot of memory about this problem. I really don't
>>> remember the details about this, is there an openly available errata
>>> document which I could read ? /me goes search for it.
>>>
>>> I found [1] which tells me, the following:
>>>
>>>
>>> | i819 | A Device Control Bit Meta-Stability for USB3.0 Controller in USB2.0 Mode |
>>> |-------------+----------------------------------------------------------------------------|
>>> | Criticality | Medium |
>>> | | |
>>> | Descritiion | When USB3.0 controller core is programmed to be a USB 2.0-only device |
>>> | | hardware meta-stability on USB_DCTL[31]RUNSTOP bit causing the core to |
>>> | | attempt high speed as well as SuperSpeed connection or completely miss |
>>> | | the attach request. |
>>> | | |
>>> | Workaround | If the requirement is to always function in USB 2.0 mode, there is no |
>>> | | workaround. |
>>> | | Otherwise, you can always program the USB controller core to be SuperSpeed |
>>> | | 3.0 capable (USB_DCFG[2:0]DEVSPD = 0x4) |
>>> | | |
>>> | Revisions | SR 1.1, 2.0 |
>>> | Impacted | |
>>> |-------------+----------------------------------------------------------------------------|
>>>
>>> So, TI's own documentation says that there is _no_ workaround. My
>>
>> We are working on updating that document. Apparently Synopsis has suggested this workaround.
>> pasting verbatim
>>
>> "
>> - As last step of device configuration we set the RUNSTOP bit in DCTL.
>>
>> - Once we set the RUNSTOP bit, we need to monitor GDBGLTSSM for 100 ms until one of the two below happens:.
>>
>> o We see the GDBGLTSSM.LTDB_LINK_STATE changing from 4
>>
>> o We receive the USB 2.0 reset interrupt.
>>
>> If none of above happens then we can stop monitoring it.
>>
>> - If state change from 4 occurs issue a SoftReset thru DCTL.CSftRst and reconfigure Device. This time it is guaranteed that no metastability will occur so no need to do the 100ms timeout.
>> "
>
> I don't have this text anywhere so I don't know. Is this something TI
> came up with or Synopsys ? Unless I can see a document (preferrably from
> Synopsys) stating this, I can't really accept $subject.

OK. I'll try to find out if there is an official document about this.

>
> Another question is: if all it takes is an extra SoftReset, why don't we
> just reset it during probe() if max_speed < SUPER and we're running on
> rev < 2.20a ? BTW, which revision of the IP is on AM57x/DRA7x ?

The issue might happen on any Run/Stop transition so not sure if doing it
SoftReset just at probe fixes anything.

On DRA7x it is rev 2.02a.

>
>>> question is, then: How are you sure that resetting the device actually
>>> solves the issue ? Did you really hit the metastability problem and
>>> noted that it works after a soft-reset ? How did you verify that
>>
>> I don't know if it solves the issue or not. It was suggested by
>> Synopsis to TI's silicon team.
>
> now that's a bummer ;-)
>
>> I never hit the metastability problem detection condition in my
>> overnight tests (i.e. LTDB_LINK_STATE != 4).
>
> overnight is not enough. You need to keep this running for weeks.

how many weeks is acceptable for you? I can run for that long, no problem.
And what if the issue doesn't happen in that time frame, would you still
consider this case?

>
>> I have verified that things work after a soft-reset by faking that the
>> error happens.
>
> but if you never hit the actual failure, you have verified that it works
> _without_ the workaround. I mean, if you can't be sure RUN/STOP went
> metastable, you can't really be sure you're working around the Erratum.
>
>>> Run/Stop was in a metastable state, considering that Run/Stop signal is
>>> not visible outside the die ?
>>
>> LTDB_LINK_STATE != 4 within 100ms or RUNSTOP set is the condition to
>> detect that the issue occurred.
>
> this doesn't prove anything. This just means that your 100ms timer
> expired. Unless you can verify that Run/Stop is in metastability, you
> cannot be sure this workaround works.
>
> Did anybody run silicon simulation to verify this case ? It's really the
> only way to be sure.

AFAIK this wasn't reproducible during silicon simulation either.

>
>>> It seems to me that resetting the IP is just as "dangerous" as setting
>>> the IP to High-speed in the first place. No ?
>>
>> The soft-reset is just a recovery mechanism if that error is ever hit.
>
> but you don't know if that's a *proper* recovery mechanism because you
> never even *hit* the error.
>
>> Putting the controller in reset state means it is in a known
>> state. Why do you say it would be dangerous?
>
> Because you can't predict the systems' behavior. If the flip-flop didn't
> have time to settle into 0 or 1 state, you don't know what the
> combinatorial part of the IP will do with that bogus value. It's truly
> unpredictable. You also cannot know, for sure, that a SoftReset will be
> enough to bring that flip-flop out of metastability.

I'm not an expert in this area and can only follow the advice the Silicon team gives.

>
>> The original workaround i.e. setting the High-speed instance to
>> Super-speed to avoid this errata is causing another side
>> effect. i.e. erratic interrupts happen and more than 2 seconds delay
>
> this should have been an expected side-effect when you design a
> SuperSpeed controller without a SuperSpeed PHY and don't properly
> terminate inputs. What you have is a floating PIPE3 interface not
> properly terminated and capturing random noise (basically acting as a
> very poor antenna inside your die). Of course the IP will go bonkers and
> give you "erratic error" interrupts. It has no idea what the hell this
> "PHY" on the PIPE3 interface is doing.

We know that. The damage is already done. :)

>
>> to enumerations. This problem is more serious and so we have to keep
>> the controller in High-speed and tackle the meta-stability condition
>> if it happens.
>
> you have to tackle the meta-stability, sure, but we need guarantee that
> $subject IS indeed tackling that problem. Unless there's proof that this
> really solves the meta-stability issue, I won't take $subject. Sorry
> dude, but I don't want regressions elsewhere because of a badly
> validated patch.

I understand. I will see if someone from TI can provide me official documentation
about the workaround.

>
> Bottomline, it's not enough to _state_ that it solves the problem. You
> need to first *trigger* the issue without the workaround, then apply
> workaround and trigger it again. Then, and only then, you can be certain
> you're fixing the problem.
>
> After that, we will look into how to make sure this has no impact to
> other users.
>

OK, Thanks.

cheers,
-roger