Re: [PATCH 1/2] usb: dwc3: core: Introduce dwc3_device_reinit()
From: Felipe Balbi
Date: Mon Apr 04 2016 - 04:12:51 EST
Hi,
Roger Quadros <rogerq@xxxxxx> writes:
>>>>>>>>>>>> We will need this function for a workaround.
>>>>>>>>>>>> The function issues a softreset only to the device
>>>>>>>>>>>> controller and performs minimal re-initialization
>>>>>>>>>>>> so that the device controller can be usable.
>>>>>>>>>>>>
>>>>>>>>>>>> As some code is similar to dwc3_core_init() take out
>>>>>>>>>>>> common code into dwc3_get_gctl_quirks().
>>>>>>>>>>>>
>>>>>>>>>>>> We add a new member (prtcap_mode) to struct dwc3 to
>>>>>>>>>>>> keep track of the current mode in the PRTCAPDIR register.
>>>>>>>>>>>>
>>>>>>>>>>>> Signed-off-by: Roger Quadros <rogerq@xxxxxx>
>>>>>>>>>>>
>>>>>>>>>>> I must say, I don't like this at all :-p There's ONE known silicon which
>>>>>>>>>>> needs this because of a poor silicon integration which took an IP with a
>>>>>>>>>>> known erratum where it can't be made to work on lower speeds and STILL
>>>>>>>>>>> was integrated without a superspeed PHY.
>>>>>>>>>>>
>>>>>>>>>>> There's a reason why I never tried to push this upstream myself ;-)
>>>>>>>>>>>
>>>>>>>>>>> I'm really thinking we might be better off adding a quirk flag to skip
>>>>>>>>>>> the metastability workaround and allow this ONE silicon to set the
>>>>>>>>>>> controller to lower speed.
>>>>>>>>>>>
>>>>>>>>>>> John, can you check with your colleagues if we would ever fall into
>>>>>>>>>>> STAR#9000525659 if we set maximum speed to high speed during driver
>>>>>>>>>>> probe and never touch it again ? I would assume we don't really fall
>>>>>>>>>>> into the metastability workaround, right ? We're not doing any sort of
>>>>>>>>>>> PM for dwc3...
>>>>>>>>>>>
>>>>>>>>
>>>>>>>> Hi Felipe,
>>>>>>>>
>>>>>>>> Do you mean to keep DCFG.speed to SS and set dwc->maximum_speed to HS?
>>>>>>>> I don't see an issue with that as long as we always ignore
>>>>>>>> dwc->maximum_speed when programming DCFG.speed for all affected
>>>>>>>> versions of the core. As long as the DCFG.speed = SS, you should not
>>>>>>>> hit the STAR.
>>>>>>>
>>>>>>> I actually mean changing DCFG.speed during driver probe and never
>>>>>>> touching it again. Would that still cause problems ?
>>>>>>>
>>>>>>
>>>>>> In that case I'm not sure. The engineer who would know is off until
>>>>>> next week so I'll get back to you as soon as I can talk to him about
>>>>>> it.
>>>>>
>>>>
>>>> So the engineers said that this issue can occur while set to HS and
>>>> the run/stop bit is changed so it seems that won't work.
>>>
>>> Thanks John.
>>>
>>> Felipe,
>>>
>>> any suggestion how we can fix this upstream?
>>
>> no idea, I don't have a lot of memory about this problem. I really don't
>> remember the details about this, is there an openly available errata
>> document which I could read ? /me goes search for it.
>>
>> I found [1] which tells me, the following:
>>
>>
>> | i819 | A Device Control Bit Meta-Stability for USB3.0 Controller in USB2.0 Mode |
>> |-------------+----------------------------------------------------------------------------|
>> | Criticality | Medium |
>> | | |
>> | Descritiion | When USB3.0 controller core is programmed to be a USB 2.0-only device |
>> | | hardware meta-stability on USB_DCTL[31]RUNSTOP bit causing the core to |
>> | | attempt high speed as well as SuperSpeed connection or completely miss |
>> | | the attach request. |
>> | | |
>> | Workaround | If the requirement is to always function in USB 2.0 mode, there is no |
>> | | workaround. |
>> | | Otherwise, you can always program the USB controller core to be SuperSpeed |
>> | | 3.0 capable (USB_DCFG[2:0]DEVSPD = 0x4) |
>> | | |
>> | Revisions | SR 1.1, 2.0 |
>> | Impacted | |
>> |-------------+----------------------------------------------------------------------------|
>>
>> So, TI's own documentation says that there is _no_ workaround. My
>
> We are working on updating that document. Apparently Synopsis has suggested this workaround.
> pasting verbatim
>
> "
> - As last step of device configuration we set the RUNSTOP bit in DCTL.
>
> - Once we set the RUNSTOP bit, we need to monitor GDBGLTSSM for 100 ms until one of the two below happens:.
>
> o We see the GDBGLTSSM.LTDB_LINK_STATE changing from 4
>
> o We receive the USB 2.0 reset interrupt.
>
> If none of above happens then we can stop monitoring it.
>
> - If state change from 4 occurs issue a SoftReset thru DCTL.CSftRst and reconfigure Device. This time it is guaranteed that no metastability will occur so no need to do the 100ms timeout.
> "
I don't have this text anywhere so I don't know. Is this something TI
came up with or Synopsys ? Unless I can see a document (preferrably from
Synopsys) stating this, I can't really accept $subject.
Another question is: if all it takes is an extra SoftReset, why don't we
just reset it during probe() if max_speed < SUPER and we're running on
rev < 2.20a ? BTW, which revision of the IP is on AM57x/DRA7x ?
>> question is, then: How are you sure that resetting the device actually
>> solves the issue ? Did you really hit the metastability problem and
>> noted that it works after a soft-reset ? How did you verify that
>
> I don't know if it solves the issue or not. It was suggested by
> Synopsis to TI's silicon team.
now that's a bummer ;-)
> I never hit the metastability problem detection condition in my
> overnight tests (i.e. LTDB_LINK_STATE != 4).
overnight is not enough. You need to keep this running for weeks.
> I have verified that things work after a soft-reset by faking that the
> error happens.
but if you never hit the actual failure, you have verified that it works
_without_ the workaround. I mean, if you can't be sure RUN/STOP went
metastable, you can't really be sure you're working around the Erratum.
>> Run/Stop was in a metastable state, considering that Run/Stop signal is
>> not visible outside the die ?
>
> LTDB_LINK_STATE != 4 within 100ms or RUNSTOP set is the condition to
> detect that the issue occurred.
this doesn't prove anything. This just means that your 100ms timer
expired. Unless you can verify that Run/Stop is in metastability, you
cannot be sure this workaround works.
Did anybody run silicon simulation to verify this case ? It's really the
only way to be sure.
>> It seems to me that resetting the IP is just as "dangerous" as setting
>> the IP to High-speed in the first place. No ?
>
> The soft-reset is just a recovery mechanism if that error is ever hit.
but you don't know if that's a *proper* recovery mechanism because you
never even *hit* the error.
> Putting the controller in reset state means it is in a known
> state. Why do you say it would be dangerous?
Because you can't predict the systems' behavior. If the flip-flop didn't
have time to settle into 0 or 1 state, you don't know what the
combinatorial part of the IP will do with that bogus value. It's truly
unpredictable. You also cannot know, for sure, that a SoftReset will be
enough to bring that flip-flop out of metastability.
> The original workaround i.e. setting the High-speed instance to
> Super-speed to avoid this errata is causing another side
> effect. i.e. erratic interrupts happen and more than 2 seconds delay
this should have been an expected side-effect when you design a
SuperSpeed controller without a SuperSpeed PHY and don't properly
terminate inputs. What you have is a floating PIPE3 interface not
properly terminated and capturing random noise (basically acting as a
very poor antenna inside your die). Of course the IP will go bonkers and
give you "erratic error" interrupts. It has no idea what the hell this
"PHY" on the PIPE3 interface is doing.
> to enumerations. This problem is more serious and so we have to keep
> the controller in High-speed and tackle the meta-stability condition
> if it happens.
you have to tackle the meta-stability, sure, but we need guarantee that
$subject IS indeed tackling that problem. Unless there's proof that this
really solves the meta-stability issue, I won't take $subject. Sorry
dude, but I don't want regressions elsewhere because of a badly
validated patch.
Bottomline, it's not enough to _state_ that it solves the problem. You
need to first *trigger* the issue without the workaround, then apply
workaround and trigger it again. Then, and only then, you can be certain
you're fixing the problem.
After that, we will look into how to make sure this has no impact to
other users.
--
balbi
Attachment:
signature.asc
Description: PGP signature