Re: [PATCH 4.13 03/28] Bluetooth: btusb: fix QCA Rome suspend/resume

From: Kai Heng Feng
Date: Wed Dec 20 2017 - 03:43:00 EST

> On 20 Dec 2017, at 7:11 AM, Brian Norris <briannorris@xxxxxxxxxxxx> wrote:
> Hi Kai,
> On Tue, Dec 19, 2017 at 12:28:17PM +0800, Kai Heng Feng wrote:
>>> On 19 Dec 2017, at 2:13 AM, Brian Norris <briannorris@xxxxxxxxxxxx> wrote:
>>> On Mon, Dec 18, 2017 at 12:43:48PM +0100, Greg Kroah-Hartman wrote:
>>>> On Fri, Dec 15, 2017 at 07:05:39PM -0800, Matthias Kaehlcke wrote:
>>>>> We identified the above patch as the culprit, in combination with USB
>>>>> autosuspend being enabled for the Bluetooth controller.
>>>>> We found that the following (recent) upstream patch fixes the issue on
>>>>> most devices (we are still investigating one case on a specific device):
>>> A key word to highlight here is "most". I have at least one device that
>>> is consistently having problems even with both patches included; the
>>> only way things work reliably so far is to simply revert the $subject
>>> patch in 4.4.x.
>> The problem we have is that the QCA Rome Bluetooth stops working
>> after system S3. The reset resume quirk workaround the issue on both
>> mainline and 4.4.x (at least for me).
> Understood. But that's not the case for all systems, and on some, you're
> regressing functionality.

Yes. So reverting the commit is a reasonable path we should take.

> Many systems keep power enabled for system suspend, and so that "fix" is
> not necessary for them. It's also not very precise, because it seems to
> act in many cases where it need not -- for one, it takes effect on *all*
> resume attempts, even resuming from autosuspend. I really doubt your
> system is losing USB power in S0 + USB autosuspend?

No, USB power is on. Not seeing my XHCI gets put to D3cold.

> So, if you really need this patch for some systems, it seems like it
> should be much better targeted. You're currently adding a lot of
> unnecessary overhead and fragility.

You are right. Applying this unconditionally to all QCA Rome may be

>>> I'll try to investigate further, since this result is a bit confusing
>>> still. If there's a real problem with the patch (and I suspect there
>>> might be), it probably shouldn't be in mainline eitherâ
>> Do you have the same issue on mainline?
> Refer to my note [1] :)
> But because you asked, I did the work necessary to boot mainline on this
> system, and in fact, I see the same problem. I think that's enough
> reason to revert your patch in all branches (upstream, and any -stable
> branch that already took it).
> To be clear, here's the kernel logs on 4.15-rc4+, where the 55-second
> mark is around where I try to power on and scan for BT devices (power-on
> and scan both fail):
> [ 2.323475] usb 3-1: new full-speed USB device number 2 using ohci-platform
> ...
> [ 2.731719] usb 3-1: New USB device found, idVendor=0cf3, idProduct=e300
> ...
> [ 2.867870] usb 3-1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
> ...(udev doesn't run until here)...
> [ 35.733139] usbcore: registered new interface driver btusb
> [ 35.740912] Bluetooth: hci0: using rampatch file: qca/rampatch_usb_00000302.bin
> [ 35.749216] Bluetooth: hci0: QCA: patch rome 0x302 build 0x3e8, firmware rome 0x302 build 0x111
> ...
> [ 35.816104] Bluetooth: hci0: using NVM file: qca/nvm_usb_00000302.bin
> ...
> [ 55.073503] usb 3-1: reset full-speed USB device number 2 using ohci-platform
> ...
> [ 57.772513] Bluetooth: hci0: urb 00000000ec39086b failed to resubmit (113)
> [ 57.780217] Bluetooth: hci0: urb 0000000066228fda failed to resubmit (113)
> Whereas reverting the $subject patch yields no reset and URB failure;
> Bluetooth seems to work fine.

Thanks for your testing on mainline!

>>>>> commit a0085f2510e8976614ad8f766b209448b385492f
>>>>> Author: Sukumar Ghorai <sukumar.ghorai@xxxxxxxxx>
>>>>> Date: Wed Aug 16 14:46:55 2017 -0700
>>>>> Bluetooth: btusb: driver to enable the usb-wakeup feature
>>> [...]
>>>>> stable branches are currently broken for BTUSB_QCA_ROME with USB
>>>>> autosuspend enabled, since the above patch is not included (I only
>>>>> checked v4.4 and v4.9), so we probably want to integrate it.
>>>> Now queued up, thanks for letting me know.
>>> I think you have almost as much information as I do at this point, and
>>> I'll try to reply back here if I figure out anything more, so I'll leave
>>> you to your decisions. But I would personally revert, not backport more
>>> patches.
>>> Among the reasons:
>>> (a) I have at least one device that does not work better with both
>>> patches [1]
>>> (b) So far, I haven't seen any explanation why commit a0085f2510e8
>>> should be a dependency for $subject patch; the closest I can imagine
>>> would be that commit a0085f2510e8 could effectively neutralize
>>> $subject patch -- if the device is marked as a wakeup source, then
>>> it will never try to RESET on resume -- but that's still a fuzzy
>>> signal; just because it's marked a wakeup source sometimes doesn't
>>> mean it always will be. We can disable it in user space too.
>> Hi Leif,
>> Can you try if a0085f2510e8 breaks your $subject commit?
>> If itâs a yes, then what Brian suggests is correct.
>> Also, can you share the output of "usb-deviceâ for your btusb device?
> FWIW, here's mine:
> T: Bus=03 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 2 Spd=12 MxCh= 0
> D: Ver= 2.01 Cls=e0(wlcon) Sub=01 Prot=01 MxPS=64 #Cfgs= 1
> P: Vendor=0cf3 ProdID=e300 Rev=00.01
> C: #Ifs= 2 Cfg#= 1 Atr=e0 MxPwr=100mA
> I: If#= 0 Alt= 0 #EPs= 3 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
> I: If#= 1 Alt= 0 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
> This is an integrated Wifi+BT QCA6174A-5 M.2 module.

Thanks for the info. I found the same BT on a Killer Wireless, Iâll do
some testing on it.

>>> (c) Isn't it safer to revert than to backport more? You're delving into
>>> feature-land, not bugfix-landâ
>> Sounds reasonable.
> Shall I post a proper revert for mainline, with Fixes tags and CC
> stable? Or did one of you want to do that?

Let me clean my own mess ;)

>> Weâll need more information from Leif if we need to do ID-specific quirks.
> As I see it, this probably isn't just an ID-specific thing; this patch
> looks bad for its intended purpose, per my comments above (it's
> overkill, for one). I think there are many factors involved here, and it
> needs more study than "I just booted, suspended/resumed once, and it
> worked better."
> Among the things to consider:
> * How do we determine the likelihood that the device gets powered off in
> S3? The device_may_wakeup() check can be influenced (among other
> things) by user space policy, so that doesn't really look right to me.
> (ID-based checks might help a little, since you can probably
> differentiate chips that are likely used on discrete/removable USB
> parts, rather than internal chips?)

Actually, I think we canât know for sure. From my own experience, itâs
pretty ODM specific. Some external chips keep its power under S3.

> * The ordering: right now, you're enabling "reset on resume" at probe
> time, but QCA ROME devices only load their firmware in btusb_open().
> This gives a window of time in which you're adding weird reset
> behavior while the device may be running its ROM firmware, not the
> patched firmware loaded from /lib/firmware/. That might be one reason
> things don't work out so well?

You are right, the reset on resume should be used when USB probing,
instead of in the middle of btusb_open().

> I also suspect that might be one reason that Matthias sometimes saw
> the problem disappear on his devices, where I didn't. I have my device
> configured to cold reset the BT device on every reboot, whereas his
> isn't; it might retain the FW in the BT device's RAM across reboot.
> So my behavior was more deterministic :)
> * The behavior here is affected by USB autosuspend, not just S3. I
> mentioned this above, but more specifically, ChromeOS has udev rules
> like this, to enable autosuspend on various sorts of devices
> (including bluetooth controllers + BT HID devices):
> <udev rule snippet>
> ATTR{idVendor}=="0cf3", ATTR{idProduct}=="e300", GOTO="autosuspend_enable"
> ...
> # Enable autosuspend
> LABEL="autosuspend_enable"
> TEST=="power/control", ATTR{power/control}="auto", GOTO="autosuspend_end"
> LABEL="autosuspend_end"
> </udev rule snippet>
> I suspect some desktop systems might not have the same policies, so
> you're not testing autosuspend in the same way I am?

Actually itâs same from kernelâs perspective. I use TLP, it does the same
thing to the power/control knob.


> Brian
>> Kai-Heng
>>> Brian
>>> [1] Before you ask: this particular device is not quite fully supported
>>> on upstream yet, though its sibling devices are. With a bit of effort, I
>>> might be able to test a clean upstream. At the moment, I'm using our
>>> Chrom{e,ium}OS 4.4 kernel, where we regularly merge in -stable updates.