Re: [Regression] PCI / PM: Simplify device wakeup settings code

From: Rafael J. Wysocki
Date: Tue May 01 2018 - 04:34:40 EST


On Mon, Apr 30, 2018 at 4:22 PM, Joseph Salisbury
<joseph.salisbury@xxxxxxxxxxxxx> wrote:
> On 04/16/2018 11:58 AM, Rafael J. Wysocki wrote:
>> On Mon, Apr 16, 2018 at 5:31 PM, Joseph Salisbury
>> <joseph.salisbury@xxxxxxxxxxxxx> wrote:
>>> On 04/13/2018 05:34 PM, Rafael J. Wysocki wrote:
>>>> On Fri, Apr 13, 2018 at 7:56 PM, Joseph Salisbury
>>>> <joseph.salisbury@xxxxxxxxxxxxx> wrote:
>>>>> Hi Rafael,
>>>>>
>>>>> A kernel bug report was opened against Ubuntu [0]. After a kernel
>>>>> bisect, it was found that reverting the following two commits resolved
>>>>> this bug:
>>>>>
>>>>> 0ce3fcaff929 ("PCI / PM: Restore PME Enable after config space restoration")
>>>>> 0847684cfc5f ("PCI / PM: Simplify device wakeup settings code")
>>>>>
>>>>> This is a regression introduced in v4.13-rc1 and still exists in
>>>>> mainline. The bug causes the battery to drain when the system is
>>>>> powered down and unplugged, which did not happen prior to these two
>>>>> commits.
>>>> What system and what do you mean by "powered down"? How much time
>>>> does it take for the battery to drain now?
>>> By powered down, the bug reporter is saying physically powered off and
>>> unplugged. The system is a HP laptop:
>>>
>>> dmi.chassis.vendor: HP
>>> dmi.product.family: 103C_5335KV HP Notebook
>>> dmi.product.name: HP Notebook
>>> vendor_id : GenuineIntel
>>> cpu family : 6
>>>
>>>
>>>>> The bisect actually pointed to commit de3ef1e, but reverting
>>>>> these two commits fixes the issue.
>>>>>
>>>>> I was hoping to get your feedback, since you are the patch author. Do
>>>>> you think gathering any additional data will help diagnose this issue,
>>>>> or would it be best to submit a revert request?
>>>> First, reverting these is not an option or you will break systems
>>>> relying on them now. 4.13 is three releases back at this point.
>>>>
>>>> Second, your issue appears to be related to the suspend/shutdown path
>>>> whereas commit 0ce3fcaff929 is mostly about resume, so presumably the
>>>> change in pci_enable_wake() causes the problem to happen. Can you try
>>>> to revert this one alone and see if that helps?
>>> We tested a kernel with commits 0ce3fcaff929 and de3ef1eb1cd0
>>> reverted. However, it still exhibited the bug.
>> So essentially the bisection result cannot be trusted.
>
> We performed some more testing and confirmed that reverting just the
> following commit resolves the bug:
>
> 0847684cfc5f0 ("PCI / PM: Simplify device wakeup settings code")

Thanks for confirming this!

> Can you think of any suggestions to help debug further?

The root cause of the regression is likely the change in
pci_enable_wake() removing the device_may_wakeup() check from it.
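To make the suspected behavioral difference concrete, here is a simplified userspace model (not kernel code). The struct layouts, the `wake_armed` flag, and the `pci_enable_wake_old()`/`pci_enable_wake_new()` names are hypothetical; only the presence or absence of the device_may_wakeup() check mirrors what is described above, and the -EINVAL return in the "old" model is an assumption of the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-ins for kernel types. */
typedef int pci_power_t;
#define PCI_D3hot 3

struct device {
	bool may_wakeup;	/* models the wakeup policy set from user space */
};

struct pci_dev {
	struct device dev;
	bool wake_armed;	/* models "device set up for system wakeup" */
};

static bool device_may_wakeup(struct device *dev)
{
	return dev->may_wakeup;
}

/* Model of the behavior before the change: the check gates arming wakeup. */
static int pci_enable_wake_old(struct pci_dev *pdev, pci_power_t state,
			       bool enable)
{
	(void)state;	/* target power state is not modeled */
	if (enable && !device_may_wakeup(&pdev->dev))
		return -EINVAL;
	pdev->wake_armed = enable;
	return 0;
}

/* Model of the behavior after the change: no device_may_wakeup() check. */
static int pci_enable_wake_new(struct pci_dev *pdev, pci_power_t state,
			       bool enable)
{
	(void)state;	/* target power state is not modeled */
	pdev->wake_armed = enable;
	return 0;
}
```

With wakeup disabled by policy, the "old" model refuses to arm the device, while the "new" one arms it anyway, which is the kind of change that could turn an unconditional caller into a power drain.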

Probably, one of the drivers in the platform calls pci_enable_wake()
directly from its ->shutdown() callback, which causes the device to
be set up for system wakeup and in turn increases the power draw
while the system is off.

I would look through the PCI drivers used on that platform for the
ones that call pci_enable_wake() directly from ->shutdown(), and I
would make those calls conditional on device_may_wakeup().
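A minimal sketch of that driver-side change, again as a userspace model rather than kernel code: the struct layouts, the `wake_armed` flag, the stub bodies, and the `example_shutdown()` driver callback are all hypothetical. The stubbed pci_enable_wake() models the post-change behavior (no device_may_wakeup() check of its own), so the guard has to live in the caller.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-ins for kernel types and helpers. */
typedef int pci_power_t;
#define PCI_D3hot 3

struct device {
	bool may_wakeup;	/* models the wakeup policy set from user space */
};

struct pci_dev {
	struct device dev;
	bool wake_armed;	/* models "device set up for system wakeup" */
};

static bool device_may_wakeup(struct device *dev)
{
	return dev->may_wakeup;
}

/* Models pci_enable_wake() without its own device_may_wakeup() check. */
static int pci_enable_wake(struct pci_dev *pdev, pci_power_t state, bool enable)
{
	(void)state;	/* target power state is not modeled */
	pdev->wake_armed = enable;
	return 0;
}

/* Hypothetical driver ->shutdown() callback with the suggested guard. */
static void example_shutdown(struct pci_dev *pdev)
{
	/*
	 * An unconditional pci_enable_wake(pdev, PCI_D3hot, true) here would
	 * arm wakeup even when policy forbids it; only arm it when allowed.
	 */
	if (device_may_wakeup(&pdev->dev))
		pci_enable_wake(pdev, PCI_D3hot, true);
}
```

In a real driver the callback would be hooked up through the `.shutdown` member of its `struct pci_driver`, and the same guard applies wherever the driver arms wakeup on the way down.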