Re: Regression with PM / wakeup: Show wakeup sources stats in sysfs"

From: Florian Fainelli
Date: Mon Jun 01 2020 - 21:30:34 EST




On 5/30/2020 3:33 PM, Tri Vo wrote:
> On Sat, May 30, 2020 at 11:52 AM Florian Fainelli <f.fainelli@xxxxxxxxx> wrote:
>>
>>
>>
>> On 5/29/2020 4:14 PM, Tri Vo wrote:
>>> On Fri, May 29, 2020 at 3:37 PM Florian Fainelli <f.fainelli@xxxxxxxxx> wrote:
>>>>
>>>> On 5/29/20 3:28 PM, Tri Vo wrote:
>>>>> On Fri, May 29, 2020 at 9:51 AM Rafael J. Wysocki
>>>>> <rafael.j.wysocki@xxxxxxxxx> wrote:
>>>>>>
>>>>>> On 5/28/2020 10:46 PM, Florian Fainelli wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> Commit c8377adfa78103be5380200eb9dab764d7ca890e ("PM / wakeup: Show
>>>>>>> wakeup sources stats in sysfs") is causing some of our tests to fail
>>>>>>> because /sys/class/net/*/device/power/wakeup_count is now 0, despite
>>>>>>> /sys/kernel/debug/wakeup_sources clearly indicating that the Ethernet
>>>>>>> device was responsible for system wake-up.
>>>>>>>
>>>>>>> What's more in looking at /sys/class/wakekup/wakeup2/event_count, we
>>>>>>> have the number of Wake-on-LAN wakeups recorded properly, but
>>>>>>> wakeup_count is desperately 0, why is that?
>>>>>>
>>>>>> I need to look at that commit in detail to find out what is going on.
>>>>>
>>>>> It would be helpful to see the contents of
>>>>> /sys/kernel/debug/wakeup_sources, /sys/class/net/*/device/power/*, and
>>>>> /sys/class/wakekup/* corresponding to the device in question. The
>>>>> values in these files are queried from the same struct wakeup_source.
>>>>> So it's odd if wakeup_count diverges.
>>>>
>>>> Most certainly, below is the information you want, the two cat
>>>> /s/k/d/wakeup_sources were done before Wake-on-LAN and after waking-up
>>>> from LAN. /sys/class/wakeup/wakeup2 maps to the Ethernet device.
>>>>
>>>> The Ethernet device calls pm_wakeup_event() against the struct device
>>>> that is embedded in the platform_device that it was probed with. I will
>>>> try to debug this myself over the weekend, time permitting.
>>>>
>>>>
>>>> # ethtool -s eth0 wol g
>>>> # cat /sys/kernel/debug/wakeup_sources
>>>> name active_count event_count wakeup_count
>>>> expire_count active_since total_time max_time last_changep
>>>> revent_suspend_time
>>>> 47d580000.ethernet 0 0 0
>>>> 0 0 0 0 0 0
>>>> alarmtimer 0 0 0 0
>>>> 0 0 0 0 0
>>>> 47c408400.waketimer 2 2 0
>>>> 0 0 0 0 6144
>>>> 1 0
>>>> # pml -w20
>>>> [ 3449.937142] brcm-waketimer 47c408400.waketimer: Using sysfs
>>>> attributes, consider using 'rtcwake'
>>>> Pass 1 out of 1, mode=none, tp_al[ 3449.952654] PM: suspend entry (shallow)
>>>> l=1, cycle_tp=, sleep=, [ 3449.959004] Filesystems sync: 0.000 seconds
>>>> wakeup_time=20
>>>> [ 3449.965984] Freezing user space processes ... (elapsed 0.001 seconds)
>>>> done.
>>>> [ 3449.974087] OOM killer disabled.
>>>> [ 3449.977316] Freezing remaining freezable tasks ... (elapsed 0.006
>>>> seconds) done.
>>>> [ 3449.991114] printk: Suspending console(s) (use no_console_suspend to
>>>> debug)
>>>> AMS: System is entering S2...
>>>> [ 3450.022381] bcmgenet 47d580000.ethernet eth0: Link is Down
>>>> [ 3450.048340] Disabling non-boot CPUs ...
>>>> [ 3450.049344] CPU1: shutdown
>>>> [ 3450.050393] psci: CPU1 killed (polled 1 ms)
>>>> [ 3450.051332] Enabling non-boot CPUs ...
>>>> [ 3450.051712] Detected PIPT I-cache on CPU1
>>>> [ 3450.051812] CPU1: Booted secondary processor 0x0000000001 [0x410fd083]
>>>> [ 3450.052435] CPU1 is up
>>>> [ 3450.683588] bcmgenet 47d580000.ethernet eth0: Link is Up - 1Gbps/Full
>>>> - flow control rx/tx
>>>> [ 3450.729677] OOM killer enabled.
>>>> [ 3450.732908] Restarting tasks ... done.
>>>> [ 3450.738539] PM: suspend exit
>>>> ------------------------------
>>>> [ 3450.744239] brcm-waketimer 47c408400.waketimer: Using sysfs
>>>> attributes, consider using 'rtcwake'
>>>> # cat /sys/kernel/debug/wakeup_sources
>>>> name active_count event_count wakeup_count
>>>> expire_count active_since total_time max_time last_changep
>>>> revent_suspend_time
>>>> 47d580000.ethernet 1 1 0
>>>> 0 0 0 0 3450
>>>> 054 0
>>>> alarmtimer 0 0 0 0
>>>> 0 0 0 0 0
>>>> 47c408400.waketimer 2 2 0
>>>> 0 0 0 0 6144
>>>> 1 0
>>>> # cat /sys/class/net/*/device/power/*
>>>> cat: read error: Input/output error
>>>> auto
>>>> 0
>>>> unsupported
>>>> 0
>>>> enabled
>>>> 0
>>>> 0
>>>> 1
>>>> 0
>>>> 0
>>>> 3450054
>>>> 0
>>>> 0
>>>
>>> UUIC, 47d580000.ethernet is the device of interest here. It's
>>> wakeup_count was 0 before wake up, and we expect it to be 1 after wake
>>> up. One of the files you cat'ed here has a 1 in it. I can't tell which
>>> value corresponds to which file though, but I suspect that's
>>> wakeup_count.
>>
>> That file is actually event_count which is not what is expected:
>>
>> # sh print.sh
>> /sys/class/wakeup/wakeup2/uevent:
>> /sys/class/wakeup/wakeup2/event_count: 1
>> /sys/class/wakeup/wakeup2/max_time_ms: 0
>> /sys/class/wakeup/wakeup2/wakeup_count: 0
>> /sys/class/wakeup/wakeup2/total_time_ms: 0
>> /sys/class/wakeup/wakeup2/expire_count: 0
>> /sys/class/wakeup/wakeup2/active_count: 1
>> /sys/class/wakeup/wakeup2/last_change_ms: 3450054
>> /sys/class/wakeup/wakeup2/prevent_suspend_time_ms: 0
>> /sys/class/wakeup/wakeup2/name: 47d580000.ethernet
>> /sys/class/wakeup/wakeup2/active_time_ms: 0
>
> Thanks! Although 0 is not the expected wakeup_count, both
> /sys/kernel/debug/wakeup_sources and /sys/class/wakeup/* are reporting
> the same thing. So it's probably not an issue with how these values
> are reported. The underlying struct wakeup_source has 0 wakeup_count
> recorded.
>
> How sure are you that commit c8377adfa78103be5380200eb9dab764d7ca890e
> introduces the regression? (That commit adds sysfs attributes to
> display wakeup source information, so it seems unlikely that actual
> wakeup event accounting is affected by it.)

Not anymore, it looks like my automated bisection was flawed, I will
restart it now and find out the offending commit when this started to
break. Thanks for reviewing the logs!
--
Florian