Re: [PM] bfcc1e67ff: kernel-selftests.breakpoints.step_after_suspend_test.fail

From: Florian Fainelli
Date: Thu Oct 21 2021 - 15:19:12 EST


On 10/20/21 11:48 AM, Rafael J. Wysocki wrote:
> On Wed, Oct 20, 2021 at 8:17 PM Florian Fainelli <f.fainelli@xxxxxxxxx> wrote:
>>
>> On 10/20/21 9:00 AM, Rafael J. Wysocki wrote:
>>> On Wed, Oct 20, 2021 at 5:34 PM Florian Fainelli <f.fainelli@xxxxxxxxx> wrote:
>>>>
>>>>
>>>>
>>>> On 10/20/2021 6:49 AM, Rafael J. Wysocki wrote:
>>>>> On Tue, Oct 19, 2021 at 9:04 PM Florian Fainelli <f.fainelli@xxxxxxxxx> wrote:
>>>>>>
>>>>>> On 10/19/21 11:53 AM, Rafael J. Wysocki wrote:
>>>>>>> On 10/15/2021 9:40 PM, Florian Fainelli wrote:
>>>>>>>> On 10/15/21 11:45 AM, Rafael J. Wysocki wrote:
>>>>>>>>> On 10/14/2021 11:55 PM, Florian Fainelli wrote:
>>>>>>>>>> On 10/14/21 12:23 PM, Rafael J. Wysocki wrote:
>>>>>>>>>>> On 10/14/2021 6:26 PM, Florian Fainelli wrote:
>>>>>>>>>>>> On 10/14/21 12:57 AM, kernel test robot wrote:
>>>>>>>>>>>>> Greeting,
>>>>>>>>>>>>>
>>>>>>>>>>>>> FYI, we noticed the following commit (built with gcc-9):
>>>>>>>>>>>>>
>>>>>>>>>>>>> commit: bfcc1e67ff1e4aa8bfe2ca57f99390fc284c799d ("PM: sleep: Do not
>>>>>>>>>>>>> assume that "mem" is always present")
>>>>>>>>>>>>> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git
>>>>>>>>>>>>> master
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> in testcase: kernel-selftests
>>>>>>>>>>>>> version: kernel-selftests-x86_64-c8c9111a-1_20210929
>>>>>>>>>>>>> with following parameters:
>>>>>>>>>>>>>
>>>>>>>>>>>>> group: group-00
>>>>>>>>>>>>> ucode: 0x11
>>>>>>>>>>>>>
>>>>>>>>>>>>> test-description: The kernel contains a set of "self tests" under
>>>>>>>>>>>>> the
>>>>>>>>>>>>> tools/testing/selftests/ directory. These are intended to be small
>>>>>>>>>>>>> unit tests to exercise individual code paths in the kernel.
>>>>>>>>>>>>> test-url: https://www.kernel.org/doc/Documentation/kselftest.txt
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> on test machine: 288 threads 2 sockets Intel(R) Xeon Phi(TM) CPU
>>>>>>>>>>>>> 7295
>>>>>>>>>>>>> @ 1.50GHz with 80G memory
>>>>>>>>>>>>>
>>>>>>>>>>>>> caused below changes (please refer to attached dmesg/kmsg for entire
>>>>>>>>>>>>> log/backtrace):
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> If you fix the issue, kindly add following tag
>>>>>>>>>>>>> Reported-by: kernel test robot <oliver.sang@xxxxxxxxx>
>>>>>>>>>>>> Thanks for your report. Assuming that the code responsible for
>>>>>>>>>>>> registering the suspend operations is drivers/acpi/sleep.c for your
>>>>>>>>>>>> platform, and that acpi_sleep_suspend_setup() iterated over all
>>>>>>>>>>>> possible
>>>>>>>>>>>> sleep states, your platform must somehow be returning that
>>>>>>>>>>>> ACPI_STATE_S3
>>>>>>>>>>>> is not a supported state somehow?
>>>>>>>>>>>>
>>>>>>>>>>>> Rafael have you ever encountered something like that?
>>>>>>>>>>> Yes, there are systems with ACPI that don't support S3.
>>>>>>>>>> OK and do you know what happens when we enter suspend with "mem" in
>>>>>>>>>> those cases? Do we immediately return because ultimately the firmware
>>>>>>>>>> does not support ACPI S3?
>>>>>>>>> "mem" should not be present in the list of available strings then, so it
>>>>>>>>> should be rejected right away.
>>>>>>>> Well yes, that was the purpose of the patch I submitted, but assuming
>>>>>>>> that we did provide "mem" as one of the possible standby modes even
>>>>>>>> though that was wrong (before patch), and the test was trying to enter
>>>>>>>> ACPI S3 standby, what would have happened, would the ACPI firmware honor
>>>>>>>> the request but return an error, or would it actually enter ACPI S3?
>>>>>>>>
>>>>>>>> In any case, I will change the test to check that this is a supported
>>>>>>>> standby mode before trying it.
>>>>>>>
>>>>>>> Unfortunately, I will need to revert bfcc1e67ff1e4aa8bfe2, because it
>>>>>>> breaks user space compatibility and that's got caught properly by the test.
>>>>>>
>>>>>> Reverting my commit will break powerpc and other ARM/ARM64 platforms
>>>>>> where mem is not supported (via PSCI),
>>>>>
>>>>> It won't break anything, although the things that didn't work before
>>>>> will still not work after it.
>>>>>
>>>>> And "mem" is always supported even if there are no suspend_ops at all,
>>>>> in which case it becomes an alternative way to trigger s2idle.
>>>>>
>>>>> So, on the affected systems, what's there in /sys/power/? Is
>>>>> mem_sleep present? If so, what's in it?
>>>>
>>>> With 4.9 which is what I used initially:
>>>>
>>>> # cat /sys/power/state
>>>> freeze standby
>>>> # cat /sys/power/
>>>> pm_async pm_print_times pm_wakeup_irq wakeup_count
>>>> pm_freeze_timeout pm_test state
>>>>
>>>> With a newer kernel without my patch:
>>>>
>>>> # cat /sys/power/state
>>>> freeze standby mem
>>>> # cat /sys/power/mem_sleep
>>>> s2idle shallow [deep]
>>>
>>> OK, so the "deep" and "shallow" suspend variants appear to be
>>> supported. What's the problem with advertising "mem" then?
>>
>> s2idle and shallow are, but deep is not.
>
> Why is it there in mem_sleep, then? It should not be there if
> valid_state() returns 'false' for it.

The suspend_ops that is registered has a ->valid that will return false
for the PM_SUSPEND_MEM case, yet we ignore that and we still populate
valide_state[PM]

>
> mem_sleep_states[PM_SUSPEND_MEM] is only set by suspend_set_ops() if
> valid_state(PM_SUSPEND_MEM) is 'true'.

That is true, but pm_states[PM_SUSPEND_MEM] was (before my patch that
is) unconditionally present. And for the same reason that you expect
user-space to find the string "mem" in /sys/power/state, we expected not
to find it, if PM_SUSPEND_MEM is not supported.

>
>>>
>>>> # cat /sys/power/
>>>> mem_sleep pm_freeze_timeout pm_wakeup_irq wakeup_count
>>>> pm_async pm_print_times state
>>>> pm_debug_messages pm_test suspend_stats/
>>>>
>>>>
>>>>>
>>>>>> I have a change pending for PSCI
>>>>>> that will actually check that SYSTEM_SUSPEND is supported before
>>>>>> unconditionally making use of it.
>>>>>>
>>>>>>>
>>>>>>> What happens is that "mem" is a "pointer" to a secondary list of
>>>>>>> possible states and that generally is "s2idle shallow deep" and if
>>>>>>> s2idle is the only available option, it will be just "s2idle".
>>>>>>>
>>>>>>> This list is there in /sys/power/mem_sleep.
>>>>>>>
>>>>>>> It was done this way, because some variants of user space expect "mem"
>>>>>>> to be always present and don't recognize "freeze" properly.
>>>>>>>
>>>>>>> Sorry for the confusion.
>>>>>>
>>>>>> So how do we all get our cookie here? Should we just slap an #ifndef
>>>>>> CONFIG_ACPI in order to allow platforms that do not have "mem" to not
>>>>>> have it?
>>>>>
>>>>> Certainly not.
>>>>>
>>>>> I've just hacked my test-bed system with ACPI so it does not register
>>>>> any suspend_ops at all and I have "freeze mem disk" in
>>>>> /sys/power/state and "s2idle" in /sys/power/mem_sleep. Writing "mem"
>>>>> to /sys/power/state causes s2idle to be carried out.
>>>>>
>>>>> Since this is the expected behavior, I'm not sure what the problem is.
>>>>
>>>> The problem is advertising "mem" in /sys/power/state when the state is
>>>> not actually supported by the platform firmware here, whether that
>>>> translates into the form of s2idle or not. It is not supported, and it
>>>> should not be there IMHO.
>>>
>>> Well, it is there, because some user space expects it to be there on
>>> systems supporting any kind of system-wide suspend, including s2idle.
>>> Like it or not.
>>
>> But that was not the case before 406e79385f32 ("PM / sleep: System sleep
>> state selection interface rework") and clearly nobody complained about
>> that, did they?
>
> Yes, it was and yes, they did. Changes like that are not made without a reason.
>
>>>
>>> If it is not there, the utilities in question assume that system-wide
>>> suspend is not supported at all.
>>
>> What utilities do depend on that? That selftest that does not even check
>> that "mem" is actually present in /sys/power/state and just fails its
>> test if it is not, yes it's not great, but that can be fixed.
>
> Various GUI-based things like KDE, GNOME and similar plus the Chrome
> user space IIRC.

OK.

>
>>>
>>>> I was late to the game in identifying that,
>>>> but the 4.9 kernel makes sense to me.
>>>>
>>>> Similarly, if you take arch/powerpc/sysdev/fsl_pmc.c only
>>>> PM_SUSPEND_STANDBY is valid, so advertising mem would be wrong if we
>>>> don't look at what ->valid tells us.
>>>
>>> Again: "mem" appears in /sys/power/state if the system supports any
>>> kind of system-wide suspend (because of the expectations of user space
>>> mentioned above) and mem_sleep decides what it really means.
>>>
>>> And this is documented too (see Documentation/admin-guide/pm/sleep-states.html).
>>
>> The documentation just states that if the kernel supports *any* suspend
>> state, then /sys/power/state will be present and likewise for
>> /sys/power/mem_sleep, it does not say what the contents will be and that
>> "mem" would always be present in there.
>
> It doesn't say so directly, but it kind of wouldn't make sense to have
> "mem_sleep" without "mem" in "state" and it implies that "mem_sleep"
> is not empty if it is present. Ergo "mem" is present in "state" if
> "mem_sleep" is present which is the case if (at least) s2idle is
> supported. That is always the case if CONFIG_SUSPEND is set which
> follows from the suspend-to-idle description.
>
> Anyway, I'm still not sure what the problem really is. Commit
> 406e79385f32 still allows user space to only trigger transitions to
> s2idle and other states explicitly reported as valid by the platform.
>

The problem from my perspective is still that "mem" is present even with
PM_SUSPEND_PM not being valid for the said platform, and this is just
confusing my/our user-space here as well as our users. This was not like
that back in the 4.9 kernel, but it changed later, therefore it also
constitutes an user-space regression from my angle.
--
Florian