Re: Suspend-resume failure on Intel Eagle Lake Core2Duo

From: Marc Zyngier
Date: Tue Aug 08 2017 - 03:39:43 EST


On 08/08/17 02:30, Masahiro Yamada wrote:
> Hi Marc,
>
> 2017-08-07 17:17 GMT+09:00 Marc Zyngier <marc.zyngier@xxxxxxx>:
>> On 07/08/17 05:45, Masahiro Yamada wrote:
>>> Hi Marc,
>>>
>>>
>>> 2017-08-03 22:30 GMT+09:00 Marc Zyngier <marc.zyngier@xxxxxxx>:
>>>> On 03/08/17 13:52, Masahiro Yamada wrote:
>>>>> Hi Marc,
>>>>>
>>>>> 2017-08-03 17:41 GMT+09:00 Marc Zyngier <marc.zyngier@xxxxxxx>:
>>>>>> Hi Masahiro,
>>>>>>
>>>>>> On 03/08/17 08:32, Masahiro Yamada wrote:
>>>>>>> Hi.
>>>>>>>
>>>>>>> 2017-08-01 0:55 GMT+09:00 Thomas Gleixner <tglx@xxxxxxxxxxxxx>:
>>>>>>>> On Mon, 31 Jul 2017, Tomi Sarvela wrote:
>>>>>>>>> On 31/07/17 18:06, Thomas Gleixner wrote:
>>>>>>>>>> Can you please remove the patch and try the following:
>>>>>>>>>>
>>>>>>>>>> # echo N > /sys/module/printk/parameters/console_suspend
>>>>>>>>>>
>>>>>>>>>> # echo mem > /sys/power/state
>>>>>>>>>>
>>>>>>>>>> and log the output of the serial console. That way we might get a clue
>>>>>>>>>> where it gets stuck.
>>>>>>>>>
>>>>>>>>> I'm afraid it hangs right away. No response from SSH, no output to serial.
>>>>>>>>
>>>>>>>> What does "hangs right away" mean? Is there no output at all on the
>>>>>>>> serial console? Or does it just stop at some point?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> tglx
>>>>>>>>
>>>>>>>
>>>>>>> Sorry for jumping in.
>>>>>>> Finally, I found this thread.
>>>>>>>
>>>>>>>
>>>>>>> My environment is completely different (an ARM64 board), but
>>>>>>> I have also been suffering from a hibernation problem
>>>>>>> since this commit.
>>>>>>>
>>>>>>>
>>>>>>> I get no response on the serial console
>>>>>>> after "Restarting tasks ... done." log message.
>>>>>>>
>>>>>>>
>>>>>>> By reverting bf22ff45bed6 ("genirq: Avoid unnecessary low level
>>>>>>> irq function calls"), I can get hibernation working again.
>>>>>>>
>>>>>>>
>>>>>>> SW info:
>>>>>>> defconfig: arch/arm64/configs/defconfig
>>>>>>> DT : arch/arm64/boot/dts/socionext/uniphier-ld20-ref.dts
>>>>>>> PSCI : ARM Trusted Firmware
>>>>>>>
>>>>>>>
>>>>>>> SoC info:
>>>>>>> CPU : Cortex-A72 * 2 + Cortex-A53 * 2
>>>>>>> irqchip : GICv3 (drivers/irqchip/irq-gic-v3.c)
>>>>>>
>>>>>> Let me take an educated guess: It feels like your firmware doesn't
>>>>>> save/restore the GIC context across suspend/resume. Is that something
>>>>>> you could check, assuming you have access to the firmware source code?
>>>>>
>>>>> Thanks for your comments.
>>>>>
>>>>>
>>>>> I do not know much about how the GICv3 context is preserved.
>>>>>
>>>>> I found this patch (rejected?):
>>>>> https://patchwork.kernel.org/patch/9343061/
>>>>>
>>>>>
>>>>> Is it something that should be handled entirely by the firmware
>>>>> instead of the kernel?
>>>>
>>>> That was definitely the intention, but it looks like something that ATF
>>>> has only started supporting very recently:
>>>>
>>>> https://github.com/ARM-software/arm-trusted-firmware/pull/1047
>>>>
>>>>> ARM Trusted Firmware (https://github.com/ARM-software/arm-trusted-firmware)
>>>>> is open source software, and I have pushed my platform code upstream.
>>>>>
>>>>> So, yes, I (and everybody) can have access to the firmware source code.
>>>>>
>>>>>
>>>>> I am not sure how ATF saves the context during hibernation, though.
>>>>
>>>> See the above link. Is there any chance you could try this in your
>>>> firmware?
>>>>
>>>> Thanks,
>>>
>>> Thanks for the pointer.
>>>
>>>
>>> Yes. I will try that once GIC-v3 context save/restore is supported in ATF.
>>>
>>> I think that will basically work for suspend-to-ram,
>>> because the contexts of both the non-secure and secure worlds
>>> are retained in main memory.
>>>
>>> However, I still do not understand how the context is preserved during
>>> the hibernation (suspend-to-disk).
>>>
>>>
>>> If my understanding is correct, hibernation on Linux works as follows:
>>>
>>> [1] Freeze all tasks
>>> [2] CPU_OFF for non-boot CPUs
>>> [3] Create a hibernation image
>>> [4] CPU_ON for non-boot CPUs
>>> [5] Write the hibernation image to the disk (=swap area)
>>> [6] SYSTEM_OFF
>>>
>>>
>>> IIUC, [5] only writes the context Linux takes care of (only non-secure).
>>>
>>> If so, where and how does the firmware write the GIC-v3 context
>>> to the disk?
>>
>> Gah, I completely missed the fact that you were talking about suspend to
>> disk, sorry about that.
>>
>> It is likely that some driver doesn't restore its state properly. Is
>> there any chance that you could pinpoint which device creates the issue?
>>
>
> I use eMMC to store the hibernation image, but
> I do not think the eMMC driver is the cause of the issue.
>
> I guess the cause of the issue is that the GIC-v3 context is lost.

It is not lost. The boot kernel has re-initialized its state. The real
problem is that one driver in your system fails to restore its own state
correctly, and relies on something such as enabling/disabling its
interrupt in its PM handler to get things working again (probably
because it uses the same PM callback functions for both suspend/resume
and hibernation). That is in no way guaranteed to work.
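The following is a toy Python model of one plausible reading of the explanation above. It is not kernel code. After a hibernation restore, the software bookkeeping comes back from the image, but the interrupt controller holds whatever the boot kernel programmed. A PM handler that merely re-enables its interrupt only gets the hardware reprogrammed as a side effect, and that side effect disappears once the enable path skips calls it considers unnecessary. Every name here (ToyIrqChip, ToyIrqCore, the lazy flag, the two hook functions) is hypothetical. The lazy flag is only loosely inspired by the title of bf22ff45bed6 and reproduces neither that commit's actual logic nor the genirq API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class ToyIrqChip:
    """Hypothetical interrupt controller: one hardware enable bit per IRQ."""
    hw_enabled: Dict[int, bool] = field(default_factory=dict)

    def reset(self) -> None:
        # Models the boot kernel initializing the controller:
        # every line starts out disabled.
        self.hw_enabled.clear()

    def unmask(self, irq: int) -> None:
        self.hw_enabled[irq] = True


@dataclass
class ToyIrqCore:
    """Hypothetical software bookkeeping that lives in kernel memory."""
    chip: ToyIrqChip
    lazy: bool  # True: skip the chip call if software already says "enabled"
    sw_enabled: Dict[int, bool] = field(default_factory=dict)

    def enable_irq(self, irq: int) -> None:
        if self.lazy and self.sw_enabled.get(irq, False):
            return  # software state unchanged, so no hardware access at all
        self.sw_enabled[irq] = True
        self.chip.unmask(irq)


def hibernate_and_restore(core: ToyIrqCore,
                          pm_hook: Optional[Callable[[ToyIrqCore], None]] = None) -> None:
    image = dict(core.sw_enabled)  # software state is saved in the image
    core.chip.reset()              # boot kernel re-initializes the hardware
    core.sw_enabled = image        # image restored: software state is back
    if pm_hook is not None:
        pm_hook(core)              # hardware matches only if something reprograms it


def resume_by_reenabling(core: ToyIrqCore) -> None:
    # Relies on enable_irq() touching the hardware as a side effect.
    for irq, on in core.sw_enabled.items():
        if on:
            core.enable_irq(irq)


def restore_explicitly(core: ToyIrqCore) -> None:
    # Reprograms the hardware from the saved state, whatever the core does.
    for irq, on in core.sw_enabled.items():
        if on:
            core.chip.unmask(irq)


if __name__ == "__main__":
    for lazy in (False, True):
        core = ToyIrqCore(ToyIrqChip(), lazy=lazy)
        core.enable_irq(42)
        hibernate_and_restore(core, resume_by_reenabling)
        print(lazy, core.chip.hw_enabled.get(42, False))
```

Running it prints "False True" and then "True False". With the lazy path, the re-enabling handler leaves IRQ 42 disabled in the toy hardware even though the software state says it is enabled. Passing restore_explicitly as the hook brings the line back regardless of the flag.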

Please try:

# swapon -a
# echo test_resume > /sys/power/disk
# echo disk > /sys/power/state

and let me know how this fares.

> I am not an expert in this, so I will ask the ATF community
> about how ATF can support suspend-to-disk.

As you pointed out, ATF is not involved at all in that context, so
that would be pretty pointless.

Thanks,

M.
--
Jazz is not dead. It just smells funny...