Re: Suspend-resume failure on Intel Eagle Lake Core2Duo

From: Masahiro Yamada
Date: Mon Aug 07 2017 - 21:30:44 EST


Hi Marc,

2017-08-07 17:17 GMT+09:00 Marc Zyngier <marc.zyngier@xxxxxxx>:
> On 07/08/17 05:45, Masahiro Yamada wrote:
>> Hi Marc,
>>
>>
>> 2017-08-03 22:30 GMT+09:00 Marc Zyngier <marc.zyngier@xxxxxxx>:
>>> On 03/08/17 13:52, Masahiro Yamada wrote:
>>>> Hi Marc,
>>>>
>>>> 2017-08-03 17:41 GMT+09:00 Marc Zyngier <marc.zyngier@xxxxxxx>:
>>>>> Hi Masahiro,
>>>>>
>>>>> On 03/08/17 08:32, Masahiro Yamada wrote:
>>>>>> Hi.
>>>>>>
>>>>>> 2017-08-01 0:55 GMT+09:00 Thomas Gleixner <tglx@xxxxxxxxxxxxx>:
>>>>>>> On Mon, 31 Jul 2017, Tomi Sarvela wrote:
>>>>>>>> On 31/07/17 18:06, Thomas Gleixner wrote:
>>>>>>>>> Can you please remove the patch. And try the following:
>>>>>>>>>
>>>>>>>>> # echo N > /sys/module/printk/parameters/console_suspend
>>>>>>>>>
>>>>>>>>> # echo mem > /sys/power/state
>>>>>>>>>
>>>>>>>>> and log the output of the serial console. That way we might get a clue
>>>>>>>>> where it gets stuck.
>>>>>>>>
>>>>>>>> I'm afraid it hangs right away. No response from SSH, no output to serial.
>>>>>>>
>>>>>>> What does "hangs right away" mean? Is there no output at all on the serial
>>>>>>> console? Or does it just stop at some point?
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> tglx
>>>>>>>
>>>>>>
>>>>>> Sorry for jumping in.
>>>>>> Finally, I found this thread.
>>>>>>
>>>>>>
>>>>>> My environment is completely different (an ARM64 board), but
>>>>>> I am also suffering from a hibernation problem
>>>>>> since this commit.
>>>>>>
>>>>>>
>>>>>> I get no response on the serial console
>>>>>> after the "Restarting tasks ... done." log message.
>>>>>>
>>>>>>
>>>>>> By reverting bf22ff45bed6 ("genirq: Avoid unnecessary low level
>>>>>> irq function calls"), I can get hibernation working again.
>>>>>>
>>>>>>
>>>>>> SW info:
>>>>>> defconfig: arch/arm64/configs/defconfig
>>>>>> DT : arch/arm64/boot/dts/socionext/uniphier-ld20-ref.dts
>>>>>> PSCI : ARM Trusted Firmware
>>>>>>
>>>>>>
>>>>>> SoC info:
>>>>>> CPU : Cortex-A72 * 2 + Cortex-A53 * 2
>>>>>> irqchip : GICv3 (drivers/irq/irq-gic-v3.c)
>>>>>
>>>>> Let me take an educated guess: It feels like your firmware doesn't
>>>>> save/restore the GIC context across suspend/resume. Is that something
>>>>> you could check, assuming you have access to the firmware source code?
>>>>
>>>> Thanks for your comments.
>>>>
>>>>
>>>> I do not know much about how the GICv3 context is preserved.
>>>>
>>>> I can see this patch (rejected?) :
>>>> https://patchwork.kernel.org/patch/9343061/
>>>>
>>>>
>>>> Is it something that should be handled entirely by the firmware
>>>> instead of the kernel?
>>>
>>> That was definitely the intention, but it looks like something that ATF
>>> has only started supporting very recently:
>>>
>>> https://github.com/ARM-software/arm-trusted-firmware/pull/1047
>>>
>>>> ARM Trusted Firmware (https://github.com/ARM-software/arm-trusted-firmware)
>>>> is open source software, and I pushed my platform code to the upstream.
>>>>
>>>> So, yes, I (and everybody) can have access to the firmware source code.
>>>>
>>>>
>>>> I am not sure how ATF saves the context during hibernation, though.
>>>
>>> See the above link. Is there any chance of you trying this in your
>>> firmware?
>>>
>>> Thanks,
>>
>> Thanks for the pointer.
>>
>>
>> Yes. I will try that once GIC-v3 context save/restore is supported in ATF.
>>
>> I think that will basically work for suspend-to-ram
>> because all contexts including both non-secure and secure worlds will
>> be retained in the main memory.
>>
>> However, I still do not understand how the context is preserved during
>> hibernation (suspend-to-disk).
>>
>>
>> If my understanding is correct, hibernation on Linux works as follows:
>>
>> [1] Freeze all tasks
>> [2] CPU_OFF for non-boot CPUs
>> [3] Create a hibernation image
>> [4] CPU_ON for non-boot CPUs
>> [5] Write the hibernation image to the disk (=swap area)
>> [6] SYSTEM_OFF
>>
>>
>> IIUC, [5] only writes the context Linux takes care of (only non-secure).
>>
>> If so, where and how does the firmware write the GIC-v3 context
>> to the disk?
>
> Gah, I completely missed the fact that you were talking about suspend to
> disk, sorry about that.
>
> It is likely that some driver doesn't restore its state properly. Is
> there any chance that you could pinpoint which device creates the issue?
>

I use eMMC to store the hibernation image, but
I do not think the eMMC driver is the cause of the issue.

I guess the cause of the issue is that the GIC-v3 context is lost.


I am not an expert in this, so I will ask the ATF community
about how ATF can support suspend-to-disk.
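
For anyone who wants to reproduce this, here is a minimal sketch of how the
hibernation path discussed above can be triggered from userspace. It assumes
the standard sysfs interfaces (/sys/power/state, and /sys/power/pm_test, which
is only present with CONFIG_PM_DEBUG). The helper names pm_write and
trigger_hibernation and the DRY_RUN variable are made up for illustration and
are not from this thread. By default it only prints the commands.

```shell
#!/bin/sh
# Hypothetical helpers, not taken from the thread.
# DRY_RUN=1 (the default) prints each command instead of writing to sysfs.
DRY_RUN="${DRY_RUN:-1}"

# pm_write VALUE PATH: write VALUE to PATH, or print the equivalent command.
pm_write() {
    if [ "$DRY_RUN" = "1" ]; then
        printf 'echo %s > %s\n' "$1" "$2"
    else
        printf '%s\n' "$1" > "$2"
    fi
}

# trigger_hibernation [PM_TEST_LEVEL]
# Optional argument: a pm_test level such as "freezer" or "devices"
# (assumes CONFIG_PM_DEBUG), which stops the sequence early and resumes,
# to help narrow down the failing step.
trigger_hibernation() {
    # Keep the console active so the last messages reach the serial port.
    pm_write N /sys/module/printk/parameters/console_suspend
    if [ -n "${1:-}" ]; then
        pm_write "$1" /sys/power/pm_test
    fi
    # "disk" requests suspend-to-disk; "mem" would request suspend-to-RAM.
    pm_write disk /sys/power/state
}
```

Running "DRY_RUN=0 trigger_hibernation" as root performs the real writes;
passing "devices" as the argument limits the test to the device
suspend/resume callbacks, which may help tell a driver problem apart from
lost interrupt controller state.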


--
Best Regards
Masahiro Yamada