Re: Suspend-resume failure on Intel Eagle Lake Core2Duo

From: Marc Zyngier
Date: Mon Aug 07 2017 - 04:17:44 EST


On 07/08/17 05:45, Masahiro Yamada wrote:
> Hi Marc,
>
>
> 2017-08-03 22:30 GMT+09:00 Marc Zyngier <marc.zyngier@xxxxxxx>:
>> On 03/08/17 13:52, Masahiro Yamada wrote:
>>> Hi Marc,
>>>
>>> 2017-08-03 17:41 GMT+09:00 Marc Zyngier <marc.zyngier@xxxxxxx>:
>>>> Hi Masahiro,
>>>>
>>>> On 03/08/17 08:32, Masahiro Yamada wrote:
>>>>> Hi.
>>>>>
>>>>> 2017-08-01 0:55 GMT+09:00 Thomas Gleixner <tglx@xxxxxxxxxxxxx>:
>>>>>> On Mon, 31 Jul 2017, Tomi Sarvela wrote:
>>>>>>> On 31/07/17 18:06, Thomas Gleixner wrote:
>>>>>>>> Can you please remove the patch. And try the following:
>>>>>>>>
>>>>>>>> # echo N > /sys/module/printk/parameters/console_suspend
>>>>>>>>
>>>>>>>> # echo mem > /sys/power/state
>>>>>>>>
>>>>>>>> and log the output of the serial console. That way we might get a clue
>>>>>>>> where it gets stuck.
>>>>>>>
>>>>>>> I'm afraid it hangs right away. No response from SSH, no output to serial.
>>>>>>
>>>>>> What does "hangs right away" mean? Is there no output at all on the serial
>>>>>> console? Or does it just stop at some point?
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> tglx
>>>>>>
>>>>>
>>>>> Sorry for jumping in.
>>>>> Finally, I found this thread.
>>>>>
>>>>>
>>>>> My environment is completely different (ARM64 board), but
>>>>> I am also suffering from a hibernation problem
>>>>> since this commit.
>>>>>
>>>>>
>>>>> I get no response on the serial console
>>>>> after "Restarting tasks ... done." log message.
>>>>>
>>>>>
>>>>> By reverting bf22ff45bed6 ("genirq: Avoid unnecessary low level
>>>>> irq function calls"), I can get hibernation working again.
>>>>>
>>>>>
>>>>> SW info:
>>>>> defconfig: arch/arm64/configs/defconfig
>>>>> DT : arch/arm64/boot/dts/socionext/uniphier-ld20-ref.dts
>>>>> PSCI : ARM Trusted Firmware
>>>>>
>>>>>
>>>>> SoC info:
>>>>> CPU : Cortex-A72 * 2 + Cortex-A53 * 2
>>>>> irqchip : GICv3 (drivers/irq/irq-gic-v3.c)
>>>>
>>>> Let me take an educated guess: It feels like your firmware doesn't
>>>> save/restore the GIC context across suspend/resume. Is that something
>>>> you could check, assuming you have access to the firmware source code?
>>>
>>> Thanks for your comments.
>>>
>>>
>>> I do not know much about the manner of preserving GICv3 context.
>>>
>>> I can see this patch (rejected?) :
>>> https://patchwork.kernel.org/patch/9343061/
>>>
>>>
>>> Is this something that should be handled entirely by firmware
>>> instead of the kernel?
>>
>> That was definitely the intention, but it looks like something that ATF
>> has only started supporting very recently:
>>
>> https://github.com/ARM-software/arm-trusted-firmware/pull/1047
>>
>>> ARM Trusted Firmware (https://github.com/ARM-software/arm-trusted-firmware)
>>> is open source software, and I pushed my platform code to the upstream.
>>>
>>> So, yes, I (and everybody) can have access to the firmware source code.
>>>
>>>
>>> I am not sure how ATF saves the context during hibernation, though.
>>
>> See the above link. Is there any chance of you trying this in your
>> firmware?
>>
>> Thanks,
>
> Thanks for the pointer.
>
>
> Yes. I will try that once GIC-v3 context save/restore is supported in ATF.
>
> I think that will basically work for suspend-to-ram
> because all contexts including both non-secure and secure worlds will
> be retained in the main memory.
>
> However, I still do not understand how the context is preserved during
> the hibernation (suspend-to-disk).
>
>
> If my understanding is correct, hibernation on Linux works as follows:
>
> [1] Freeze all tasks
> [2] CPU_OFF for non-boot CPUs
> [3] Create a hibernation image
> [4] CPU_ON for non-boot CPUs
> [5] Write the hibernation image to the disk (=swap area)
> [6] SYSTEM_OFF
>
>
> IIUC, [5] only writes the context Linux takes care of (only non-secure).
>
> If so, where and how does the firmware write the GIC-v3 context
> to the disk?

Gah, I completely missed the fact that you were talking about suspend to
disk, sorry about that.

It is likely that some driver doesn't restore its state properly. Is
there any chance that you could pinpoint which device creates the issue?
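
One possible way to narrow this down (a sketch only, not something tested
on this board): if the kernel is built with CONFIG_PM_DEBUG,
/sys/power/pm_test lets a hibernation cycle stop early at a chosen level
(freezer, devices, platform, processors, core) and then resume after a
short delay. The helper below is hypothetical; run_pm_test and the PM_ROOT
override are names made up for illustration, and PM_ROOT exists only so
the function can be dry-run against a scratch directory.

```shell
#!/bin/sh
# Hypothetical helper: run one hibernation test cycle at a given pm_test level.
# Assumes CONFIG_PM_DEBUG=y and root privileges on a real system.
# PM_ROOT is an illustrative override (not a kernel interface) for dry runs.
PM_ROOT="${PM_ROOT:-/sys/power}"

run_pm_test() {
    mode="$1"
    # Levels accepted by /sys/power/pm_test, shallowest to deepest.
    case "$mode" in
        freezer|devices|platform|processors|core) ;;
        *) echo "unknown pm_test level: $mode" >&2; return 1 ;;
    esac
    # Select the test level, then start a hibernation cycle; in test mode
    # the kernel stops at that level, waits briefly, and resumes.
    # Write "none" to pm_test afterwards to return to normal operation.
    echo "$mode" > "$PM_ROOT/pm_test" || return 1
    echo disk > "$PM_ROOT/state"
}
```

For example, run "run_pm_test freezer" and then "run_pm_test devices": if
the first comes back and the second does not, the device suspend/resume
callbacks are the place to look. These test modes do not power the system
off, so a failure that only shows up after a real restore from disk may
not reproduce this way.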

Thanks,

M.
--
Jazz is not dead. It just smells funny...