Re: Bug report: kernel paniced when system hibernates

From: Conor Dooley
Date: Thu May 25 2023 - 14:22:26 EST


Hey Atish,

On Thu, May 25, 2023 at 10:39:44AM -0700, Atish Patra wrote:
> > How about the below?

> Instead of disabling hibernate support why not revert the patch
> 3335068 ("riscv: Use PUD/P4D/PGD pages for the linear mapping")
> which doesn't add any "measured" value at this point.
> However, keeping the hibernation feature on and disabling linear
> mapping will get more testing on hibernation.
> While disabling hibernation and keeping the above patch which
> doesn't have any value at all.
>
> We don't have a regression at this point. So either approach will work though.

I favoured this approach so that we do not release a kernel in which
hibernate works for these versions of OpenSBI and then stops working in
the future when we shore up how communicating this is supposed to work.
It allows us to fix the problem "properly" in slow-time, instead of
racing against v6.4's release.

I happened to be talking to Palmer and he suggested making it depend on
NONPORTABLE:
|> config NONPORTABLE
|> bool "Allow configurations that result in non-portable kernels"
|> help
|> RISC-V kernel binaries are compatible between all known systems
|> whenever possible, but there are some use cases that can only be
|> satisfied by configurations that result in kernel binaries that are
|> not portable between systems.
|>
|> Selecting N does not guarantee kernels will be portable to all known
|> systems. Selecting any of the options guarded by NONPORTABLE will
|> result in kernel binaries that are unlikely to be portable between
|> systems.
|>
|> If unsure, say N.

I actually think that that makes more sense, as it may actually be fine
to use hibernation depending on what your SBI implementation does.

> If we choose to go this route, some thoughts about the commit message.
> > -- >8 --
> > From 1d4381290a1600eff9b29b8ace6be73955d9726c Mon Sep 17 00:00:00 2001
> > From: Conor Dooley <conor.dooley@xxxxxxxxxxxxx>
> > Date: Thu, 25 May 2023 15:09:08 +0100
> > Subject: [PATCH] RISC-V: mark hibernation as broken
> >
> > Hibernation support depends on firmware marking its reserved
> > regions as not mappable by Linux. As things stand, the de-facto SBI
>
> either not mappable or no save/restore capable (as We still have not
> concluded which way we want to go in)

s/mappable/accessible/? Sounds like a good catch all?

>
> > implementation (OpenSBI) does not do this, and other implementations may
> > not do so either, resulting in kernel panics during hibernation ([1],
> > [2]).
> >
>
> we should probably add more context in the commit message.
> How about adding something along these lines:
>
> As things stand, the latest version of de-facto SBI
> implementation(OpenSBI) doesn't
> do this any more to allow 1G huge page mappings by kernel. Other SBI
> implementations are probably
> doing the same. Until the commit 3335068 ("riscv: Use PUD/P4D/PGD
> pages for the linear mapping"),
> the first 2MB region of DRAM (where the typically firmware resides)
> was not mappable by kernel. However,
> enabling that mapping resulted in the kernel panics during hibernation
> ([1], [2]) as the hibernation process
> tries to save/restore any mapped region even though it is marked as reserved.

SGTM, I could go with that.

> > Disable support for hibernation until such time that an SBI
> > implementation independent way to communicate what regions are reserved
> > has been agreed upon.
> >
>
> Anybody who wants to test the hibernation feature must revert the
> above mentioned patch along with turning on
> the config.

This goes away with the use of non-portable, although I would work
mention of the config option into the commit message.

Thanks,
Conor.

> > Reported-by: Song Shuai <suagrfillet@xxxxxxxxx>
> > Link: https://lore.kernel.org/all/CAAYs2=gQvkhTeioMmqRDVGjdtNF_vhB+vm_1dHJxPNi75YDQ_Q@xxxxxxxxxxxxxx/ [1]
> > Reported-by: JeeHeng Sia <jeeheng.sia@xxxxxxxxxxxxxxxx>
> > Link: https://groups.google.com/a/groups.riscv.org/g/sw-dev/c/ITXwaKfA6z8
> > Signed-off-by: Conor Dooley <conor.dooley@xxxxxxxxxxxxx>
> > ---
> > arch/riscv/Kconfig | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> > index 13f058490608..b2495192f35a 100644
> > --- a/arch/riscv/Kconfig
> > +++ b/arch/riscv/Kconfig
> > @@ -801,7 +801,7 @@ menu "Power management options"
> > source "kernel/power/Kconfig"
> >
> > config ARCH_HIBERNATION_POSSIBLE
> > - def_bool y
> > + def_bool n
> >
> > config ARCH_HIBERNATION_HEADER
> > def_bool HIBERNATION

Attachment: signature.asc
Description: PGP signature