Re: Bug report: kernel paniced when system hibernates

From: Atish Patra
Date: Thu May 25 2023 - 15:20:31 EST


On Thu, May 25, 2023 at 11:22 AM Conor Dooley <conor@xxxxxxxxxx> wrote:
>
> Hey Atish,
>
> On Thu, May 25, 2023 at 10:39:44AM -0700, Atish Patra wrote:
> > > How about the below?
>
> > Instead of disabling hibernate support why not revert the patch
> > 3335068 ("riscv: Use PUD/P4D/PGD pages for the linear mapping")
> > which doesn't add any "measured" value at this point.
> > However, keeping the hibernation feature on and disabling linear
> > mapping will get more testing on hibernation.
> > While disabling hibernation and keeping the above patch which
> > doesn't have any value at all.
> >
> > We don't have a regression at this point. So either approach will work though.
>
> I favoured this approach so that we do not release a kernel in which
> hibernate works for these versions of OpenSBI and then stops working in
> the future when we shore up how communicating this is supposed to work.
> It allows us to fix the problem "properly" in slow-time, instead of
> racing against v6.4's release.
>

Fair enough.

> I happened to be talking to Palmer and he suggested making it depend on
> NONPORTABLE:
> |> config NONPORTABLE
> |> bool "Allow configurations that result in non-portable kernels"
> |> help
> |> RISC-V kernel binaries are compatible between all known systems
> |> whenever possible, but there are some use cases that can only be
> |> satisfied by configurations that result in kernel binaries that are
> |> not portable between systems.
> |>
> |> Selecting N does not guarantee kernels will be portable to all known
> |> systems. Selecting any of the options guarded by NONPORTABLE will
> |> result in kernel binaries that are unlikely to be portable between
> |> systems.
> |>
> |> If unsure, say N.
>
> I actually think that that makes more sense, as it may actually be fine
> to use hibernation depending on what your SBI implementation does.
>

That works too.

> > If we choose to go this route, some thoughts about the commit message.
> > > -- >8 --
> > > From 1d4381290a1600eff9b29b8ace6be73955d9726c Mon Sep 17 00:00:00 2001
> > > From: Conor Dooley <conor.dooley@xxxxxxxxxxxxx>
> > > Date: Thu, 25 May 2023 15:09:08 +0100
> > > Subject: [PATCH] RISC-V: mark hibernation as broken
> > >
> > > Hibernation support depends on firmware marking its reserved
> > > regions as not mappable by Linux. As things stand, the de-facto SBI
> >
> > either not mappable or no save/restore capable (as We still have not
> > concluded which way we want to go in)
>
> s/mappable/accessible/? Sounds like a good catch all?
>

Yeah.

> >
> > > implementation (OpenSBI) does not do this, and other implementations may
> > > not do so either, resulting in kernel panics during hibernation ([1],
> > > [2]).
> > >
> >
> > we should probably add more context in the commit message.
> > How about adding something along these lines:
> >
> > As things stand, the latest version of de-facto SBI
> > implementation(OpenSBI) doesn't
> > do this any more to allow 1G huge page mappings by kernel. Other SBI
> > implementations are probably
> > doing the same. Until the commit 3335068 ("riscv: Use PUD/P4D/PGD
> > pages for the linear mapping"),
> > the first 2MB region of DRAM (where the typically firmware resides)
> > was not mappable by kernel. However,
> > enabling that mapping resulted in the kernel panics during hibernation
> > ([1], [2]) as the hibernation process
> > tries to save/restore any mapped region even though it is marked as reserved.
>
> SGTM, I could go with that.
>
> > > Disable support for hibernation until such time that an SBI
> > > implementation independent way to communicate what regions are reserved
> > > has been agreed upon.
> > >
> >
> > Anybody who wants to test the hibernation feature must revert the
> > above mentioned patch along with turning on
> > the config.
>
> This goes away with the use of non-portable, although I would work
> mention of the config option into the commit message.
>

Any testing of hibernation still needs to revert the patch until we
have the proper fix.

> Thanks,
> Conor.
>
> > > Reported-by: Song Shuai <suagrfillet@xxxxxxxxx>
> > > Link: https://lore.kernel.org/all/CAAYs2=gQvkhTeioMmqRDVGjdtNF_vhB+vm_1dHJxPNi75YDQ_Q@xxxxxxxxxxxxxx/ [1]
> > > Reported-by: JeeHeng Sia <jeeheng.sia@xxxxxxxxxxxxxxxx>
> > > Link: https://groups.google.com/a/groups.riscv.org/g/sw-dev/c/ITXwaKfA6z8
> > > Signed-off-by: Conor Dooley <conor.dooley@xxxxxxxxxxxxx>
> > > ---
> > > arch/riscv/Kconfig | 2 +-
> > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> > > index 13f058490608..b2495192f35a 100644
> > > --- a/arch/riscv/Kconfig
> > > +++ b/arch/riscv/Kconfig
> > > @@ -801,7 +801,7 @@ menu "Power management options"
> > > source "kernel/power/Kconfig"
> > >
> > > config ARCH_HIBERNATION_POSSIBLE
> > > - def_bool y
> > > + def_bool n
> > >
> > > config ARCH_HIBERNATION_HEADER
> > > def_bool HIBERNATION
>


--
Regards,
Atish