Re: Bug report: kernel paniced when system hibernates

From: Conor Dooley
Date: Wed May 17 2023 - 07:28:10 EST


Hey Alex,

On Wed, May 17, 2023 at 10:58:02AM +0200, Alexandre Ghiti wrote:
> On Tue, May 16, 2023 at 1:12 PM Alexandre Ghiti <alexghiti@xxxxxxxxxxxx> wrote:

> > On Tue, May 16, 2023 at 11:24 AM Song Shuai <suagrfillet@xxxxxxxxx> wrote:
> > I actually removed this flag a few years ago, and I have to admit that
> > I need to check if that's necessary: the goal of commit 3335068f8721
> > ("riscv: Use PUD/P4D/PGD pages for the linear mapping") is to expose
> > the "right" start of DRAM so that we can align virtual and physical
> > addresses on a 1GB boundary.
> >
> > So I have to check if a nomap region is actually added as a
> > memblock.memory.regions[] or not: if yes, that's perfect, let's add
> > the nomap attributes to the PMP regions, otherwise, I don't think that
> > is a good solution.
>
> So here is the current linear mapping without nomap in openSBI:
>
> ---[ Linear mapping ]---
> 0xff60000000000000-0xff60000000200000 0x0000000080000000 2M
> PMD D A G . . W R V
> 0xff60000000200000-0xff60000000e00000 0x0000000080200000 12M
> PMD D A G . . . R V
>
> And below the linear mapping with nomap in openSBI:
>
> ---[ Linear mapping ]---
> 0xff60000000080000-0xff60000000200000 0x0000000080080000 1536K
> PTE D A G . . W R V
> 0xff60000000200000-0xff60000000e00000 0x0000000080200000 12M
> PMD D A G . . . R V
>
> So adding nomap does not misalign virtual and physical addresses, it
> prevents the usage of 1GB page for this area though, so that's a
> solution, we just lose this 1GB page here.
>
> But even though that may be the fix, I think we also need to fix that
> in the kernel as it would break compatibility with certain versions of
> openSBI *if* we fix openSBI...So here are a few solutions:
>
> 1. we can mark all "mmode_resv" nodes in the device tree as nomap,
> before the linear mapping is established (IIUC, those nodes are added
> by openSBI to advertise PMP regions)
> -> This amounts to the same fix as opensbi and we lose the 1GB hugepage.

AFAIU, losing the 1 GB hugepage is a regression, which would make this
not an option, right?

> 2. we can tweak pfn_is_nosave function to *not* save pfn corresponding
> to PMP regions
> -> We don't lose the 1GB hugepage \o/
> 3. we can use register_nosave_region() to not save the "mmode_resv"
> regions (x86 does that
> https://elixir.bootlin.com/linux/v6.4-rc1/source/arch/x86/kernel/e820.c#L753)
> -> We don't lose the 1GB hugepage \o/
> 4. Given JeeHeng pointer to
> https://elixir.bootlin.com/linux/v6.4-rc1/source/kernel/power/snapshot.c#L1340,
> we can mark those pages as non-readable and make the hibernation
> process not save those pages
> -> Very late-in-the-day idea, not sure what it's worth, we also
> lose the 1GB hugepage...

Ditto here re: introducing another regression.

> To me, the best solution is 3 as it would prepare for other similar
> issues later, it is similar to x86 and it allows us to keep 1GB
> hugepages.
>
> I have been thinking, and to me nomap does not provide anything since
> the kernel should not address this memory range, so if it does, we
> must fix the kernel.
>
> Let me know what you all think, I'll be preparing a PoC of 3 in the meantime!

#3 would probably get my vote too. It seems like you could use it
dynamically if there was to be a future other provider of "mmode_resv"
regions, rather than doing something location-specific.

We should probably document these opensbi reserved memory nodes though
in a dt-binding or w/e if we are going to be relying on them to not
crash!

Thanks for working on this,
Conor.

Attachment: signature.asc
Description: PGP signature