On Wed, May 17, 2023 at 8:26 PM Alexandre Ghiti <alexghiti@xxxxxxxxxxxx> wrote:
On Wed, May 17, 2023 at 1:28 PM Conor Dooley <conor.dooley@xxxxxxxxxxxxx> wrote:Depending on a particular node name is fragile. If we really need
Hey Alex,Not sure this is a real regression, I'd rather avoid it, but as
On Wed, May 17, 2023 at 10:58:02AM +0200, Alexandre Ghiti wrote:
On Tue, May 16, 2023 at 1:12 PM Alexandre Ghiti <alexghiti@xxxxxxxxxxxx> wrote:AFAIU, losing the 1 GB hugepage is a regression, which would make this
On Tue, May 16, 2023 at 11:24 AM Song Shuai <suagrfillet@xxxxxxxxx> wrote:So here is the current linear mapping without nomap in openSBI:
I actually removed this flag a few years ago, and I have to admit that
I need to check if that's necessary: the goal of commit 3335068f8721
("riscv: Use PUD/P4D/PGD pages for the linear mapping") is to expose
the "right" start of DRAM so that we can align virtual and physical
addresses on a 1GB boundary.
So I have to check if a nomap region is actually added as a
memblock.memory.regions[] or not: if yes, that's perfect, let's add
the nomap attributes to the PMP regions, otherwise, I don't think that
is a good solution.
---[ Linear mapping ]---
0xff60000000000000-0xff60000000200000 0x0000000080000000 2M
PMD D A G . . W R V
0xff60000000200000-0xff60000000e00000 0x0000000080200000 12M
PMD D A G . . . R V
And below the linear mapping with nomap in openSBI:
---[ Linear mapping ]---
0xff60000000080000-0xff60000000200000 0x0000000080080000 1536K
PTE D A G . . W R V
0xff60000000200000-0xff60000000e00000 0x0000000080200000 12M
PMD D A G . . . R V
So adding nomap does not misalign virtual and physical addresses, it
prevents the usage of 1GB page for this area though, so that's a
solution, we just lose this 1GB page here.
But even though that may be the fix, I think we also need to fix that
in the kernel as it would break compatibility with certain versions of
openSBI *if* we fix openSBI...So here are a few solutions:
1. we can mark all "mmode_resv" nodes in the device tree as nomap,
before the linear mapping is established (IIUC, those nodes are added
by openSBI to advertise PMP regions)
-> This amounts to the same fix as opensbi and we lose the 1GB hugepage.
not an option, right?
mentioned in my first answer, Mike Rapoport showed that it was making
no difference performance-wise...
2. we can tweak pfn_is_nosave function to *not* save pfn correspondingDitto here re: introducing another regression.
to PMP regions
-> We don't lose the 1GB hugepage \o/
3. we can use register_nosave_region() to not save the "mmode_resv"
regions (x86 does that
https://elixir.bootlin.com/linux/v6.4-rc1/source/arch/x86/kernel/e820.c#L753)
-> We don't lose the 1GB hugepage \o/
4. Given JeeHeng pointer to
https://elixir.bootlin.com/linux/v6.4-rc1/source/kernel/power/snapshot.c#L1340,
we can mark those pages as non-readable and make the hibernation
process not save those pages
-> Very late-in-the-day idea, not sure what it's worth, we also
lose the 1GB hugepage...
To me, the best solution is 3 as it would prepare for other similar#3 would probably get my vote too. It seems like you could use it
issues later, it is similar to x86 and it allows us to keep 1GB
hugepages.
I have been thinking, and to me nomap does not provide anything since
the kernel should not address this memory range, so if it does, we
must fix the kernel.
Let me know what you all think, I'll be preparing a PoC of 3 in the meantime!
dynamically if there was to be a future other provider of "mmode_resv"
regions, rather than doing something location-specific.
We should probably document these opensbi reserved memory nodes though
in a dt-binding or w/e if we are going to be relying on them to not
crash!
information from DT then I suggest adding "no-save-restore" DT
property in reserved memory nodes.
Yes, you're right, let's see what Atish and Anup think!I think we have two possible approaches:
1) Update OpenSBI to set "no-map" DT property for firmware
reserved regions. We were doing this previously but removed
it later for performance reasons mentioned by Alex. It is also
worth mentioning that ARM Trusted Firmware also sets "no-map"
DT property for firmware reserved regions.
2) Add a new "no-save-restore" DT property in the reserved
memory DT bindings. The hibernate support of Linux arch/riscv
will use this DT property to exclude memory regions from
save-restore. The EFI implementation of EDK2 and U-Boot
should do the following:
1) Treat all memory having "no-map" DT property as EFI
reserved memory
2) Treat all memory not having "no-map" DT property and
not having "no-save-restore" DT property as EfiBootServicesData
3) Treat all memory not having "no-map" DT property and
having "no-save-restore" DT property as EfiRuntimeServiceData
(Refer,
https://devicetree-specification.readthedocs.io/en/latest/chapter3-devicenodes.html#reserved-memory-and-uefi)
Personally, I am leaning towards approach#1 since approach#2
will require changing DeviceTree specification as well.
Regards,
Anup
Thanks for your quick answers Conor and Song, really appreciated!_______________________________________________
Alex
Thanks for working on this,
Conor.
linux-riscv mailing list
linux-riscv@xxxxxxxxxxxxxxxxxxx
http://lists.infradead.org/mailman/listinfo/linux-riscv