Re: [RFC PATCH 1/8] mm: Provide pagesize to pmd_populate()
From: Jason Gunthorpe
Date: Thu Apr 04 2024 - 07:47:10 EST
On Wed, Apr 03, 2024 at 06:24:38PM +0000, Christophe Leroy wrote:
> > If it is a software walker there might be value in just aligning to
> > the contig pte scheme in all levels and forgetting about the variable
> > size page table levels. That quarter page stuff is a PITA to manage
> > the memory allocation for on PPC anyhow..
>
> Looking one step further, into nohash/32, I see a challenge: on that
> platform, a PTE is 64 bits while a PGD/PMD entry is 32 bits. It is
> therefore not possible as such to do PMD leaf or cont-PMD leaf.
Hmm, maybe not, I have a feeling you can hide this detail in the
pmd_offset routine if you pass in the PGD information too.
> - Double the size of PGD/PMD entries, but then we lose atomicity when
> reading or writing an entry, could this be a problem?
How does the 64 bit PTE work then? We have ignored this bug on x86 32
bit, but there is a generally difficult race around 64 bit atomicity on
32 bit CPUs in the page tables.
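
To make the race concrete: a 32 bit CPU reads a 64 bit entry as two
32 bit loads, so a concurrent update can produce a torn value. The
user-space sketch below shows one common way around that, a
low/high/re-check-low read loop. All names (entry64, entry64_read,
entry64_write) are hypothetical, and it assumes the low half holds the
valid bits and that writers always zero the low half before touching
the high half. It is an illustration, not the kernel's code.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical 64 bit entry stored as two 32 bit halves. */
struct entry64 {
	_Atomic uint32_t lo;	/* assumed to carry the valid/present bits */
	_Atomic uint32_t hi;
};

/*
 * Writer side (assumption): invalidate the low half first, then update
 * the high half, then publish the new low half.
 */
static void entry64_write(struct entry64 *e, uint64_t val)
{
	atomic_store(&e->lo, 0);
	atomic_store(&e->hi, (uint32_t)(val >> 32));
	atomic_store(&e->lo, (uint32_t)val);
}

/*
 * Reader side: read low, read high, then re-read low. If the low half
 * changed in between, an update was in flight, so retry. A returned
 * value with a zero low half must be treated as "not present".
 */
static uint64_t entry64_read(struct entry64 *e)
{
	uint32_t lo, hi;

	do {
		lo = atomic_load(&e->lo);
		hi = atomic_load(&e->hi);
	} while (lo != atomic_load(&e->lo));

	return ((uint64_t)hi << 32) | lo;
}
```

The scheme only holds up if every writer follows the same ordering,
which is part of why this is awkward to get right everywhere.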
Ideally you'd have 64 bit entries at the PMD level that encode the
page size the same as the PTE level. So you hit any level and you know
your size. This is less memory efficient (though every other arch
tolerates this) in general cases.
Can you throw away some bits of PA in the 32 bit entries to signal a
size?
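
A rough sketch of that idea, under loudly stated assumptions: a leaf
of size 2^shift is aligned to 2^shift, so the low bits of its physical
address are zero and can hold the shift instead. The layout and names
here (ent32_t, ENT_SHIFT_MASK, ent32_make) are made up for
illustration, the PA is assumed to fit in 32 bits, and a real entry
would also need room for its other flag bits.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t ent32_t;

/* Hypothetical layout: the low 6 bits hold log2(page size). */
#define ENT_SHIFT_BITS	6u
#define ENT_SHIFT_MASK	((1u << ENT_SHIFT_BITS) - 1u)

/* Build a leaf entry for a page of size 2^shift at physical address pa. */
static ent32_t ent32_make(uint32_t pa, unsigned int shift)
{
	/* Assume 4K is the smallest page, so at least 12 low bits are free. */
	assert(shift >= 12 && shift < 32);
	/* The address must be aligned to the page size it describes. */
	assert((pa & ((1u << shift) - 1u)) == 0);

	return pa | shift;
}

/* Recover log2(page size) from an entry. */
static unsigned int ent32_shift(ent32_t e)
{
	return e & ENT_SHIFT_MASK;
}

/* Recover the physical address from an entry. */
static uint32_t ent32_pa(ent32_t e)
{
	return e & ~ENT_SHIFT_MASK;
}
```

With this, a walker that lands on an entry can tell the leaf size
without knowing which level it came from.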
> - Do as for the 8xx, ie go down to PTEs even for pages greater than 4M.
Aside from the memory waste, this is the most logical thing: go down
far enough that you can encode the desired page size in the PTE and
use the contig PTE scheme.
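
For illustration, a minimal sketch of what "encode the size in the PTE
and replicate it" could look like. The names (pte64_t, PTE_SIZE_MASK,
set_contig_ptes) and the bit layout are assumptions, not the 8xx or
nohash/32 format. Each slot is given its own 4K-granule address plus
the size field, which is an arbitrary choice for the sketch, and the
range is assumed to fit in a single PTE table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t pte64_t;

#define BASE_SHIFT	12u	/* assumed base page size: 4K */
#define PTE_SIZE_MASK	0x3fULL	/* hypothetical: low 6 bits = log2(size) */

/*
 * Fill every base-page slot covered by a page of size 2^shift.
 * ptep must point at the slot for the first base page of the range.
 * Returns the number of slots written.
 */
static size_t set_contig_ptes(pte64_t *ptep, uint64_t pa, unsigned int shift)
{
	size_t i, n = (size_t)1 << (shift - BASE_SHIFT);

	assert(shift >= BASE_SHIFT && shift < 64);
	assert((pa & ((1ULL << shift) - 1)) == 0);

	for (i = 0; i < n; i++)
		ptep[i] = (pa + ((uint64_t)i << BASE_SHIFT)) | shift;

	return n;
}

/* log2 of the page size recorded in any slot of the range. */
static unsigned int pte_shift(pte64_t pte)
{
	return (unsigned int)(pte & PTE_SIZE_MASK);
}

/* Base physical address of the large page, derived from any slot. */
static uint64_t pte_leaf_base(pte64_t pte)
{
	uint64_t pa = pte & ~PTE_SIZE_MASK;

	return pa & ~((1ULL << pte_shift(pte)) - 1);
}
```

The memory cost is visible in the return value: a 64K page takes 16
slots here, and larger pages take proportionally more.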
Jason