Re: [PATCH v3 02/16] mm: introduce leaf entry type and use to simplify leaf entry logic

From: Lorenzo Stoakes

Date: Tue Nov 11 2025 - 02:17:46 EST


On Mon, Nov 10, 2025 at 10:25:40PM -0500, Zi Yan wrote:
> On 10 Nov 2025, at 17:21, Lorenzo Stoakes wrote:
>
> > The kernel maintains leaf page table entries which contain either:
> >
> > - Nothing ('none' entries)
> > - Present entries (that is stuff the hardware can navigate without fault)
>
> This is not true for:
>
> 1. pXX_protnone(), where _PAGE_PROTNONE flag also means pXX_present() is
> true, but hardware would still trigger a fault.

Sigh. I'm very well aware of this, I've commented on this issue at length
in discussions on-list and off.

But, for good or bad, we decided to hack in protnone this way. As far as the
kernel is concerned they _are_ present.

Yes, technically they're not: touching one results in a fault, which sets off
the whole NUMA balancing hinting mechanism.

But I feel like it only adds noise and confusion to get into all that here,
frankly.

> 2. pmd_present() where _PAGE_PSE also means a present PMD (see the comment
> in pmd_present()).

Right, and here we go again with another 'wise decision'. That one is intensely
gross, and I wasn't aware of it.

But again, I'm not really interested in putting an asterisk on every one of these.

'As far as the kernel is concerned' these are present. We have to lie in the bed
we made AFAIC.

>
> This commit log needs to be updated.

No it doesn't. As per the above, we have literally decided to treat these as if
they were present in cases where, in fact, they're not.

Note that to be thorough here I'd have to go through every single architecture
and check every single caveat that exists in pXX_present() and pXX_none().

Because I guarantee you there will be some oddities there.

Is that a good use of my or anybody else's time?

I think we have to draw the pedantry line somewhere.

>
> > - Everything else that will cause a fault which the kernel handles
>
> This is not true because of the reasons above.

I covered this above. I'm not really that interested in adding
additional noise here, sorry.

As a compromise, if I have to respin, I can add a very brief note like:

* Note that there are exceptions, such as protnone, which for
  everything but the kernel fault handler ought to be treated as
  present but are in fact not. For the avoidance of doubt, soft leaf
  entries treat pXX_none() and pXX_present() as the authoritative
  determinants of whether a page table entry is empty/present,
  regardless of hacked-in implementation details.

Note how _already_ saying stuff like this adds confusion and 'wtf'. This is
what I'm trying to avoid.

But if I have to respin, I can add that.


>
> How should we categorize these non-present to HW but present to SW entries,
> like protnone and under splitting PMDs? Strictly speaking, they are
> softleaf entries, but that would require more changes to the kernel code
> and pXX_present() means HW present.

No, strictly speaking they're not softleaf entries at all. These page table
entries use every single bit except present/PSE. The softleaf abstraction
does not retain all of these bits, so it would become impossible to
determine which entries are 'present' in a software sense and which are not.

We categorise pXX_present() leaf page table entries as... being present,
even if past kernel developers decided to hack in cases which are not present
as far as the HW faulting mechanism is concerned, piling yet more confusion
on everything.

We made our bed on this and have to lie in it. There are numerous places
in page table code where, to all intents and purposes, it looks like we're
literally testing for HW-present entries when in fact we are not.

So I don't think it is beneficial to do anything more on this other than
perhaps updating _this_ commit message on respin.

>
> To not make this series more complicated, I think updating commit log
> and comments to use pXX_present() instead of HW present might be
> the easiest way out. We can revisit pXX_present() vs HW present later.

No, there's nothing to revisit AFAIC.

I'm not going to go through and update every single mention of faulting to
account for that.

I think it's an unreasonable level of pedantry.

>
> OK, I will focus on code review now.

Thanks.