Re: [PATCH v3 02/16] mm: introduce leaf entry type and use to simplify leaf entry logic
From: Zi Yan
Date: Tue Nov 11 2025 - 11:20:35 EST
On 11 Nov 2025, at 2:16, Lorenzo Stoakes wrote:
> On Mon, Nov 10, 2025 at 10:25:40PM -0500, Zi Yan wrote:
>> On 10 Nov 2025, at 17:21, Lorenzo Stoakes wrote:
>>
>>> The kernel maintains leaf page table entries which contain either:
>>>
>>> - Nothing ('none' entries)
>>> - Present entries (that is stuff the hardware can navigate without fault)
>>
>> This is not true for:
>>
>> 1. pXX_protnone(), where _PAGE_PROTNONE flag also means pXX_present() is
>> true, but hardware would still trigger a fault.
>
> Sigh. I'm very well aware of this, I've commented on this issue at length
> in discussions on-list and off.
>
> But for good or bad we decided to hack in protnone this way. As far as the
> kernel is concerned they _are_ present.
>
> Yes, technically, they're not, and will result in a fault, and will result in
> the whole NUMA balancing hint mechanism firing off.
>
> But I feel like it only adds noise and confusion to get into all that here,
> frankly.
>
>> 2. pmd_present() where _PAGE_PSE also means a present PMD (see the comment
>> in pmd_present()).
>
> Right, and here we go again with another 'wise decision'. That's just intensely
> gross, and one I wasn't aware of.
>
> But again, I'm not really interested in asterisking all of these.
>
> 'As far as the kernel is concerned' these are present. We have to lie in the bed
> we made AFAIC.
>
>>
>> This commit log needs to be updated.
>
> No it doesn't. As per the above, we have literally decided to treat these as if
> they were present in cases where, in fact, they're not.
>
> Note that to be thorough here I'd have to go through every single architecture
> and check every single caveat that exists in pXX_present() and pXX_none().
>
> Because I guarantee you there will be some oddities there.
>
> Is that a good use of my or anybody else's time?
>
> I think we have to draw the pedantry line somewhere.
>
>>
>>> - Everything else that will cause a fault which the kernel handles
>>
>> This is not true because of the reasons above.
>
> I covered this off in the above. I'm not really that interested in adding
> additional noise here, sorry.
>
> As a compromise - if I have to respin - I can add a very brief comment like
>
> * Note that there are exceptions such as protnone which for
> everything but the kernel fault handler ought to be treated as
> present but are in fact not. For avoidance of doubt, soft leaf
> entries treat pXX_none() and pXX_present() as the authoritative
> determinants of whether a page table entry is empty/present,
> regardless of hacked-in implementation details.
>
> Note how _already_ saying stuff like this adds confusion and 'wtf'. This is
> what I'm trying to avoid.
>
> But if I have to respin, can add that.
>
>
>>
>> How should we categorize these non-present to HW but present to SW entries,
>> like protnone and under splitting PMDs? Strictly speaking, they are
>> softleaf entries, but that would require more changes to the kernel code
>> and pXX_present() means HW present.
>
> No they're not strictly speaking softleaf entries at all. These page table
> entries use every single bit except present/PSE. The softleaf abstraction
> does not retain all of these bits, and then it becomes impossible to
> determine which is 'present' in a software sense or not.
>
> We categorise pXX_present() leaf page table entries as... being present,
> even if past kernel developers decided to hack in cases which are not present
> as far as the HW faulting mechanism is concerned, piling yet more confusion
> on everything.
>
> We made our bed on this and have to lie in it. There are numerous places
> where in page table code to all intents and purposes it looks like we're
> literally testing for hw-present entries whereas in fact we are not.
>
> So I don't think it is beneficial to do anything more on this other than
> perhaps updating _this_ commit message on respin.
>
>>
>> To not make this series more complicated, I think updating commit log
>> and comments to use pXX_present() instead of HW present might be
>> the easiest way out. We can revisit pXX_present() vs HW present later.
>
> No, there's nothing to revisit AFAIC.
>
> I'm not going to go through and update every single mention of faulting to
> account for that.
>
> I think it's an unreasonable level of pedantry.
Got it. As long as you are aware of this, I am fine with what you have now.
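
For anyone following along, the distinction discussed above (hardware-present
vs. pXX_present()) can be sketched as a small userspace model. This is only an
illustration, not kernel code: the SKETCH_* names and the sketch_*() helpers are
hypothetical, and the bit positions are assumptions modelled on x86. Other
architectures have their own caveats.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits for illustration; positions are modelled on x86. */
#define SKETCH_PAGE_PRESENT  (UINT64_C(1) << 0)  /* the bit the MMU checks */
#define SKETCH_PAGE_PSE      (UINT64_C(1) << 7)  /* huge-page bit */
#define SKETCH_PAGE_PROTNONE (UINT64_C(1) << 8)  /* software-only marker */

/* What the hardware sees: only the present bit decides whether it faults. */
static inline bool sketch_hw_present(uint64_t flags)
{
	return (flags & SKETCH_PAGE_PRESENT) != 0;
}

/* Software view of a PTE: protnone entries also count as present. */
static inline bool sketch_pte_present(uint64_t flags)
{
	return (flags & (SKETCH_PAGE_PRESENT | SKETCH_PAGE_PROTNONE)) != 0;
}

/* Software view of a PMD: a PMD with only PSE set (mid-split) also counts. */
static inline bool sketch_pmd_present(uint64_t flags)
{
	return (flags & (SKETCH_PAGE_PRESENT | SKETCH_PAGE_PROTNONE |
			 SKETCH_PAGE_PSE)) != 0;
}

/* The two cases from this thread: present to software, faulting in hardware. */
static inline void sketch_self_check(void)
{
	assert(!sketch_hw_present(SKETCH_PAGE_PROTNONE));
	assert(sketch_pte_present(SKETCH_PAGE_PROTNONE));
	assert(!sketch_hw_present(SKETCH_PAGE_PSE));
	assert(sketch_pmd_present(SKETCH_PAGE_PSE));
	assert(!sketch_pte_present(0));  /* a 'none' entry is not present */
}
```

In this model both a protnone PTE and a PMD under splitting answer "present"
to the software helpers while the hardware check says otherwise, which is the
"as far as the kernel is concerned" reading used in the commit log.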
Best Regards,
Yan, Zi