Re: [PATCH v1 1/2] arm64/mm: Move PTE_PROT_NONE and PMD_PRESENT_INVALID

From: Ryan Roberts
Date: Thu Apr 25 2024 - 04:40:48 EST


On 24/04/2024 17:43, Catalin Marinas wrote:
> On Wed, Apr 24, 2024 at 12:10:16PM +0100, Ryan Roberts wrote:
>> Previously PTE_PROT_NONE was occupying bit 58, one of the bits reserved
>> for SW use when the PTE is valid. This is a waste of those precious SW
>> bits since PTE_PROT_NONE can only ever be set when valid is clear.
>> Instead let's overlay it on what would be a HW bit if valid was set.
>>
>> We need to be careful about which HW bit to choose since some of them
>> must be preserved; when pte_present() is true (as it is for a
>> PTE_PROT_NONE pte), it is legitimate for the core to call various
>> accessors, e.g. pte_dirty(), pte_write() etc. There are also some
>> accessors that are private to the arch which must continue to be
>> honoured, e.g. pte_user(), pte_user_exec() etc.
>>
>> So we choose to overlay PTE_UXN; This effectively means that whenever a
>> pte has PTE_PROT_NONE set, it will always report pte_user_exec() ==
>> false, which is obviously always correct.
>>
>> As a result of this change, we must shuffle the layout of the
>> arch-specific swap pte so that PTE_PROT_NONE is always zero and not
>> overlapping with any other field. As a result of this, there is no way
>> to keep the `type` field contiguous without conflicting with
>> PMD_PRESENT_INVALID (bit 59), which must also be 0 for a swap pte. So
>> let's move PMD_PRESENT_INVALID to bit 60.
>
> I think we discussed but forgot the details. What was the reason for not
> using, say, bit 60 for PTE_PROT_NONE to avoid all the swap bits
> reshuffling? Clearing or setting of the PTE_PROT_NONE bit is done via
> pte_modify() and this gets all the new permission bits anyway. With POE
> support (on the list for now), PTE_PROT_NONE would overlap with
> POIndex[0] but I don't think we ever plan to read this field (other than
> maybe ptdump). The POIndex field is set from the vma->vm_page_prot (Joey
> may need to adjust vm_get_page_prot() in his patches to avoid setting a
> pkey on a PROT_NONE mapping).
>

Copy/pasting your comment from the other patch into this one since its easier to
discuss it all together:

Ah, I did not realise we need to free up bit 3 from the swap pte as
well. Though maybe patch 1 is fine as is but for the record, it would be
good to justify the decision to go with PTE_UXN.

While we need a new bit in the swap pte for uffd-wp, its just a SW bit - it
could go anywhere. I chose bit 3 because it was free after all the other shuffling.

As for choosing PTE_UXN for PTE_PROT_NONE, I wanted to choose a bit that would
definitely never lead to confusion if ever interpretted as its HW meaning, since
as far as the core-mm is concerned, the pte is either present or its not, and if
it is present, then it is completely valid to call all the pte_*() helpers. By
definition, if PTE_PROT_NONE is set, then the PTE is not executable in user
space, so any helpers that continue to interpret the bit position as UXN will
still give sensible answers.

Yes, I could have just put PTE_PROT_NONE in bit position 60 and avoided all the
shuffling. But in the past you have pushed back on using the PBHA bits due to
out of tree patches using them. I thought it was better to just sidestep having
to think about it by not using them. Additionally, as you point out, I didn't
want to risk overlapping with the POIndex and that causing subtle bugs.

But then... PMD_PRESENT_INVALID. Which already turns out to be violating my
above considerations. Ugh. I considered moving that to NS, but it would have
required splitting the offset field into 2 discontiguous parts in the swap pte.
In the end, I decided its ok in any position because its transient; its just a
temp marker and the pte will soon get set again from scratch so it doesn't
matter is adding the marker is destructive.

Personally I think there is less risk of a future/subtle bug by putting
PTE_PROT_NONE over PTE_UXN. But if you prefer to reduce churn by putting it at
bit 60 then I'm ok with that.