Re: [PATCH V2 7/7] mm: Use pgdp_get() for accessing PGD entries
From: Ryan Roberts
Date: Mon Sep 23 2024 - 11:21:45 EST
>> Let's just rewind a bit. This thread exists because the kernel test robot failed
>> to compile pgd_none_or_clear_bad() (a core-mm function) for the arm architecture
>> after Anshuman changed the direct pgd dereference to pgdp_get(). The reason
>> compilation failed is because arm defines its own pgdp_get() override, but it is
>> broken (there is a typo).
>
> Let's not rewind, because had you fully read and digested my reply, you
> would have seen why this isn't a problem... but let me spell it out.
>
>>
>> Code before Anshuman's change:
>>
>> static inline int pgd_none_or_clear_bad(pgd_t *pgd)
>> {
>> 	if (pgd_none(*pgd))
>> 		return 1;
>> 	if (unlikely(pgd_bad(*pgd))) {
>> 		pgd_clear_bad(pgd);
>> 		return 1;
>> 	}
>> 	return 0;
>> }
>
> This isn't a problem as the code stands. While there is a dereference
> in C, that dereference is a simple struct copy, something that we use
> everywhere in the kernel. However, that is as far as it goes, because
> neither pgd_none() nor pgd_bad() makes use of its argument, and thus
> the compiler will optimise it away, resulting in no actual access to
> the page tables - _as_ _intended_.
Right. Are you saying you depend upon those loads being optimized away for
correctness or performance reasons?
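
To make sure I've understood the point, here is a minimal user-space sketch of
the situation described above. The pgd_t layout and the helper bodies are
stand-ins I've made up for illustration, not the real arm definitions; the only
property that matters is that the folded helpers never read their argument, so
the struct copy from *pgd is dead and an optimising compiler is free to drop
the load.

```c
/* Stand-in for the kernel type; an assumption, not the real arm layout. */
typedef struct { unsigned long pgd; } pgd_t;

/* Folded-level helpers: the argument is accepted but never read. */
static inline int pgd_none(pgd_t pgd) { (void)pgd; return 0; }
static inline int pgd_bad(pgd_t pgd)  { (void)pgd; return 0; }

/* Stub for illustration; unreachable here because pgd_bad() is always 0. */
static inline void pgd_clear_bad(pgd_t *pgd) { (void)pgd; }

static inline int pgd_none_or_clear_bad(pgd_t *pgd)
{
	/* *pgd is a plain struct copy whose value is unused by the callee. */
	if (pgd_none(*pgd))
		return 1;
	if (pgd_bad(*pgd)) {
		pgd_clear_bad(pgd);
		return 1;
	}
	return 0;
}
```

With a plain dereference the compiler may elide the access entirely; a volatile
access in the same position would have to be emitted.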
>
> If these are going to be converted to pgdp_get(), then we need pgdp_get()
> to _also_ be optimised away,
OK, agreed.
So perhaps the best approach is to modify the existing default pxdp_get()
implementations to just do a C dereference. That will ensure that there are no
unintended consequences, unlike moving to READ_ONCE() by default. Then riscv
(which I think is the only arch that actually uses pxdp_get() currently?) will
need its own pxdp_get() overrides, which use READ_ONCE(). arm64 would also
define its own overrides in terms of READ_ONCE() to ensure single-copy atomicity
in the presence of HW updates.
How does that sound to you?
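
Roughly what I have in mind, as a user-space sketch rather than a patch. The
names pgdp_get_plain() and pgdp_get_once() are hypothetical and only there so
both variants can sit side by side; in the tree each would simply be called
pgdp_get(), with the arch version taking precedence over the generic one. The
READ_ONCE() below is a simplified stand-in for the kernel macro and assumes a
compiler that supports __typeof__ (gcc or clang).

```c
/* Stand-in for the kernel type; an assumption for illustration. */
typedef struct { unsigned long pgd; } pgd_t;

/* Simplified stand-in for READ_ONCE(): forces a volatile access. */
#define READ_ONCE(x)	(*(const volatile __typeof__(x) *)&(x))

/*
 * Proposed generic default: a plain C dereference. If the caller never
 * uses the result, the compiler may remove the load altogether.
 */
static inline pgd_t pgdp_get_plain(pgd_t *pgdp)
{
	return *pgdp;
}

/*
 * What an arch override (e.g. riscv, arm64) would look like: the volatile
 * access must be emitted, even if the result ends up being discarded.
 */
static inline pgd_t pgdp_get_once(pgd_t *pgdp)
{
	return READ_ONCE(*pgdp);
}
```

Both return the same value for a stable entry; the difference is only in what
the compiler is allowed to do with the access.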
> and if e.g. this is the only place that
> pgdp_get() is going to be used, the suggestion I made in my previous
> email is entirely reasonable, since we know that the result of pgdp_get()
> will not actually be used.
I guess you could do that as an arm-specific override, but I don't think it adds
anything over using my proposed reworked default? Your call.
>
>> As an aside, the kernel also dereferences p4d, pud, pmd and pte pointers in
>> various circumstances.
>
> I already covered these in my previous reply.
>
>> And other changes in this series are also replacing those
>> direct dereferences with calls to similar helpers. The fact that these are all
>> folded (by a custom arm implementation if I've understood the below correctly)
>> just means that each dereference is returning what you would call the pmd from
>> the HW perspective, I think?
>
> It'll "return" the first of each pair of level-1 page table entries,
> which is pgd[0] or *p4d, *pud, *pmd - but all of these except *pmd
> need to be optimised away, so throwing lots of READ_ONCE() around
> this code without considering this is certainly the wrong approach.
Yep, got it.
>
>>>> The core-mm today
>>>> dereferences pgd pointers (and p4d, pud, pmd pointers) directly in its code. See
>>>> follow_pfnmap_start(),
>>>
>>> Doesn't seem to exist at least not in 6.11.
>>
>> Apologies, I'm on mm-unstable and that isn't upstream yet. See follow_pte() in
>> v6.11 or __apply_to_page_range(), or pgd_none_or_clear_bad() as per above.
>
> Looking at follow_pte(), it's not a problem.
>
> I think we wouldn't be having this conversation before:
>
> commit a32618d28dbe6e9bf8ec508ccbc3561a7d7d32f0
> Author: Russell King <rmk+kernel@xxxxxxxxxxxxxxxx>
> Date: Tue Nov 22 17:30:28 2011 +0000
>
> ARM: pgtable: switch to use pgtable-nopud.h
>
> where:
> -#define pgd_none(pgd) (0)
> -#define pgd_bad(pgd) (0)
>
> existed before this commit - and thus the dereference in things like:
>
> pgd_none(*pgd)
>
> wouldn't even be visible beyond the preprocessor step.
>
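
For anyone following along, the macro form quoted above can be sketched in user
space as below. The pgd_t type and the wrapper name pgd_none_or_bad_macro() are
hypothetical; the two macros are copied from the lines removed by the commit.
Because the macro bodies never mention their parameter, the *pgd expression is
discarded during macro expansion and the compiler proper never sees a
dereference at all.

```c
#include <stddef.h>

/* Stand-in for the kernel type; an assumption for illustration. */
typedef struct { unsigned long pgd; } pgd_t;

/* Macro form: the argument text is dropped by the preprocessor. */
#define pgd_none(pgd)	(0)
#define pgd_bad(pgd)	(0)

static inline int pgd_none_or_bad_macro(pgd_t *pgd)
{
	/* Expands to "(0) || (0)"; no access through pgd remains. */
	return pgd_none(*pgd) || pgd_bad(*pgd);
}
```

Since nothing is dereferenced, even pgd_none_or_bad_macro(NULL) is harmless
here. With static inline helpers the dereference survives preprocessing, and
its removal is left to the optimiser.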