Re: [RFC PATCH 1/4] x86/mm/cpa: restore global bit when page is present
From: Aaron Lu
Date: Thu Aug 11 2022 - 08:28:53 EST
On 8/11/2022 7:30 PM, Hyeonggon Yoo wrote:
> On Thu, Aug 11, 2022 at 08:16:08AM +0000, Lu, Aaron wrote:
>> On Thu, 2022-08-11 at 05:21 +0000, Hyeonggon Yoo wrote:
>>> On Mon, Aug 08, 2022 at 10:56:46PM +0800, Aaron Lu wrote:
>>>> For configs that don't have PTI enabled or cpus that don't need
>>>> meltdown mitigation, current kernel can lose GLOBAL bit after a page
>>>> goes through a cycle of present -> not present -> present.
>>>>
>>>> It happened like this(__vunmap() does this in vm_remove_mappings()):
>>>> original page protection: 0x8000000000000163 (NX/G/D/A/RW/P)
>>>> set_memory_np(page, 1): 0x8000000000000062 (NX/D/A/RW) lose G and P
>>>> set_memory_p(pagem 1): 0x8000000000000063 (NX/D/A/RW/P) restored P
>>>>
>>>> In the end, this page's protection no longer has Global bit set and this
>>>> would create problem for this merge small mapping feature.
>>>>
>>>> For this reason, restore Global bit for systems that do not have PTI
>>>> enabled if page is present.
>>>>
>>>> (pgprot_clear_protnone_bits() deserves a better name if this patch is
>>>> acceptible but first, I would like to get some feedback if this is the
>>>> right way to solve this so I didn't bother with the name yet)
>>>>
>>>> Signed-off-by: Aaron Lu <aaron.lu@xxxxxxxxx>
>>>> ---
>>>> arch/x86/mm/pat/set_memory.c | 2 ++
>>>> 1 file changed, 2 insertions(+)
>>>>
>>>> diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
>>>> index 1abd5438f126..33657a54670a 100644
>>>> --- a/arch/x86/mm/pat/set_memory.c
>>>> +++ b/arch/x86/mm/pat/set_memory.c
>>>> @@ -758,6 +758,8 @@ static pgprot_t pgprot_clear_protnone_bits(pgprot_t prot)
>>>> */
>>>> if (!(pgprot_val(prot) & _PAGE_PRESENT))
>>>> pgprot_val(prot) &= ~_PAGE_GLOBAL;
>>>> + else
>>>> + pgprot_val(prot) |= _PAGE_GLOBAL & __default_kernel_pte_mask;
>>>>
>>>> return prot;
>>>> }
>>>
>>> IIUC It makes it unable to set _PAGE_GLOBL when PTI is on.
>>>
>>
>> Yes. Is this a problem?
>> I think that is the intended behaviour when PTI is on: not to enable
>> Gloabl bit on kernel mappings.
>
> Please note that I'm not expert on PTI.
>
> but AFAIK with PTI, at least everything (kernel part) mapped to user page table is
> mapped as global when PGE is supported.
>
> Not sure "Global bit is never used for kernel part when PTI is enabled"
> is true.
>
> Also, commit d1440b23c922d ("x86/mm: Factor out pageattr _PAGE_GLOBAL setting") that introduced
> pgprot_clear_protnone_bits() says:
>
> This unconditional setting of _PAGE_GLOBAL is a problem when we have
> PTI and non-PTI and we want some areas to have _PAGE_GLOBAL and some
> not.
>
> This updated version of the code says:
> 1. Clear _PAGE_GLOBAL when !_PAGE_PRESENT
> 2. Never set _PAGE_GLOBAL implicitly
> 3. Allow _PAGE_GLOBAL to be in cpa.set_mask
> 4. Allow _PAGE_GLOBAL to be inherited from previous PTE
>
Thanks for these info, I'll need to take a closer look at PTI.
>>> Maybe it would be less intrusive to make
>>> set_direct_map_default_noflush() replace protection bits
>>> with PAGE_KENREL as it's only called for direct map, and the function
>>> is to reset permission to default:
>>>
>>> diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
>>> index 1abd5438f126..0dd4433c1382 100644
>>> --- a/arch/x86/mm/pat/set_memory.c
>>> +++ b/arch/x86/mm/pat/set_memory.c
>>> @@ -2250,7 +2250,16 @@ int set_direct_map_invalid_noflush(struct page *page)
>>>
>>> int set_direct_map_default_noflush(struct page *page)
>>> {
>>> - return __set_pages_p(page, 1);
>>> + unsigned long tempaddr = (unsigned long) page_address(page);
>>> + struct cpa_data cpa = {
>>> + .vaddr = &tempaddr,
>>> + .pgd = NULL,
>>> + .numpages = 1,
>>> + .mask_set = PAGE_KERNEL,
>>> + .mask_clr = __pgprot(~0),
>
> Nah, this sets _PAGE_ENC unconditionally, which should be evaluated.
> Maybe less intrusive way would be:
> .mask_set = __pgprot(_PAGE_PRESENT |
> (_PAGE_GLOBAL & __kernel_default_pte_mask)),
> .mask_clr = __pgprot(0),
>
>>> + .flags = 0};
>>> +
>>> + return __change_page_attr_set_clr(&cpa, 0);
>>> }
>>
>> Looks reasonable to me and it is indeed less intrusive. I'm only
>> concerned there might be other paths that also go through present ->
>> not present -> present and this change can not cover them.
>>
>
> AFAIK other paths going through present->not present->present (using CPA)
> is only when DEBUG_PAGEALLOC is used.
>
> Do we care direct map fragmentation when using DEBUG_PAGEALLOC?
>
No, direct mapping does not use large page mapping when DEBUG_PAGEALLOC.
>>>
>>> set_direct_map_{invalid,default}_noflush() is the exact reason
>>> why direct map become split after vmalloc/vfree with special
>>> permissions.
>>
>> Yes I agree, because it can lose G bit after the whole cycle when PTI
>> is not on. When PTI is on, there is no such problem because G bit is
>> not there initially.
>>
>> Thanks,
>> Aaron
>