Re: [Patch v3] x86/head/64: remove redundant check on level2_kernel_pgt's _PAGE_PRESENT bit

From: Wei Yang
Date: Mon Jun 03 2024 - 17:13:03 EST


On Mon, Jun 03, 2024 at 11:50:06AM -0700, Dave Hansen wrote:
>On 5/23/24 05:35, Wei Yang wrote:
>> --- a/arch/x86/kernel/head64.c
>> +++ b/arch/x86/kernel/head64.c
>> @@ -260,8 +260,7 @@ unsigned long __head __startup_64(unsigned long physaddr,
>>
>> 	/* fixup pages that are part of the kernel image */
>> 	for (; i <= pmd_index((unsigned long)_end); i++)
>> -		if (pmd[i] & _PAGE_PRESENT)
>> -			pmd[i] += load_delta;
>> +		pmd[i] += load_delta;
>
>So, I think this is correct. But, man, I wish folks would go through
>the git history and make it clear that they understand _how_ the code
>got the way it is.
>

Dave

Thanks for your comment.

My first version listed the historical changes, but Thomas thought they
were not relevant, so I removed those descriptions.

https://lkml.org/lkml/2024/3/23/350

>I suspect that the original _PAGE_PRESENT check wasn't even necessary if
>cleanup_highmap() really did fix things up. But this commit:
>
> 2aa85f246c18 ("x86/boot/64: Make level2_kernel_pgt pages invalid
> outside kernel area")
>
>tweaked things to actively clear out PMDs that weren't populated in
>Kirill's original loop. It didn't touch the _PAGE_PRESENT check. But
>it certainly did imply that the PMD doesn't have any holes in it and
>there's nothing in the middle that needs _PAGE_PRESENT cleared.
>

As I mentioned in my first version, the original code was introduced by

commit 1ab60e0f72f7 ("[PATCH] x86-64: Relocatable Kernel Support")

The check on _PAGE_PRESENT was needed because, at that time,
level2_kernel_pgt was defined as:

NEXT_PAGE(level2_kernel_pgt)
	/* 40MB kernel mapping. The kernel code cannot be bigger than that.
	   When you change this change KERNEL_TEXT_SIZE in page.h too. */
	/* (2^48-(2*1024*1024*1024)-((2^39)*511)-((2^30)*510)) = 0 */
	PMDS(0x0000000000000000, __PAGE_KERNEL_LARGE_EXEC|_PAGE_GLOBAL,
		KERNEL_TEXT_SIZE/PMD_SIZE)
	/* Module mapping starts here */
	.fill (PTRS_PER_PMD - (KERNEL_TEXT_SIZE/PMD_SIZE)),8,0

Now it looks like this:

SYM_DATA_START_PAGE_ALIGNED(level2_kernel_pgt)
	/*
	 * Kernel high mapping.
	 *
	 * The kernel code+data+bss must be located below KERNEL_IMAGE_SIZE in
	 * virtual address space, which is 1 GiB if RANDOMIZE_BASE is enabled,
	 * 512 MiB otherwise.
	 *
	 * (NOTE: after that starts the module area, see MODULES_VADDR.)
	 *
	 * This table is eventually used by the kernel during normal runtime.
	 * Care must be taken to clear out undesired bits later, like _PAGE_RW
	 * or _PAGE_GLOBAL in some cases.
	 */
	PMDS(0, __PAGE_KERNEL_LARGE_EXEC, KERNEL_IMAGE_SIZE/PMD_SIZE)
SYM_DATA_END(level2_kernel_pgt)

The difference is that in the original version, not every entry of
level2_kernel_pgt was defined with _PAGE_PRESENT set. I didn't dig into which
commit expanded level2_kernel_pgt to cover the full table, but I think the
check has been redundant since that point.
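
As a rough user-space illustration of the point above (not kernel code),
the sketch below models a 512-entry PMD table whose first 'populated'
entries are present and the rest are zero, then compares the checked and
unchecked fixup loops. The helper names, the simplified flag values, and
the loop ranges are assumptions chosen for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PTRS_PER_PMD	512
#define PMD_SIZE	(UINT64_C(1) << 21)	/* 2 MiB large page */
#define PAGE_PRESENT	UINT64_C(0x001)		/* bit 0 */
#define PAGE_PSE	UINT64_C(0x080)		/* simplified stand-in for the LARGE flags */

/* Populate the first 'populated' entries as PMDS() would; zero-fill the rest. */
static void build_table(uint64_t *pmd, size_t populated)
{
	assert(populated <= PTRS_PER_PMD);
	for (size_t i = 0; i < PTRS_PER_PMD; i++)
		pmd[i] = i < populated ?
			 (i * PMD_SIZE) | PAGE_PRESENT | PAGE_PSE : 0;
}

/* Fixup with the _PAGE_PRESENT check: non-present entries are skipped. */
static void fixup_checked(uint64_t *pmd, size_t first, size_t last,
			  uint64_t delta)
{
	assert(last < PTRS_PER_PMD);
	for (size_t i = first; i <= last; i++)
		if (pmd[i] & PAGE_PRESENT)
			pmd[i] += delta;
}

/* Fixup without the check: every entry in [first, last] is adjusted. */
static void fixup_unchecked(uint64_t *pmd, size_t first, size_t last,
			    uint64_t delta)
{
	assert(last < PTRS_PER_PMD);
	for (size_t i = first; i <= last; i++)
		pmd[i] += delta;
}

/* Return 1 if both fixups produce identical tables, 0 otherwise. */
static int fixups_agree(size_t populated, size_t first, size_t last,
			uint64_t delta)
{
	uint64_t a[PTRS_PER_PMD], b[PTRS_PER_PMD];

	build_table(a, populated);
	build_table(b, populated);
	fixup_checked(a, first, last, delta);
	fixup_unchecked(b, first, last, delta);
	return memcmp(a, b, sizeof(a)) == 0;
}
```

With a 40 MiB mapping, populated is 40 MiB / 2 MiB = 20, so a loop that
walks into the zero-filled tail gives different results with and without
the check. When every entry in the walked range is present, for example
populated = 1 GiB / 2 MiB = 512, the two loops produce the same table.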

>> level2_kernel_pgt compiled with _PAGE_PRESENT set. The check is
>> redundant
>
>This isn't super reassuring. It also depends on nothing having munged
>the page tables up to this point. The code is also a bit cruel in that
>it manipulates two different sets of PMDs with the same 'pmd' variable.
>
>Also, is this comment still accurate after '2aa85f246c18'?
>
>> * Fixup the kernel text+data virtual addresses. Note that
>> * we might write invalid pmds, when the kernel is relocated
>> * cleanup_highmap() fixes this up along with the mappings
>> * beyond _end.

It sounds like this comment is no longer necessary. Would you prefer that I
remove it in the next version of this patch?

--
Wei Yang
Help you, Help me