Re: [PATCHv9 11/14] x86/mm: Avoid load_unaligned_zeropad() stepping into unaccepted memory

From: Vlastimil Babka
Date: Mon Apr 03 2023 - 09:28:50 EST


On 3/30/23 13:49, Kirill A. Shutemov wrote:
> load_unaligned_zeropad() can lead to unwanted loads across page boundaries.
> The unwanted loads are typically harmless. But, they might be made to
> totally unrelated or even unmapped memory. load_unaligned_zeropad()
> relies on exception fixup (#PF, #GP and now #VE) to recover from these
> unwanted loads.
>
> But, this approach does not work for unaccepted memory. For TDX, a load
> from unaccepted memory will not lead to a recoverable exception within
> the guest. The guest will exit to the VMM where the only recourse is to
> terminate the guest.
>
> There are three parts to fix this issue and comprehensively avoid access
> to unaccepted memory. Together these ensure that an extra "guard" page
> is accepted in addition to the memory that needs to be used.
>
> 1. Implicitly extend the range_contains_unaccepted_memory(start, end)
> checks up to end+2M if 'end' is aligned on a 2M boundary. It may
> require checking 2M chunk beyond end of RAM. The bitmap allocation is
> modified to accommodate this.
> 2. Implicitly extend accept_memory(start, end) to end+2M if 'end' is
> aligned on a 2M boundary.
> 3. Set PageUnaccepted() on both memory that itself needs to be accepted
> *and* memory where the next page needs to be accepted. Essentially,
> make PageUnaccepted(page) a marker for whether work needs to be done
> to make 'page' usable. That work might include accepting pages in
> addition to 'page' itself.
>
> Side note: This leads to something strange. Pages which were accepted
> at boot, marked by the firmware as accepted and will never
> _need_ to be accepted might have PageUnaccepted() set on
> them. PageUnaccepted(page) is a cue to ensure that the next
> page is accepted before 'page' can be used.

At least the part about PageUnaccepted() is obsolete in v9, no?

> This is an actual, real-world problem which was discovered during TDX
> testing.
>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx>
> Reviewed-by: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
> ---
> arch/x86/mm/unaccepted_memory.c | 39 +++++++++++++++++++++++++
> drivers/firmware/efi/libstub/x86-stub.c | 7 +++++
> 2 files changed, 46 insertions(+)
>
> diff --git a/arch/x86/mm/unaccepted_memory.c b/arch/x86/mm/unaccepted_memory.c
> index 1df918b21469..a0a58486eb74 100644
> --- a/arch/x86/mm/unaccepted_memory.c
> +++ b/arch/x86/mm/unaccepted_memory.c
> @@ -23,6 +23,38 @@ void accept_memory(phys_addr_t start, phys_addr_t end)
> bitmap = __va(boot_params.unaccepted_memory);
> range_start = start / PMD_SIZE;
>
> + /*
> + * load_unaligned_zeropad() can lead to unwanted loads across page
> + * boundaries. The unwanted loads are typically harmless. But, they
> + * might be made to totally unrelated or even unmapped memory.
> + * load_unaligned_zeropad() relies on exception fixup (#PF, #GP and now
> + * #VE) to recover from these unwanted loads.
> + *
> + * But, this approach does not work for unaccepted memory. For TDX, a
> + * load from unaccepted memory will not lead to a recoverable exception
> + * within the guest. The guest will exit to the VMM where the only
> + * recourse is to terminate the guest.
> + *
> + * There are three parts to fix this issue and comprehensively avoid
> + * access to unaccepted memory. Together these ensure that an extra
> + * "guard" page is accepted in addition to the memory that needs to be
> + * used:
> + *
> + * 1. Implicitly extend the range_contains_unaccepted_memory(start, end)
> + * checks up to end+2M if 'end' is aligned on a 2M boundary.
> + *
> + * 2. Implicitly extend accept_memory(start, end) to end+2M if 'end' is
> + * aligned on a 2M boundary. (immediately following this comment)
> + *
> + * 3. Set PageUnaccepted() on both memory that itself needs to be
> + * accepted *and* memory where the next page needs to be accepted.
> + * Essentially, make PageUnaccepted(page) a marker for whether work
> + * needs to be done to make 'page' usable. That work might include
> + * accepting pages in addition to 'page' itself.
> + */

And here.

> + if (!(end % PMD_SIZE))
> + end += PMD_SIZE;
> +
> spin_lock_irqsave(&unaccepted_memory_lock, flags);
> for_each_set_bitrange_from(range_start, range_end, bitmap,
> DIV_ROUND_UP(end, PMD_SIZE)) {
> @@ -46,6 +78,13 @@ bool range_contains_unaccepted_memory(phys_addr_t start, phys_addr_t end)
>
> bitmap = __va(boot_params.unaccepted_memory);
>
> + /*
> + * Also consider the unaccepted state of the *next* page. See fix #1 in
> + * the comment on load_unaligned_zeropad() in accept_memory().
> + */
> + if (!(end % PMD_SIZE))
> + end += PMD_SIZE;
> +
> spin_lock_irqsave(&unaccepted_memory_lock, flags);
> while (start < end) {
> if (test_bit(start / PMD_SIZE, bitmap)) {
> diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c
> index 1643ddbde249..1afe7b5b02e1 100644
> --- a/drivers/firmware/efi/libstub/x86-stub.c
> +++ b/drivers/firmware/efi/libstub/x86-stub.c
> @@ -715,6 +715,13 @@ static efi_status_t allocate_unaccepted_bitmap(struct boot_params *params,
> return EFI_SUCCESS;
> }
>
> + /*
> + * range_contains_unaccepted_memory() may need to check one 2M chunk
> + * beyond the end of RAM to deal with load_unaligned_zeropad(). Make
> + * sure that the bitmap is large enough handle it.
> + */
> + max_addr += PMD_SIZE;
> +
> /*
> * If unaccepted memory is present, allocate a bitmap to track what
> * memory has to be accepted before access.