Re: [RFC PATCH v2 23/23] KVM: TDX: Turn on PG_LEVEL_2M after TD is RUNNABLE
From: Yan Zhao
Date: Fri Nov 14 2025 - 03:36:54 EST
On Tue, Nov 11, 2025 at 07:25:30PM +0800, Huang, Kai wrote:
> On Thu, 2025-08-07 at 17:46 +0800, Yan Zhao wrote:
> > + /* Large page is not supported before TD runnable,*/
> > + if (KVM_BUG_ON(kvm_tdx->state != TD_STATE_RUNNABLE && level != PG_LEVEL_4K, kvm))
> > return -EINVAL;
>
> Not a particular comment to this patch, but could you elaborate a little bit
> why PROMOTE isn't supported in this series? This doesn't seem to be
> mentioned anywhere in this series (not in the coverletter either).
I mentioned it briefly in the coverletter:
6. Page merging (page promotion)
Promotion is disallowed (in patch 7), because
- The current TDX module requires all 4KB leafs to be either all PENDING
or all ACCEPTED before a successful promotion to 2MB. This requirement
prevents successful page merging after partially converting a 2MB
range from private to shared and then back to private, which is the
primary scenario necessitating page promotion.
- tdh_mem_page_promote() depends on tdh_mem_range_block() in the current
TDX module. Consequently, handling BUSY errors is complex, as page
merging typically occurs in the fault path under a shared mmu_lock.
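
To illustrate the first constraint above, here is a minimal sketch in plain C. It is a toy model, not TDX module or KVM code: `enum leaf_state`, `LEAVES_PER_2M`, and `can_promote_to_2m()` are hypothetical names introduced only for this example. It assumes the rule is simply "all 512 4KB leaves share one state".

```c
#include <stdbool.h>
#include <stddef.h>

#define LEAVES_PER_2M 512	/* number of 4KB leaves in one 2MB range */

/* Hypothetical per-leaf state for illustration; not a TDX module definition. */
enum leaf_state {
	LEAF_PENDING,
	LEAF_ACCEPTED,
};

/*
 * Toy model of the promotion precondition described above: promotion is
 * possible only when every 4KB leaf is in the same state, i.e. all
 * PENDING or all ACCEPTED. Any mix of the two states fails the check.
 */
static bool can_promote_to_2m(const enum leaf_state leaves[LEAVES_PER_2M])
{
	size_t i;

	for (i = 1; i < LEAVES_PER_2M; i++) {
		if (leaves[i] != leaves[0])
			return false;
	}
	return true;
}
```

Under this model, a 2MB range in which a few leaves are PENDING (presumably the ones re-added after a shared-to-private conversion) while the rest are still ACCEPTED is rejected, which is the partial-conversion case called out in the coverletter.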
v1 explains it in more detail (see section "Page merging (page promotion)" in
[*]).
[*] https://lore.kernel.org/all/20250424030033.32635-1-yan.y.zhao@xxxxxxxxx/
> E.g., theoretically, I think we can have a way to PROMOTE mappings for
> initial memory pages (via TDH.MEM.PAGE.ADD), e.g., right before the TD is
> becoming runnable?
Right. Kirill also asked about it in v1 [1].
Though we have no need to worry about the nr_premapped calculation after Sean's
cleanup series, I think there's no need to complicate the design for the initial
support, given the limited amount of initial memory pages.
In my environment, for a TD with 8GB memory, there are 1086 2MB mappings
at runtime, but the initial memory is merely 1049 4KB pages in total.
So, the gain is less than 2/1000.
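
One plausible reading of that estimate, as a small C sketch: express the 1086 2MB mappings in 4KB units (512 per 2MB) and compare against the 1049 initial 4KB pages. The helper name `initial_page_fraction()` and the macro `PAGES_4K_PER_2M` are made up for this example; the exact way the figure was derived is an assumption.

```c
#define PAGES_4K_PER_2M 512UL	/* 2MB / 4KB */

/*
 * Fraction of the memory covered by runtime 2MB mappings that the
 * initial 4KB pages would account for if they were promoted as well.
 * Both inputs are plain counts; no kernel state is involved.
 */
static double initial_page_fraction(unsigned long nr_2m_mappings,
				    unsigned long nr_initial_4k_pages)
{
	unsigned long runtime_4k_pages = nr_2m_mappings * PAGES_4K_PER_2M;

	return (double)nr_initial_4k_pages / (double)runtime_4k_pages;
}
```

With the numbers above, `initial_page_fraction(1086, 1049)` is 1049 / 556032, roughly 0.0019, which is consistent with "less than 2/1000".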
Will call it out in the next version.
[1] https://lore.kernel.org/all/aAn3SSocw0XvaRye@xxxxxxxxxxxxxxxxxxxxxxxxx/