Re: [PATCH] KVM: x86/mmu: Don't create SPTEs for addresses that aren't mappable
From: Yan Zhao
Date: Thu Mar 05 2026 - 02:59:42 EST
On Wed, Feb 18, 2026 at 04:22:41PM -0800, Sean Christopherson wrote:
> Track the mask of guest physical address bits that can actually be mapped
> by a given MMU instance that utilizes TDP, and either exit to userspace
> with -EFAULT or go straight to emulation without creating an SPTE (for
> emulated MMIO) if KVM can't map the address. Attempting to create an SPTE
> can cause KVM to drop the unmappable bits, and thus install a bad SPTE.
> E.g. when starting a walk, the TDP MMU will round the GFN based on the
> root level, and drop the upper bits.
>
> Exit with -EFAULT in the unlikely scenario userspace is misbehaving and
> created a memslot that can't be addressed, e.g. if userspace installed
> memory above the guest.MAXPHYADDR defined in CPUID, as there's nothing KVM
> can do to make forward progress, and there _is_ a memslot for the address.
> For emulated MMIO, KVM can at least kick the bad address out to userspace
> via a normal MMIO exit.
>
> The flaw has existed for a very long time, and was exposed by commit
> 988da7820206 ("KVM: x86/tdp_mmu: WARN if PFN changes for spurious faults")
> thanks to a syzkaller program that prefaults memory at GPA 0x1000000000000
> and then faults in memory at GPA 0x0 (the extra-large GPA gets wrapped to
> '0').

What about the reverse scenario: with A/D bits disabled, userspace prefaults
memory at GPA 0x0, and then the guest reads memory at GPA 0x1000000000000.
Would fast_page_fault() then fix up the wrong (wrapped) SPTE for GPA
0x1000000000000?

Do we need to check fault->addr in fast_page_fault() as well?
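
To make the concern concrete, here is a minimal userspace sketch of the
wrap. It is not KVM code: all helper names are hypothetical, and it assumes
a TDP root that resolves 9 GFN bits per level on top of a 12-bit page
offset, so a 4-level root covers 48 GPA bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT	12
#define LEVEL_BITS	9	/* GFN bits translated per paging level */

/* Hypothetical: number of GPA bits a TDP root at @root_level can map. */
static unsigned int mappable_gpa_bits(int root_level)
{
	assert(root_level == 4 || root_level == 5);
	return PAGE_SHIFT + LEVEL_BITS * root_level;	/* 48 or 57 */
}

/* Hypothetical: mask of the GPA bits that a walk from the root drops. */
static uint64_t unmappable_gpa_mask(int root_level)
{
	return ~((1ULL << mappable_gpa_bits(root_level)) - 1);
}

/* The GPA a walk effectively resolves once the upper bits are dropped. */
static uint64_t wrapped_gpa(uint64_t gpa, int root_level)
{
	return gpa & ~unmappable_gpa_mask(root_level);
}

/* Hypothetical early check a fault path could do before any SPTE lookup. */
static bool gpa_is_mappable(uint64_t gpa, int root_level)
{
	return !(gpa & unmappable_gpa_mask(root_level));
}
```

With a 4-level root, wrapped_gpa(0x1000000000000, 4) is 0x0, so if the
lockless walk drops the upper bits the same way, it would land on the SPTE
that was prefaulted for GPA 0x0. A check along the lines of
gpa_is_mappable() would reject the address before the walk starts.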

> WARNING: arch/x86/kvm/mmu/tdp_mmu.c:1183 at kvm_tdp_mmu_map+0x5c3/0xa30 [kvm], CPU#125: syz.5.22/18468
> CPU: 125 UID: 0 PID: 18468 Comm: syz.5.22 Tainted: G S W 6.19.0-smp--23879af241d6-next #57 NONE
> Tainted: [S]=CPU_OUT_OF_SPEC, [W]=WARN
> Hardware name: Google Izumi-EMR/izumi, BIOS 0.20250917.0-0 09/17/2025
> RIP: 0010:kvm_tdp_mmu_map+0x5c3/0xa30 [kvm]
> Call Trace:
> <TASK>
> kvm_tdp_page_fault+0x107/0x140 [kvm]
> kvm_mmu_do_page_fault+0x121/0x200 [kvm]
> kvm_arch_vcpu_pre_fault_memory+0x18c/0x230 [kvm]
> kvm_vcpu_pre_fault_memory+0x116/0x1e0 [kvm]
> kvm_vcpu_ioctl+0x3a5/0x6b0 [kvm]
> __se_sys_ioctl+0x6d/0xb0
> do_syscall_64+0x8d/0x900
> entry_SYSCALL_64_after_hwframe+0x4b/0x53
> </TASK>
>
> In practice, the flaw is benign (other than the new WARN) as it only
> affects guests that ignore guest.MAXPHYADDR (e.g. on CPUs with 52-bit
> physical addresses but only 4-level paging) or guests being run by a
> misbehaving userspace VMM (e.g. a VMM that ignored allow_smaller_maxphyaddr
> or is pre-faulting bad addresses).
>
> For non-TDP shadow paging, always clear the unmappable mask as the flaw
> only affects GPAs. For 32-bit paging, 64-bit virtual addresses
> simply don't exist. Even when software can shove a 64-bit address
> somewhere, e.g. into SYSENTER_EIP, the value is architecturally truncated
> before it reaches the page table walker. And for 64-bit paging, KVM's use
> of 4-level vs. 5-level paging is tied to the guest's CR4.LA57, i.e. KVM
> won't observe a 57-bit virtual address with a 4-level MMU.