Re: [PATCH] KVM: clear map->gfn in kvm_vcpu_unmap() to prevent stale validity checks

From: Taeyang Lee

Date: Thu Mar 12 2026 - 03:00:58 EST


Hi Sean,

Thanks for the detailed review.

On Wed, Mar 11, 2026 at 10:53 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
>
> On Wed, Mar 11, 2026, Taeyang Lee wrote:
> > kvm_vcpu_unmap() clears map->hva and map->page but leaves map->gfn with
> > its previous value. This creates an inconsistent state: callers that
> > check gfn != 0 as a proxy for map validity will believe the map is still
>
> But that was and still is flat out wrong, '0' is a perfectly legal gfn.

You're right, I hadn't considered that. Setting gfn = 0 as a sentinel
is fundamentally incorrect.

>
> > valid when hva is already NULL.
> >
> > This pattern caused a null pointer dereference in the 6.1.x LTS branch,
> > where vmx_guest_apic_has_interrupt() checked virtual_apic_map.gfn but
> > dereferenced virtual_apic_map.hva unconditionally. That specific call
> > site no longer exists in mainline due to the gfn_to_pfn_cache
> > refactoring, but the inconsistency in kvm_vcpu_unmap() remains and could
> > affect future kvm_host_map users that rely on gfn for validity.
> >
> > Similarly, kvm_vcpu_map() does not modify the map struct on failure, so
> > stale gfn values from a previous successful mapping survive a failed
> > remap attempt. Clearing gfn in kvm_vcpu_unmap() ensures that after an
> > unmap-then-failed-remap sequence, gfn correctly reflects that no valid
> > mapping exists.
> >
> > Clear map->gfn in kvm_vcpu_unmap().
> >
> > Reported-by: Taeyang Lee <0wn@xxxxxxxxx>
> > Fixes: e45adf665a53 ("KVM: Introduce a new guest mapping API")
>
> This is not the correct fixes, the blame lies squarely on 96c66e87deee ("KVM/nVMX:
> Use kvm_vcpu_map when mapping the virtual APIC page"). The API even provided
> kvm_vcpu_mapped() from the get-go; why commit e45adf665a53 checked the gfn, I have
> no idea.
>

Agreed — the bug is in the caller misusing gfn instead of
kvm_vcpu_mapped(), not in the unmap API itself.

> > Signed-off-by: Taeyang Lee <0wn@xxxxxxxxx>
> > ---
> > virt/kvm/kvm_main.c | 1 +
> > 1 file changed, 1 insertion(+)
> >
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index 7a4fd1dbe0d7..88fc8b20aa8f 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -2887,6 +2887,7 @@ void kvm_vcpu_unmap(struct kvm_vcpu *vcpu, struct kvm_host_map *map, bool dirty)
> >
> > map->hva = NULL;
> > map->page = NULL;
> > + map->gfn = 0;
>
> As above, this is at best misleading. If we wanted to harden this, we should do
>
> map->gfn = INVALID_GPA;
>
> but I honestly don't see the point. Yes, we had terrible code in the past, but
> the above doesn't prevent terrible code. E.g. the correct invalidation wouldn't
> actually do anything to make the terrible code less broken.
>
> We even (somewhat unintentionally) removed that terrible code in commit 321ef62b0c5f
> ("KVM: nVMX: Fold requested virtual interrupt check into has_nested_events()").
> And that commit was tagged for stable@, it just didn't apply. If we want to do
> anything for 6.1.y (and 6.6.y?), we should backport that entire series:
>
> https://lore.kernel.org/all/2024072925-straw-mashing-54f6@gregkh

I see — so the proper LTS fix would be to backport that series rather
than adding band-aid patches. That makes sense.

For context, I sent the patch because the NULL dereference appears
to be guest-triggerable on 6.1.y, and I wanted to suggest something
minimal for the LTS branches. But based on your feedback, backporting
the series that includes 321ef62b0c5f sounds like the correct fix.

I also sent a minimal NULL check patch for vmx_guest_apic_has_interrupt()
to stable@xxxxxxxxxxxxxxx. Would that still be reasonable as an interim
fix until the full series is backported, or would you prefer to just
backport the series directly?

Thanks