Re: [PATCH 0/2] KVM: arm64: Fix host/hyp tracking on share/unshare hypercall failure
From: Marc Zyngier
Date: Fri May 29 2026 - 05:29:50 EST
On Fri, 29 May 2026 09:20:50 +0100,
Fuad Tabba <tabba@xxxxxxxxxx> wrote:
>
> On Fri, 29 May 2026 at 09:15, Marc Zyngier <maz@xxxxxxxxxx> wrote:
> >
> > On Fri, 29 May 2026 09:05:35 +0100,
> > Fuad Tabba <tabba@xxxxxxxxxx> wrote:
> > >
> > > On Fri, 29 May 2026 at 09:02, Vincent Donnefort <vdonnefort@xxxxxxxxxx> wrote:
> > > >
> > > > On Fri, May 29, 2026 at 08:43:39AM +0100, tabba@xxxxxxxxxx wrote:
> > > > > Hi folks,
> > > > >
> > > > > Yet another bug I found while testing Sashiko locally with fixes to
> > > > > review-prompts.
> > > > >
> > > > > share_pfn_hyp() and unshare_pfn_hyp() in arch/arm64/kvm/mmu.c
> > > > > maintain a host-side RB-tree mirroring the set of pages shared with
> > > > > EL2. Both invoke a hypercall that can fail (page-state mismatch,
> > > > > EL2 refcount still held), but neither cleans up on failure:
> > > > >
> > > > > - share_pfn_hyp() inserts the tracking node before the hypercall
> > > > > and leaves it in the tree on failure, leaking the allocation and
> > > > > presenting a phantom share to a later unshare.
> > > > >
> > > > > - unshare_pfn_hyp() erases the tracking node before the hypercall;
> > > > > on failure the host loses its record while EL2 still owns the
> > > > > share, breaking later operations on the same pfn.
> > > > >
> > > > > Severity is low (no isolation impact) and the failure paths are rare
> > > > > in practice, but the desync is real. Both patches are independent and
> > > > > apply cleanly to current mainline. In other words, this can wait for
> > > > > 7.2.
> > > >
> > > >
> > > > I believe I fixed that here lore.kernel.org/all/acyKhZL2di_QQ9xm@xxxxxxxxxx but
> > > > as Quentin pointed out, there's absolutely no reason for the hypercall to fail.
> > > > So I haven't sent a v2.
> > >
> > > At the very least we need to add a comment; otherwise, people like me
> > > and LLMs like Sashiko will stumble upon it.
> > >
> > > That said, this fix adds no real overhead, makes the code clearer, and
> > > guards us against a future where that call might fail.
> > > Self-documenting, in essence.
> > >
> > > WDYT?
> >
> > If a hypercall really cannot fail, why does it have a return value?
>
> Good point. If we know it cannot fail, how about just `void`?
>
> That said, Vincent's exact words are `very much unlikely`, which is not the
> same as cannot fail :)
>
> https://lore.kernel.org/all/acyKhZL2di_QQ9xm@xxxxxxxxxx/

I think the rules are simple:

- if something can fail, we need to handle the failure
- if something should not fail and has the potential of compromising
  the system, we should panic
- if something absolutely cannot fail, then there is nothing to handle
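
For illustration only, here is a minimal userspace sketch of the first rule applied to the share/unshare bookkeeping described in the cover letter. The tracking entry for a new share is dropped again if the call fails, and on the last unshare the entry is only removed once the call has succeeded. Assumptions: a fixed-size array stands in for the host-side RB-tree, fake_hypercall() stands in for the EL2 hypercall, and every name here (model_share_pfn(), model_unshare_pfn(), hyp_should_fail, ...) is hypothetical. This is not the code from either posted patch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: a small array stands in for the host RB-tree. */
#define MAX_SHARED 8

struct shared_pfn {
	unsigned long pfn;
	unsigned int count;	/* host-side share refcount */
	bool in_use;
};

static struct shared_pfn shared[MAX_SHARED];
static bool hyp_should_fail;	/* knob: make the next fake call fail */

/* Hypothetical stand-in for the EL2 share/unshare hypercall. */
static int fake_hypercall(unsigned long pfn)
{
	(void)pfn;
	if (hyp_should_fail) {
		hyp_should_fail = false;
		return -EPERM;
	}
	return 0;
}

static struct shared_pfn *find_shared(unsigned long pfn)
{
	for (size_t i = 0; i < MAX_SHARED; i++)
		if (shared[i].in_use && shared[i].pfn == pfn)
			return &shared[i];
	return NULL;
}

static struct shared_pfn *alloc_shared(unsigned long pfn)
{
	for (size_t i = 0; i < MAX_SHARED; i++) {
		if (!shared[i].in_use) {
			shared[i] = (struct shared_pfn){
				.pfn = pfn, .count = 1, .in_use = true,
			};
			return &shared[i];
		}
	}
	return NULL;
}

int model_share_pfn(unsigned long pfn)
{
	struct shared_pfn *s = find_shared(pfn);
	int ret;

	if (s) {		/* already shared: just take a reference */
		s->count++;
		return 0;
	}
	s = alloc_shared(pfn);
	if (!s)
		return -ENOMEM;

	ret = fake_hypercall(pfn);
	if (ret)
		s->in_use = false;	/* roll back: no phantom share left */
	return ret;
}

int model_unshare_pfn(unsigned long pfn)
{
	struct shared_pfn *s = find_shared(pfn);
	int ret;

	if (!s)
		return -ENOENT;
	if (s->count > 1) {	/* other users remain: drop a reference */
		s->count--;
		return 0;
	}

	ret = fake_hypercall(pfn);
	if (ret)
		return ret;	/* keep the record: EL2 still holds the share */
	s->in_use = false;	/* forget the pfn only after EL2 agreed */
	return 0;
}
```

Under the second rule, the `if (ret)` branches would instead become a hard stop (a panic or BUG_ON() in the kernel); under the third, the call would return nothing and there would be no branch at all.
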
Thanks,
M.
--
Without deviation from the norm, progress is not possible.