Re: [PATCH 0/2] KVM: arm64: Fix host/hyp tracking on share/unshare hypercall failure
From: Vincent Donnefort
Date: Fri May 29 2026 - 05:22:21 EST
On Fri, May 29, 2026 at 09:20:50AM +0100, Fuad Tabba wrote:
> On Fri, 29 May 2026 at 09:15, Marc Zyngier <maz@xxxxxxxxxx> wrote:
> >
> > On Fri, 29 May 2026 09:05:35 +0100,
> > Fuad Tabba <tabba@xxxxxxxxxx> wrote:
> > >
> > > On Fri, 29 May 2026 at 09:02, Vincent Donnefort <vdonnefort@xxxxxxxxxx> wrote:
> > > >
> > > > On Fri, May 29, 2026 at 08:43:39AM +0100, tabba@xxxxxxxxxx wrote:
> > > > > Hi folks,
> > > > >
> > > > > Yet another bug I found while testing Sashiko locally with fixes to
> > > > > review-prompts.
> > > > >
> > > > > share_pfn_hyp() and unshare_pfn_hyp() in arch/arm64/kvm/mmu.c
> > > > > maintain a host-side RB-tree mirroring the set of pages shared with
> > > > > EL2. Both invoke a hypercall that can fail (page-state mismatch,
> > > > > EL2 refcount still held), but neither cleans up on failure:
> > > > >
> > > > > - share_pfn_hyp() inserts the tracking node before the hypercall
> > > > > and leaves it in the tree on failure, leaking the allocation and
> > > > > presenting a phantom share to a later unshare.
> > > > >
> > > > > - unshare_pfn_hyp() erases the tracking node before the hypercall;
> > > > > on failure the host loses its record while EL2 still owns the
> > > > > share, breaking later operations on the same pfn.
> > > > >
> > > > > Severity is low (no isolation impact) and the failure paths are rare
> > > > > in practice, but the desync is real. Both patches are independent and
> > > > > apply cleanly to current mainline. In other words, this can wait for
> > > > > 7.2.
> > > >
> > > >
> > > > I believe I fixed that here lore.kernel.org/all/acyKhZL2di_QQ9xm@xxxxxxxxxx but
> > > > as Quentin pointed out, there's absolutely no reason for the hypercall to fail.
> > > > So I haven't sent a v2.
> > >
> > > At the very least we need to add a comment, otherwise, people like me
> > > and LLMs like Sashiko would stumble upon it.
> > >
> > > That said, this fix adds no real overhead, makes the code clearer, and
> > > guards us against a future where that call might fail.
> > > Self-documenting, in essence.
> > >
> > > WDYT?
> >
> > If a hypercall really cannot fail, why does it have a return value?
>
> Good point. If we know it cannot fail, how about just `void`?
>
> That said, Vincent's exact words were `very much unlikely`, which is not
> the same as "cannot fail" :)
>
> https://lore.kernel.org/all/acyKhZL2di_QQ9xm@xxxxxxxxxx/

The error would only happen if the host tried to share/unshare a page in the
wrong state, i.e. only in the case of a misbehaving host.
And Quentin's point was that this fix is incomplete anyway. To handle the error
properly, kvm_share_hyp/kvm_unshare_hyp would also need to roll things back, and
the callers of the unshare should leak the memory that couldn't be unshared
properly. That isn't the case today (although we do WARN_ON).
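For illustration only, here is a minimal userspace sketch of the ordering a failure-safe share/unshare pair would need. It assumes a simplified model: a linked list stands in for the host-side RB-tree, and fake_hyp_share()/fake_hyp_unshare() are hypothetical stubs for the share/unshare hypercalls that fail on a page-state mismatch. None of these names are kernel APIs. The tracking node is allocated up front but only inserted once EL2 accepts the share, and only erased once EL2 confirms the unshare. It deliberately does not model the kvm_share_hyp/kvm_unshare_hyp rollback or the caller-side leaking discussed above.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

#define NR_PFNS 16UL

/* Hypothetical host-side record of one page shared with EL2. */
struct share_node {
	unsigned long pfn;
	unsigned int count;		/* host-side share refcount */
	struct share_node *next;
};

static struct share_node *tracked;	/* stands in for the RB-tree */
static bool hyp_shared[NR_PFNS];	/* simulated EL2 page state */

/* Hypothetical hypercall stubs: fail on a page-state mismatch. */
static int fake_hyp_share(unsigned long pfn)
{
	if (pfn >= NR_PFNS || hyp_shared[pfn])
		return -EPERM;
	hyp_shared[pfn] = true;
	return 0;
}

static int fake_hyp_unshare(unsigned long pfn)
{
	if (pfn >= NR_PFNS || !hyp_shared[pfn])
		return -EPERM;
	hyp_shared[pfn] = false;
	return 0;
}

/* Return the link pointing at @pfn's node, or at the list tail. */
static struct share_node **find(unsigned long pfn)
{
	struct share_node **pp = &tracked;

	while (*pp && (*pp)->pfn != pfn)
		pp = &(*pp)->next;
	return pp;
}

static int share_pfn(unsigned long pfn)
{
	struct share_node *node = *find(pfn);
	int ret;

	if (node) {
		node->count++;
		return 0;
	}

	/* Allocate first, so an allocation failure never reaches EL2. */
	node = malloc(sizeof(*node));
	if (!node)
		return -ENOMEM;

	ret = fake_hyp_share(pfn);
	if (ret) {
		free(node);	/* no phantom node, no leaked allocation */
		return ret;
	}

	/* Track the share only once EL2 has accepted it. */
	node->pfn = pfn;
	node->count = 1;
	node->next = tracked;
	tracked = node;
	return 0;
}

static int unshare_pfn(unsigned long pfn)
{
	struct share_node **pp = find(pfn);
	struct share_node *node = *pp;
	int ret;

	if (!node)
		return -ENOENT;

	if (node->count > 1) {
		node->count--;
		return 0;
	}

	/* Forget the share only once EL2 has actually dropped it. */
	ret = fake_hyp_unshare(pfn);
	if (ret)
		return ret;	/* host record stays in sync with EL2 */

	*pp = node->next;
	free(node);
	return 0;
}
```

Allocating before the hypercall is a deliberate choice in this sketch: the only step that can fail after EL2 has been touched is the host-side insert, which cannot fail here, so the share path never needs to undo anything at EL2.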
>
> /fuad
>
> >
> > M.
> >
> > --
> > Without deviation from the norm, progress is not possible.