Re: [PATCH 0/2] KVM: arm64: Fix host/hyp tracking on share/unshare hypercall failure
From: Fuad Tabba
Date: Fri May 29 2026 - 05:27:06 EST
On Fri, 29 May 2026 at 10:21, Vincent Donnefort <vdonnefort@xxxxxxxxxx> wrote:
>
> On Fri, May 29, 2026 at 09:20:50AM +0100, Fuad Tabba wrote:
> > On Fri, 29 May 2026 at 09:15, Marc Zyngier <maz@xxxxxxxxxx> wrote:
> > >
> > > On Fri, 29 May 2026 09:05:35 +0100,
> > > Fuad Tabba <tabba@xxxxxxxxxx> wrote:
> > > >
> > > > On Fri, 29 May 2026 at 09:02, Vincent Donnefort <vdonnefort@xxxxxxxxxx> wrote:
> > > > >
> > > > > On Fri, May 29, 2026 at 08:43:39AM +0100, tabba@xxxxxxxxxx wrote:
> > > > > > Hi folks,
> > > > > >
> > > > > > Yet another bug I found while testing Sashiko locally with fixes to
> > > > > > review-prompts.
> > > > > >
> > > > > > share_pfn_hyp() and unshare_pfn_hyp() in arch/arm64/kvm/mmu.c
> > > > > > maintain a host-side RB-tree mirroring the set of pages shared with
> > > > > > EL2. Both invoke a hypercall that can fail (page-state mismatch,
> > > > > > EL2 refcount still held), but neither cleans up on failure:
> > > > > >
> > > > > > - share_pfn_hyp() inserts the tracking node before the hypercall
> > > > > > and leaves it in the tree on failure, leaking the allocation and
> > > > > > presenting a phantom share to a later unshare.
> > > > > >
> > > > > > - unshare_pfn_hyp() erases the tracking node before the hypercall;
> > > > > > on failure the host loses its record while EL2 still owns the
> > > > > > share, breaking later operations on the same pfn.
> > > > > >
> > > > > > Severity is low (no isolation impact) and the failure paths are rare
> > > > > > in practice, but the desync is real. Both patches are independent and
> > > > > > apply cleanly to current mainline. In other words, this can wait for
> > > > > > 7.2.
> > > > >
> > > > >
> > > > > I believe I fixed that here lore.kernel.org/all/acyKhZL2di_QQ9xm@xxxxxxxxxx but
> > > > > > as Quentin pointed out, there's absolutely no reason for the hypercall to fail.
> > > > > So I haven't sent a v2.
> > > >
> > > > At the very least we need to add a comment, otherwise, people like me
> > > > and LLMs like Sashiko would stumble upon it.
> > > >
> > > > That said, this fix adds no real overhead, makes the code clearer, and
> > > > guards us against a future where that call might fail.
> > > > > Self-documenting in essence.
> > > >
> > > > WDYT?
> > >
> > > If a hypercall really cannot fail, why does it have a return value?
> >
> > Good point. If we know it cannot fail, how about just `void`?
> >
> > That said, Vincent's exact words are: `very much unlikely`, not the
> > same as cannot fail :)
> >
> > https://lore.kernel.org/all/acyKhZL2di_QQ9xm@xxxxxxxxxx/
>
> The error would happen only if the host tries to share/unshare a page with the
> wrong state. This would only happen in the case of a misbehaving host.
>
> And Quentin's point was that this is anyway incomplete. To handle this error
> properly, kvm_share_hyp/kvm_unshare_hyp would also need to roll back things...
> The callers of the unshare should also leak the memory which couldn't be
> unshared properly. This isn't the case now (however, we do WARN_ON).

If we WARN_ON() in hyp, then I argue we shouldn't have a return value.
Or at least add a comment, or a BUG_ON(), here. Think of the poor LLMs and
the people who run them :)
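
To make the ordering issue from the cover letter concrete, here is a
minimal userspace sketch of one way to keep the host-side tracking in
sync: call into EL2 first and touch the tracking structure only on
success. It is a toy model, not the code from either patch:
model_share(), model_unshare(), fake_hypercall(), the flat array
standing in for the RB-tree, and the hyp_should_fail knob are all
hypothetical names, and locking and refcounting are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
#define MAX_TRACKED 16

static unsigned long tracked[MAX_TRACKED]; /* stands in for the host RB-tree */
static size_t n_tracked;
static bool hyp_should_fail;               /* knob: force the fake hypercall to fail */

/* Stand-in for the share/unshare hypercall into EL2. */
static int fake_hypercall(unsigned long pfn)
{
	(void)pfn;
	return hyp_should_fail ? -EPERM : 0;
}

static bool is_tracked(unsigned long pfn)
{
	for (size_t i = 0; i < n_tracked; i++)
		if (tracked[i] == pfn)
			return true;
	return false;
}

/* Record the share only after the hypercall has succeeded. */
static int model_share(unsigned long pfn)
{
	int ret;

	if (is_tracked(pfn))
		return 0; /* real code would bump a refcount here */
	if (n_tracked == MAX_TRACKED)
		return -ENOMEM;

	ret = fake_hypercall(pfn);
	if (ret)
		return ret; /* nothing was recorded, so nothing to undo */

	tracked[n_tracked++] = pfn;
	return 0;
}

/* Drop the record only after the hypercall has succeeded. */
static int model_unshare(unsigned long pfn)
{
	size_t i;
	int ret;

	for (i = 0; i < n_tracked; i++)
		if (tracked[i] == pfn)
			break;
	if (i == n_tracked)
		return -ENOENT;

	ret = fake_hypercall(pfn);
	if (ret)
		return ret; /* keep the record: the share is still live */

	assert(n_tracked > 0);
	tracked[i] = tracked[--n_tracked]; /* swap-remove from the array */
	return 0;
}
```

With this ordering, a failed share leaves no phantom node and a failed
unshare keeps the host's record, so neither path needs a rollback step.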
/fuad
>
> >
> > /fuad
> >
> > >
> > > M.
> > >
> > > --
> > > Without deviation from the norm, progress is not possible.