Re: [PATCH] KVM: arm64: account pKVM reclaim against the VM mm

From: Will Deacon

Date: Tue Jun 23 2026 - 10:33:30 EST

On Tue, Jun 23, 2026 at 02:50:48PM +0100, Marc Zyngier wrote:
> On Tue, 23 Jun 2026 14:41:20 +0100,
> Will Deacon <will@xxxxxxxxxx> wrote:
> >
> > On Mon, Jun 22, 2026 at 09:32:29AM +0100, Marc Zyngier wrote:
> > > On Sun, 21 Jun 2026 22:31:55 +0100,
> > > Bradley Morgan <include@xxxxxxxxx> wrote:
> > > >
> > > > Protected guest faults charge long-term pins to the VM's mm. Teardown
> > > > can run later from file release, where current->mm may be unrelated.
> > > >
> > > > Drop the charge from kvm->mm instead.
> > > >
> > > > Fixes: 4e6e03f9eadd ("KVM: arm64: Hook up reclaim hypercall to pkvm_pgtable_stage2_destroy()")
> > > > Signed-off-by: Bradley Morgan <include@xxxxxxxxx>
> > > > ---
> > > > arch/arm64/kvm/pkvm.c | 2 +-
> > > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > > >
> > > > diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
> > > > index 053e4f733e4b..428723b1b0f5 100644
> > > > --- a/arch/arm64/kvm/pkvm.c
> > > > +++ b/arch/arm64/kvm/pkvm.c
> > > > @@ -352,7 +352,7 @@ static int __pkvm_pgtable_stage2_reclaim(struct kvm_pgtable *pgt, u64 start, u64
> > > > page = pfn_to_page(mapping->pfn);
> > > > WARN_ON_ONCE(mapping->nr_pages != 1);
> > > > unpin_user_pages_dirty_lock(&page, 1, true);
> > > > - account_locked_vm(current->mm, 1, false);
> > > > + account_locked_vm(kvm->mm, 1, false);
> > > > pkvm_mapping_remove(mapping, &pgt->pkvm_mappings);
> > > > kfree(mapping);
> > > > }
> > >
> > > Seems correct to me, as the final mmdrop(kvm->mm) occurs after S2
> > > teardown.
> > >
> > > Will, what do you think?
> >
> > Thanks, this looks correct to me.
> >
> > While I was thinking about it, I also started looking at the use of
> > 'current->mm' in kvm_arch_prepare_memory_region() in case that should
> > also be 'kvm->mm'. However, I then realised that I don't really grok
> > that code at all because it does a bunch of checking on the VMAs with
> > mmap_read_lock(current->mm) held, but then that lock is dropped
> > immediately after doing the checks so I'm not really sure what they
> > are protected against. Presumably, the address space could be modified
> > as soon as the lock is dropped?
> >
> > But it's hot, so I'm probably missing something here.
>
> I think this is just trying to catch a few obvious issues, such as
> dirty logging on device memory, but that only works for well-behaved
> userspace that is making "an honest mistake".
>
> For the more trying ones, we end up doing the same checks again at
> fault time anyway.

Got it, so it's a best-effort check. It does mean, though, that memslot
changes have to be made by a task whose mm is kvm->mm; otherwise the
checks look at the wrong VMAs and you could get a spurious error back
from the kernel.
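
To make that concrete, here is a toy userspace model of the situation. It is
not kernel code: the toy_* names, the VMA layout and the addresses are all
made up for illustration, and the only check modelled is the "dirty logging
on device memory" one mentioned above. The point is just that the result
depends on which mm the check walks.

```c
/* Toy userspace model; NOT kernel code. Every toy_* name is hypothetical. */
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct toy_vma {
	uint64_t start, end;	/* covers [start, end) */
	bool device;		/* stands in for a device (PFNMAP-style) mapping */
};

struct toy_mm {
	const struct toy_vma *vmas;	/* assumed sorted and non-overlapping */
	size_t nr_vmas;
};

/* Return the first VMA in @mm that overlaps [start, end), or NULL. */
static const struct toy_vma *toy_find_vma(const struct toy_mm *mm,
					  uint64_t start, uint64_t end)
{
	for (size_t i = 0; i < mm->nr_vmas; i++)
		if (mm->vmas[i].start < end && start < mm->vmas[i].end)
			return &mm->vmas[i];
	return NULL;
}

/*
 * Best-effort memslot check: refuse dirty logging over device memory.
 * @mm is whichever address space the check happens to inspect.
 */
static int toy_prepare_memslot(const struct toy_mm *mm, uint64_t hva,
			       uint64_t size, bool log_dirty)
{
	uint64_t end = hva + size;

	while (hva < end) {
		const struct toy_vma *vma = toy_find_vma(mm, hva, end);

		if (!vma)
			break;
		if (vma->device && log_dirty)
			return -EINVAL;
		hva = vma->end < end ? vma->end : end;
	}
	return 0;
}

/* The VM's mm backs the range with ordinary memory ... */
static const struct toy_vma toy_vm_vmas[] = { { 0x1000, 0x3000, false } };
static const struct toy_mm toy_vm_mm = { toy_vm_vmas, 1 };

/* ... while an unrelated caller has a device mapping at the same addresses. */
static const struct toy_vma toy_caller_vmas[] = { { 0x1000, 0x3000, true } };
static const struct toy_mm toy_caller_mm = { toy_caller_vmas, 1 };
```

With these definitions, toy_prepare_memslot(&toy_vm_mm, 0x1000, 0x2000, true)
returns 0, whereas the same request checked against toy_caller_mm returns
-EINVAL, even though the memory the guest would actually fault on is fine.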

Will