Re: [PATCH mm-unstable v2 05/10] kvm/arm64: add kvm_arch_test_clear_young()

From: Yu Zhao
Date: Wed May 31 2023 - 17:13:22 EST


On Wed, May 31, 2023 at 1:56 PM Oliver Upton <oliver.upton@xxxxxxxxx> wrote:
>
> Hi Yu,
>
> On Fri, May 26, 2023 at 05:44:30PM -0600, Yu Zhao wrote:
> > Implement kvm_arch_test_clear_young() to support the fast path in
> > mmu_notifier_ops->test_clear_young().
> >
> > It focuses on a simple case, i.e., hardware sets the accessed bit in
> > KVM PTEs and VMs are not protected, where it can rely on RCU and
> > cmpxchg to safely clear the accessed bit without taking
> > kvm->mmu_lock. Complex cases fall back to the existing slow path
> > where kvm->mmu_lock is then taken.
> >
> > Signed-off-by: Yu Zhao <yuzhao@xxxxxxxxxx>
> > ---
> > arch/arm64/include/asm/kvm_host.h | 6 ++++++
> > arch/arm64/kvm/mmu.c | 36 +++++++++++++++++++++++++++++++
> > 2 files changed, 42 insertions(+)
> >
> > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > index 7e7e19ef6993..da32b0890716 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -1113,4 +1113,10 @@ static inline void kvm_hyp_reserve(void) { }
> > void kvm_arm_vcpu_power_off(struct kvm_vcpu *vcpu);
> > bool kvm_arm_vcpu_stopped(struct kvm_vcpu *vcpu);
> >
> > +#define kvm_arch_has_test_clear_young kvm_arch_has_test_clear_young
> > +static inline bool kvm_arch_has_test_clear_young(void)
> > +{
> > + return cpu_has_hw_af() && !is_protected_kvm_enabled();
> > +}
>
> I would *strongly* suggest you consider supporting test_clear_young on
> systems that do software Access Flag management. FEAT_HAFDBS is an
> *optional* extension to the architecture, so we're going to support
> software AF management for a very long time in KVM. It is also a valid
> fallback option in the case of hardware errata which render HAFDBS
> broken.

Hi Oliver,

It's not about willingness but resources. Ideally we want to make
everything perfect, but in reality, we can only move forward one step
a time.

If I looked at your request from ARM's POV, I would agree with you.
But my goal is to lay the foundation for all architectures that could
benefit, so I may not be able to cover a lot for each architecture.
Specifically, I don't have the bandwidth to test the !FEAT_HAFDBS case
for ARM.

So here are some options I could offer, ordered by my preferences:
1. We proceed as it is for now. I *will* find someone from my team (or
yours) to follow up -- this way we can make sure !FEAT_HAFDBS is well
tested.
2. I drop the cpu_has_hw_af() check above. Not that I think there is
much risk, I'm just trying to be cautious.
3. I drop the entire ARM support (and include the RISC-V support which
I previously deprioritized). We revisit after the test is done.

Sounds reasonable?

> So, we should expect (and support) systems of all shapes and sizes that
> do software AF. I'm sure we'll hear about more in the not-too-distant
> future...
>
> For future reference (even though I'm suggesting you support software
> AF), decisions such of these need an extremely verbose comment
> describing the rationale behind the decision.
>
> > +
> > #endif /* __ARM64_KVM_HOST_H__ */
> > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > index c3b3e2afe26f..26a8d955b49c 100644
> > --- a/arch/arm64/kvm/mmu.c
> > +++ b/arch/arm64/kvm/mmu.c
>
> Please do not implement page table walkers outside of hyp/pgtable.c
>
> > @@ -1678,6 +1678,42 @@ bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
> > range->start << PAGE_SHIFT);
> > }
> >
> > +static int stage2_test_clear_young(const struct kvm_pgtable_visit_ctx *ctx,
> > + enum kvm_pgtable_walk_flags flags)
> > +{
> > + kvm_pte_t new = ctx->old & ~KVM_PTE_LEAF_ATTR_LO_S2_AF;
> > +
> > + VM_WARN_ON_ONCE(!page_count(virt_to_page(ctx->ptep)));
>
> This sort of sanity checking is a bit excessive. Isn't there a risk of
> false negatives here too? IOW, if we tragically mess up RCU in the page
> table code, what's stopping a prematurely freed page from being
> allocated to another user?

Yes, but from my aforementioned POV (the breadth I'm focusing on),
this is a good practice. I can live without this assertion if you feel
strongly about it.

> > + if (!kvm_pte_valid(new))
> > + return 0;
> > +
> > + if (new == ctx->old)
> > + return 0;
> > +
> > + if (kvm_should_clear_young(ctx->arg, ctx->addr / PAGE_SIZE))
> > + stage2_try_set_pte(ctx, new);
> > +
> > + return 0;
> > +}
> > +
> > +bool kvm_arch_test_clear_young(struct kvm *kvm, struct kvm_gfn_range *range)
> > +{
> > + u64 start = range->start * PAGE_SIZE;
> > + u64 end = range->end * PAGE_SIZE;
> > + struct kvm_pgtable_walker walker = {
> > + .cb = stage2_test_clear_young,
> > + .arg = range,
> > + .flags = KVM_PGTABLE_WALK_LEAF | KVM_PGTABLE_WALK_SHARED,
> > + };
> > +
> > + BUILD_BUG_ON(is_hyp_code());
>
> Delete this assertion.

Will do.