Re: [PATCH mm-unstable v2 05/10] kvm/arm64: add kvm_arch_test_clear_young()

From: Oliver Upton
Date: Wed May 31 2023 - 15:56:20 EST


Hi Yu,

On Fri, May 26, 2023 at 05:44:30PM -0600, Yu Zhao wrote:
> Implement kvm_arch_test_clear_young() to support the fast path in
> mmu_notifier_ops->test_clear_young().
>
> It focuses on a simple case, i.e., hardware sets the accessed bit in
> KVM PTEs and VMs are not protected, where it can rely on RCU and
> cmpxchg to safely clear the accessed bit without taking
> kvm->mmu_lock. Complex cases fall back to the existing slow path
> where kvm->mmu_lock is then taken.
>
> Signed-off-by: Yu Zhao <yuzhao@xxxxxxxxxx>
> ---
> arch/arm64/include/asm/kvm_host.h | 6 ++++++
> arch/arm64/kvm/mmu.c | 36 +++++++++++++++++++++++++++++++
> 2 files changed, 42 insertions(+)
>
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 7e7e19ef6993..da32b0890716 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -1113,4 +1113,10 @@ static inline void kvm_hyp_reserve(void) { }
> void kvm_arm_vcpu_power_off(struct kvm_vcpu *vcpu);
> bool kvm_arm_vcpu_stopped(struct kvm_vcpu *vcpu);
>
> +#define kvm_arch_has_test_clear_young kvm_arch_has_test_clear_young
> +static inline bool kvm_arch_has_test_clear_young(void)
> +{
> + return cpu_has_hw_af() && !is_protected_kvm_enabled();
> +}

I would *strongly* suggest you consider supporting test_clear_young on
systems that do software Access Flag management. FEAT_HAFDBS is an
*optional* extension to the architecture, so we're going to support
software AF management for a very long time in KVM. It is also a valid
fallback option in the case of hardware errata which render HAFDBS
broken.

So, we should expect (and support) systems of all shapes and sizes that
do software AF. I'm sure we'll hear about more in the not-too-distant
future...

For future reference (even though I'm suggesting you support software
AF), decisions such of these need an extremely verbose comment
describing the rationale behind the decision.

> +
> #endif /* __ARM64_KVM_HOST_H__ */
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index c3b3e2afe26f..26a8d955b49c 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c

Please do not implement page table walkers outside of hyp/pgtable.c

> @@ -1678,6 +1678,42 @@ bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
> range->start << PAGE_SHIFT);
> }
>
> +static int stage2_test_clear_young(const struct kvm_pgtable_visit_ctx *ctx,
> + enum kvm_pgtable_walk_flags flags)
> +{
> + kvm_pte_t new = ctx->old & ~KVM_PTE_LEAF_ATTR_LO_S2_AF;
> +
> + VM_WARN_ON_ONCE(!page_count(virt_to_page(ctx->ptep)));

This sort of sanity checking is a bit excessive. Isn't there a risk of
false negatives here too? IOW, if we tragically mess up RCU in the page
table code, what's stopping a prematurely freed page from being
allocated to another user?

> + if (!kvm_pte_valid(new))
> + return 0;
> +
> + if (new == ctx->old)
> + return 0;
> +
> + if (kvm_should_clear_young(ctx->arg, ctx->addr / PAGE_SIZE))
> + stage2_try_set_pte(ctx, new);
> +
> + return 0;
> +}
> +
> +bool kvm_arch_test_clear_young(struct kvm *kvm, struct kvm_gfn_range *range)
> +{
> + u64 start = range->start * PAGE_SIZE;
> + u64 end = range->end * PAGE_SIZE;
> + struct kvm_pgtable_walker walker = {
> + .cb = stage2_test_clear_young,
> + .arg = range,
> + .flags = KVM_PGTABLE_WALK_LEAF | KVM_PGTABLE_WALK_SHARED,
> + };
> +
> + BUILD_BUG_ON(is_hyp_code());

Delete this assertion.

> + kvm_pgtable_walk(kvm->arch.mmu.pgt, start, end - start, &walker);
> +
> + return false;
> +}
> +
> phys_addr_t kvm_mmu_get_httbr(void)
> {
> return __pa(hyp_pgtable->pgd);
> --
> 2.41.0.rc0.172.g3f132b7071-goog
>

--
Thanks,
Oliver