Re: [PATCH mm-unstable v2 06/10] kvm/powerpc: make radix page tables RCU safe
From: Yu Zhao
Date: Tue Jun 20 2023 - 04:01:39 EST
On Tue, Jun 20, 2023 at 12:33 AM Nicholas Piggin <npiggin@xxxxxxxxx> wrote:
>
> On Sat May 27, 2023 at 9:44 AM AEST, Yu Zhao wrote:
> > KVM page tables are currently not RCU safe against remapping, i.e.,
> > kvmppc_unmap_free_pmd_entry_table() et al. The previous
>
> Minor nit but the "page table" is not RCU-safe against something. It
> is RCU-freed, and therefore some algorithm that accesses it can have
> the existence guarantee provided by RCU (usually there still needs
> to be more to it).
>
> > mmu_notifier_ops members rely on kvm->mmu_lock to synchronize with
> > that operation.
> >
> > However, the new mmu_notifier_ops member test_clear_young() provides
> > a fast path that does not take kvm->mmu_lock. To implement
> > kvm_arch_test_clear_young() for that path, orphan page tables need to
> > be freed by RCU.
>
> Short version: clear the referenced bit using RCU instead of MMU lock
> to protect against page table freeing, and there is no problem with
> clearing the bit in a table that has been freed.
>
> Seems reasonable.
Thanks. All above points taken.
> > Unmapping, specifically kvm_unmap_radix(), does not free page tables,
> > hence not a concern.
>
> Not sure if you really need to make the distinction about why the page
> table is freed, we might free them via unmapping. The point is just
> anything that frees them while there can be concurrent access, right?
Correct.
> > Signed-off-by: Yu Zhao <yuzhao@xxxxxxxxxx>
> > ---
> > arch/powerpc/kvm/book3s_64_mmu_radix.c | 6 ++++--
> > 1 file changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c
> > index 461307b89c3a..3b65b3b11041 100644
> > --- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
> > +++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
> > @@ -1469,13 +1469,15 @@ int kvmppc_radix_init(void)
> > {
> > unsigned long size = sizeof(void *) << RADIX_PTE_INDEX_SIZE;
> >
> > - kvm_pte_cache = kmem_cache_create("kvm-pte", size, size, 0, pte_ctor);
> > + kvm_pte_cache = kmem_cache_create("kvm-pte", size, size,
> > + SLAB_TYPESAFE_BY_RCU, pte_ctor);
> > if (!kvm_pte_cache)
> > return -ENOMEM;
> >
> > size = sizeof(void *) << RADIX_PMD_INDEX_SIZE;
> >
> > - kvm_pmd_cache = kmem_cache_create("kvm-pmd", size, size, 0, pmd_ctor);
> > + kvm_pmd_cache = kmem_cache_create("kvm-pmd", size, size,
> > + SLAB_TYPESAFE_BY_RCU, pmd_ctor);
> > if (!kvm_pmd_cache) {
> > kmem_cache_destroy(kvm_pte_cache);
> > return -ENOMEM;
>
> KVM PPC HV radix PUD level page tables use the arch/powerpc allocators
> (for some reason), which are not RCU freed. I think you need them too?
We don't. The use of the arch/powerpc allocator for PUD tables seems
appropriate to me because, unlike PMD/PTE tables, we never free PUD
tables during the lifetime of a VM:
* We don't free PUD/PMD/PTE tables when they become empty, i.e., not
mapping any pages but still attached. (We could in theory, as
x86/aarch64 do.)
* We have to free PMD/PTE tables when we replace them with 1GB/2MB
pages. (Otherwise we'd lose track of detached tables.) And we
currently don't support huge pages at P4D level, so we never detach
and free PUD tables.