Re: [PATCH v2 6/6] mm: Introduce ctor/dtor at PGD level

From: Alexander Gordeev
Date: Wed Jan 22 2025 - 16:19:56 EST


On Wed, Jan 22, 2025 at 03:06:05PM +0100, Alexander Gordeev wrote:

Hi Kevin,

> On Wed, Jan 22, 2025 at 08:49:54AM +0100, Heiko Carstens wrote:
> > > > static inline pgd_t *pgd_alloc(struct mm_struct *mm)
> > > > {
> > > > - return (pgd_t *) crst_table_alloc(mm);
> > > > + unsigned long *table = crst_table_alloc(mm);
> > > > +
> > > > + if (!table)
> > > > + return NULL;
> > >
> > > I do not know status of this series, but FWIW, this call is missed:
> > >
> > > crst_table_init(table, _REGION1_ENTRY_EMPTY);
> >
> > Why is that missing?
>
> Because the follow-up pagetable_pgd_ctor() is called against uninitialized
> page table, while other pagetable_pXd_ctor() variants are called against
> initialized one. I could imagine complications as result of that.
>
> Whether Region1 table is the right choice is a big question though, as you
> noticed below.

As discussed with Heiko, we do not want to add the extra crst_table_init() call
at least due to performance impact. So please ignore my comment.

> > A pgd table can be a Region1, Region2, or Region3 table. The only caller of
> > this function is mm_init() via mm_alloc_pgd(); and right after mm_alloc_pgd()
> > there is a call to init_new_context() which will initialize the pgd correctly.
>
> init_new_context() is in a way a constructor as well, so whole thing looks odd
> to me. But I do not immedeately see a better way :(
>
> > I guess what really gets odd, and might be broken (haven't checked yet) is
> > what happens on dynamic upgrade of page table levels (->crst_table_upgrade()).
>
> Hmm, that is a good point.
>
> > With that a pgd may become a pud, and with that we get an imbalance with
> > the ctor/dtor calls for the various page table levels when they get freed
>
> The ctor/dtor mismatch should not be a problem, as pagetable_pgd|p4d|pud_ctor()
> are the same and there is one pagetable_dtor() for all top levels as of now.
> But if it ever comes to separate implementations, then we are in the world
> of pain.
>
> > again. Plus, at first glance, it looks also broken that we have open-coded
> > crst_alloc() calls instead of using the "proper" page table allocation API
> > within crst_table_upgrade(), which again would cause an imbalance.
>
> This is a good point too.

The below bits are seems to be missed. We will test it and send a patch.

diff --git a/arch/s390/mm/pgalloc.c b/arch/s390/mm/pgalloc.c
index a4e761902093..d33f55b7ee98 100644
--- a/arch/s390/mm/pgalloc.c
+++ b/arch/s390/mm/pgalloc.c
@@ -88,12 +88,14 @@ int crst_table_upgrade(struct mm_struct *mm, unsigned long end)
if (unlikely(!p4d))
goto err_p4d;
crst_table_init(p4d, _REGION2_ENTRY_EMPTY);
+ pagetable_p4d_ctor(virt_to_ptdesc(p4d));
}
if (end > _REGION1_SIZE) {
pgd = crst_table_alloc(mm);
if (unlikely(!pgd))
goto err_pgd;
crst_table_init(pgd, _REGION1_ENTRY_EMPTY);
+ pagetable_pgd_ctor(virt_to_ptdesc(pgd));
}

spin_lock_bh(&mm->page_table_lock);
@@ -130,6 +132,7 @@ int crst_table_upgrade(struct mm_struct *mm, unsigned long end)
return 0;

err_pgd:
+ pagetable_dtor(virt_to_ptdesc(p4d));
crst_table_free(mm, p4d);
err_p4d:
return -ENOMEM;

Thanks!