Re: [PATCH 3/3] mm/vmalloc: Sync unmappings in vunmap_page_range()

From: Andy Lutomirski
Date: Thu Jul 18 2019 - 15:05:05 EST


On Thu, Jul 18, 2019 at 2:17 AM Joerg Roedel <jroedel@xxxxxxx> wrote:
>
> Hi Andy,
>
> On Wed, Jul 17, 2019 at 02:24:09PM -0700, Andy Lutomirski wrote:
> > On Wed, Jul 17, 2019 at 12:14 AM Joerg Roedel <joro@xxxxxxxxxx> wrote:
> > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > > index 4fa8d84599b0..322b11a374fd 100644
> > > --- a/mm/vmalloc.c
> > > +++ b/mm/vmalloc.c
> > > @@ -132,6 +132,8 @@ static void vunmap_page_range(unsigned long addr, unsigned long end)
> > > continue;
> > > vunmap_p4d_range(pgd, addr, next);
> > > } while (pgd++, addr = next, addr != end);
> > > +
> > > + vmalloc_sync_all();
> > > }
> >
> > I'm confused. Shouldn't the code in _vm_unmap_aliases handle this?
> > As it stands, won't your patch hurt performance on x86_64? If x86_32
> > is a special snowflake here, maybe flush_tlb_kernel_range() should
> > handle this?
>
> Imo this is the logical place to handle this. The code first unmaps the
> area from the init_mm page-table and then syncs that page-table to all
> other page-tables in the system, so one place to update the page-tables.


I find it problematic that there is no meaningful documentation as to
what vmalloc_sync_all() is supposed to do. The closest I can find is
this comment by following the x86_64 code, which calls
sync_global_pgds(), which says:

/*
* When memory was added make sure all the processes MM have
* suitable PGD entries in the local PGD level page.
*/
void sync_global_pgds(unsigned long start, unsigned long end)
{

Which is obviously entirely inapplicable. If I'm understanding
correctly, the underlying issue here is that the vmalloc fault
mechanism can propagate PGD entry *addition*, but nothing (not even
flush_tlb_kernel_range()) propagates PGD entry *removal*.

I find it suspicious that only x86 has this. How do other
architectures handle this?

At the very least, I think this series needs a comment in
vmalloc_sync_all() explaining exactly what the function promises to
do. But maybe a better fix is to add code to flush_tlb_kernel_range()
to sync the vmalloc area if the flushed range overlaps the vmalloc
area. Or, even better, improve x86_32 the way we did x86_64: adjust
the memory mapping code such that top-level paging entries are never
deleted in the first place.