Re: [RFC][PATCH 03/11] x86/mm: Page size aware flush_tlb_mm_range()

From: Peter Zijlstra
Date: Thu Sep 13 2018 - 14:42:51 EST


On Thu, Sep 13, 2018 at 10:22:58AM -0700, Dave Hansen wrote:
> > +static inline void tlb_flush(struct mmu_gather *tlb)
> > +{
> > + unsigned long start = 0UL, end = TLB_FLUSH_ALL;
> > + unsigned int invl_shift = tlb_get_unmap_shift(tlb);
>
> I had to go back and look at
>
> https://patchwork.kernel.org/patch/10587207/

I so hate patchwork...

> to figure out what was going on. I wonder if we could make the code a
> bit more standalone.
>
> This at least needs a comment about what it's getting from 'tlb'. Maybe
> just:
>
> /* Find the smallest page size that we unmapped: */
>
> > --- a/arch/x86/include/asm/tlbflush.h
> > +++ b/arch/x86/include/asm/tlbflush.h
> > @@ -507,23 +507,25 @@ struct flush_tlb_info {
> > unsigned long start;
> > unsigned long end;
> > u64 new_tlb_gen;
> > + unsigned int invl_shift;
> > };
>
> Maybe we really should just call this flush_stride or something.

But it's a shift, not a size. stride_shift?

> > #define local_flush_tlb() __flush_tlb()
> >
> > #define flush_tlb_mm(mm) flush_tlb_mm_range(mm, 0UL, TLB_FLUSH_ALL, 0UL)
> >
> > -#define flush_tlb_range(vma, start, end) \
> > - flush_tlb_mm_range(vma->vm_mm, start, end, vma->vm_flags)
> > +#define flush_tlb_range(vma, start, end) \
> > + flush_tlb_mm_range((vma)->vm_mm, start, end, \
> > + (vma)->vm_flags & VM_HUGETLB ? PMD_SHIFT : PAGE_SHIFT)
>
> This is safe. But couldn't this PMD_SHIFT also be PUD_SHIFT for a 1G
> hugetlb page?

It could be, but can we tell at that point?
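
For illustration only, here is a userspace sketch of what picking the shift
per mapping could look like if the huge page size were available from the
vma. struct vma_model, its hpage_shift field and flush_stride_shift() are
hypothetical names invented for this sketch, not kernel interfaces; the
shift constants are the x86-64 values. Whether the real vma carries enough
information at this point is exactly the open question above.

```c
#include <assert.h>

#define PAGE_SHIFT 12              /* 4 KiB base page on x86-64 */
#define PMD_SHIFT  21              /* 2 MiB huge page */
#define PUD_SHIFT  30              /* 1 GiB huge page */
#define VM_HUGETLB 0x00400000UL    /* flag bit, used here only as a model value */

/* Hypothetical stand-in for the parts of a vma this sketch needs. */
struct vma_model {
	unsigned long vm_flags;
	unsigned int hpage_shift;  /* assumed: page-size shift of a hugetlb mapping */
};

/* Pick the invalidation stride: base pages unless the mapping is hugetlb. */
static unsigned int flush_stride_shift(const struct vma_model *vma)
{
	if (!(vma->vm_flags & VM_HUGETLB))
		return PAGE_SHIFT;

	/* Only the two x86-64 hugetlb sizes are modelled here. */
	assert(vma->hpage_shift == PMD_SHIFT || vma->hpage_shift == PUD_SHIFT);
	return vma->hpage_shift;
}
```

With this model a 1G hugetlb mapping would get PUD_SHIFT instead of the
always-safe but finer PMD_SHIFT stride.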

> > void native_flush_tlb_others(const struct cpumask *cpumask,
> > --- a/arch/x86/mm/tlb.c
> > +++ b/arch/x86/mm/tlb.c
> > @@ -522,12 +522,12 @@ static void flush_tlb_func_common(const
> > f->new_tlb_gen == mm_tlb_gen) {
> > /* Partial flush */
> > unsigned long addr;
> > - unsigned long nr_pages = (f->end - f->start) >> PAGE_SHIFT;
> > + unsigned long nr_pages = (f->end - f->start) >> f->invl_shift;
>
> We might want to make this nr_invalidations or nr_flushes now so we
> don't get it confused with PAGE_SIZE stuff.

Sure, can rename.
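
To make the point of the rename concrete, a minimal userspace sketch of the
count computed in the hunk above. nr_invalidations() is a hypothetical
helper written for this mail, not kernel code, and the shift values are the
x86-64 ones.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12   /* 4 KiB stride */
#define PMD_SHIFT  21   /* 2 MiB stride */

/*
 * Number of single-entry invalidations needed to walk [start, end)
 * at the given stride; mirrors (f->end - f->start) >> f->invl_shift.
 */
static uint64_t nr_invalidations(uint64_t start, uint64_t end,
				 unsigned int stride_shift)
{
	assert(end >= start && stride_shift < 64);
	return (end - start) >> stride_shift;
}
```

For a 2 MiB range this gives 512 with PAGE_SHIFT but 1 with PMD_SHIFT, so
the value no longer counts pages in the PAGE_SIZE sense.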

> Otherwise, this makes me a *tiny* bit nervous. I think we're good about
> ensuring that we fully flush 4k mappings from the TLB before going up to
> a 2MB mapping because of all the errata we've had there over the years.
> But, had we left 4k mappings around, the old flushing code would have
> cleaned them up for us.

Indeed.

> This certainly tightly ties the invalidations to what was in the page
> tables. If that diverged from the TLB at some point, there's certainly
> more exposure here.
>
> Looks fun, though. :)

:-)