Re: [regression v4.0-rc1] mm: IPIs from TLB flushes causing significant performance degradation.

From: Dave Chinner
Date: Mon Mar 02 2015 - 20:47:41 EST


On Mon, Mar 02, 2015 at 11:47:52AM -0800, Linus Torvalds wrote:
> On Sun, Mar 1, 2015 at 5:04 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> >
> > Across the board the 4.0-rc1 numbers are much slower, and the
> > degradation is far worse when using the large memory footprint
> > configs. Perf points straight at the cause - this is from 4.0-rc1
> > on the "-o bhash=101073" config:
> >
> > - 56.07% 56.07% [kernel] [k] default_send_IPI_mask_sequence_phys
> >    - 99.99% physflat_send_IPI_mask
> >       - 99.37% native_send_call_func_ipi
> ..
> >
> > And the same profile output from 3.19 shows:
> >
> > - 9.61% 9.61% [kernel] [k] default_send_IPI_mask_sequence_phys
> >    - 99.98% physflat_send_IPI_mask
> >       - 96.26% native_send_call_func_ipi
> ...
> >
> > So either there's been a massive increase in the number of IPIs
> > being sent, or the cost per IPI has greatly increased. Either way,
> > the result is a pretty significant performance degradation.
....
> I assume it's the mm queue from Andrew, so adding him to the cc. There
> are changes to the page migration etc, which could explain it.
>
> There are also a fair amount of APIC changes in 4.0-rc1, so I guess it
> really could be just that the IPI sending itself has gotten much
> slower. Adding Ingo for that, although I don't think
> default_send_IPI_mask_sequence_phys() itself has actually changed,
> only other things around the apic. So I'd be inclined to blame the mm
> changes.
>
> Obviously bisection would find it..

Yes, though the time it takes to do a 13-step bisection means it's
something I don't do just for an initial bug report. ;)
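
For a rough sense of why that is expensive: each bisection round halves
the candidate range, so 13 rounds corresponds to a range on the order of
thousands of commits, each round needing a kernel build, a reboot and a
benchmark run. A minimal sketch of that arithmetic follows. It assumes an
idealised linear history (real git history is a DAG, so the actual step
count can differ by one or so), and the function names are purely
illustrative.

```python
def bisect_steps(num_candidates: int) -> int:
    """Worst-case rounds to isolate one bad commit by halving.

    Assumes an idealised linear history; this is ceil(log2(n)),
    computed with integer arithmetic to avoid float rounding.
    """
    if num_candidates < 1:
        raise ValueError("need at least one candidate commit")
    return (num_candidates - 1).bit_length()


def max_candidates(steps: int) -> int:
    """Largest candidate range that `steps` halving rounds can cover."""
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return 2 ** steps


if __name__ == "__main__":
    # 13 rounds is enough for at most 2**13 candidate commits.
    print(max_candidates(13))
    print(bisect_steps(max_candidates(13)))
```

Multiply the step count by whatever one build-boot-test cycle costs on the
test machine to get the total wall-clock time for the bisect.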

Anyway, the difference between good and bad is pretty clear, so
I'm pretty confident the bisect is solid:

4d9424669946532be754a6e116618dcb58430cb4 is the first bad commit
commit 4d9424669946532be754a6e116618dcb58430cb4
Author: Mel Gorman <mgorman@xxxxxxx>
Date: Thu Feb 12 14:58:28 2015 -0800

mm: convert p[te|md]_mknonnuma and remaining page table manipulations

With PROT_NONE, the traditional page table manipulation functions are
sufficient.

[andre.przywara@xxxxxxx: fix compiler warning in pmdp_invalidate()]
[akpm@xxxxxxxxxxxxxxxxxxxx: fix build with STRICT_MM_TYPECHECKS]
Signed-off-by: Mel Gorman <mgorman@xxxxxxx>
Acked-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Acked-by: Aneesh Kumar <aneesh.kumar@xxxxxxxxxxxxxxxxxx>
Tested-by: Sasha Levin <sasha.levin@xxxxxxxxxx>
Cc: Benjamin Herrenschmidt <benh@xxxxxxxxxxxxxxxxxxx>
Cc: Dave Jones <davej@xxxxxxxxxx>
Cc: Hugh Dickins <hughd@xxxxxxxxxx>
Cc: Ingo Molnar <mingo@xxxxxxxxxx>
Cc: Kirill Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx>
Cc: Paul Mackerras <paulus@xxxxxxxxx>
Cc: Rik van Riel <riel@xxxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>

:040000 040000 50985a3f84e80bb2bdd049d4f34739d99436f988 1bc79bfac2c138844373b603f9bc5914f0d010f3 M arch
:040000 040000 ea69bcd1c59f832a4b012a57b4eb1d0c7516947d 0822692fa6c356952e723b56038585716fa51723 M include
:040000 040000 c11960b9f1ee72edb08dc3fdc46f590fb1d545f7 f5d17ff5b639adcb7363a196a9efe70f2a7312b5 M mm

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx