Re: [mm/page_alloc] 8212a964ee: vm-scalability.throughput 30.5% improvement

From: Eric Dumazet
Date: Sat Mar 12 2022 - 18:28:41 EST


On Sat, Mar 12, 2022 at 10:59 AM Vlastimil Babka <vbabka@xxxxxxx> wrote:
>
> On 3/12/22 16:43, kernel test robot wrote:
> >
> >
> > Greeting,
> >
> > FYI, we noticed a 30.5% improvement of vm-scalability.throughput due to commit:
> >
> >
> > commit: 8212a964ee020471104e34dce7029dec33c218a9 ("Re: [PATCH v2] mm/page_alloc: call check_new_pages() while zone spinlock is not held")
> > url: https://github.com/0day-ci/linux/commits/Mel-Gorman/Re-PATCH-v2-mm-page_alloc-call-check_new_pages-while-zone-spinlock-is-not-held/20220309-203504
> > patch link: https://lore.kernel.org/lkml/20220309123245.GI15701@xxxxxxxxxxxxxxxxxxx
>
> Heh, that's weird. I would expect some improvement from Eric's patch,
> but this seems to be actually about Mel's "mm/page_alloc: check
> high-order pages for corruption during PCP operations" applied directly
> on 5.17-rc7 per the github url above. This was rather expected to make
> performance worse if anything, so maybe the improvement is due to some
> unexpected side-effect of different inlining decisions or cache alignment...
>

I doubt this has anything to do with inlining or cache alignment.

I am not familiar with the benchmark, but its name
(anon-w-rand-hugetlb) hints at hugetlb?

After Mel's fix, we go over the 512 'struct page' of the huge page to
perform sanity checks, thus loading the 512 corresponding cache lines
into CPU caches (on x86-64, struct page is typically 64 bytes, i.e.
exactly one cache line each, 32KB in total).
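
For reference, the extra work amounts to a walk like this (a
simplified sketch of check_new_pages() as it looks around 5.17, not a
verbatim copy):

	static inline bool check_new_pages(struct page *page, unsigned int order)
	{
		int i;

		/* 1 << 9 == 512 iterations for an order-9 (2MB) huge page */
		for (i = 0; i < (1 << order); i++) {
			struct page *p = page + i;

			/* reading p's fields pulls its cache line in */
			if (unlikely(check_new_page(p)))
				return true;
		}

		return false;
	}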

This caching is done while no lock is held.

If, after this huge page allocation, some mm operation needs to access
these 512 struct pages while holding a lock, then sure, there is a
huge gain.
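
To make the ordering concrete, the allocation slow path after the
patch is roughly (a heavily simplified sketch of rmqueue(), with PCP
handling and error paths elided):

	do {
		spin_lock_irqsave(&zone->lock, flags);
		page = __rmqueue(zone, order, migratetype, alloc_flags);
		spin_unlock_irqrestore(&zone->lock, flags);

		/*
		 * zone->lock is already dropped here, so the sanity
		 * checks below warm the 512 struct page cache lines
		 * without extending the zone->lock hold time.
		 */
	} while (page && check_new_pages(page, order));

The checks still run, just outside the critical section, so their cost
is paid where it doubles as prefetching for the later lock-holding
access.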