Re: [linux-next:master] [mm/hugetlb_vmemmap] 875fa64577: vm-scalability.throughput -34.3% regression

From: Yu Zhao
Date: Sat Aug 03 2024 - 18:08:45 EST


Hi Oliver,

On Fri, Jul 19, 2024 at 10:06 AM Yu Zhao <yuzhao@xxxxxxxxxx> wrote:
>
> On Fri, Jul 19, 2024 at 2:44 AM Oliver Sang <oliver.sang@xxxxxxxxx> wrote:
> >
> > hi, Yu Zhao,
> >
> > On Wed, Jul 17, 2024 at 09:44:33AM -0600, Yu Zhao wrote:
> > > On Wed, Jul 17, 2024 at 2:36 AM Yu Zhao <yuzhao@xxxxxxxxxx> wrote:
> > > >
> > > > Hi Janosch and Oliver,
> > > >
> > > > On Wed, Jul 17, 2024 at 1:57 AM Janosch Frank <frankja@xxxxxxxxxxxxx> wrote:
> > > > >
> > > > > On 7/9/24 07:11, kernel test robot wrote:
> > > > > > Hello,
> > > > > >
> > > > > > kernel test robot noticed a -34.3% regression of vm-scalability.throughput on:
> > > > > >
> > > > > >
> > > > > > commit: 875fa64577da9bc8e9963ee14fef8433f20653e7 ("mm/hugetlb_vmemmap: fix race with speculative PFN walkers")
> > > > > > https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
> > > > > >
> > > > > > [still regression on linux-next/master 0b58e108042b0ed28a71cd7edf5175999955b233]
> > > > > >
> > > > > This has hit s390 huge page backed KVM guests as well.
> > > > > Our simple start/stop test case went from ~5 to over 50 seconds of runtime.
> > > >
> > > > Could you try the attached patch please? Thank you.
> > >
> > > Thanks, Yosry, for spotting the following typo:
> > > flags &= VMEMMAP_SYNCHRONIZE_RCU;
> > > It's supposed to be:
> > > flags &= ~VMEMMAP_SYNCHRONIZE_RCU;
> > >
> > > Reattaching v2 with the above typo fixed. Please let me know, Janosch & Oliver.
> >
> > since the commit is in mainline now, I directly applied your v2 patch on top of
> > bd225530a4c71 ("mm/hugetlb_vmemmap: fix race with speculative PFN walkers")
> >
> > in our tests, your v2 patch not only recovers from the performance regression,
>
> Thanks for verifying the fix!
>
> > it even shows a +13.7% performance improvement over 5a4d8944d6b1e (parent of
> > bd225530a4c71)
>
> Glad to hear!
>
> (The original patch both improved and regressed performance at the same
> time, but the regression was bigger. The fix removed the regression and
> surfaced the improvement.)
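
(Side note on the typo quoted above: "flags &= VMEMMAP_SYNCHRONIZE_RCU"
keeps only that bit and clears every other flag, whereas the intended
"flags &= ~VMEMMAP_SYNCHRONIZE_RCU" clears only that bit. A minimal C
sketch of the difference, with flag values made up for illustration:

/* illustrative values only, not the actual kernel definitions */
#define VMEMMAP_SYNCHRONIZE_RCU  (1UL << 0)
#define VMEMMAP_OTHER_FLAG       (1UL << 1)  /* hypothetical second flag */

unsigned long flags = VMEMMAP_SYNCHRONIZE_RCU | VMEMMAP_OTHER_FLAG;

flags &= VMEMMAP_SYNCHRONIZE_RCU;   /* typo: VMEMMAP_OTHER_FLAG is lost too */
flags &= ~VMEMMAP_SYNCHRONIZE_RCU;  /* intended: clears only SYNCHRONIZE_RCU */

Only the second form is correct; the first was the bug.)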

Can you please run the benchmark again with the attached patch on top
of the last fix?

I spotted something else worth optimizing last time, and with the
patch attached, I was able to measure some significant improvements in
1GB hugeTLB allocation and free time, e.g., when allocating and freeing
700 1GB hugeTLB pages:

Before:
# time echo 700 >/sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
real 0m13.500s
user 0m0.000s
sys 0m13.311s

# time echo 0 >/sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
real 0m11.269s
user 0m0.000s
sys 0m11.187s


After:
# time echo 700 >/sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
real 0m10.643s
user 0m0.001s
sys 0m10.487s

# time echo 0 >/sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
real 0m1.541s
user 0m0.000s
sys 0m1.528s

Thanks!

Attachment: hugetlb.patch
Description: Binary data