Re: [hugetlbfs] c0d0381ade: vm-scalability.throughput -33.4% regression

From: Mike Kravetz
Date: Mon Jun 22 2020 - 18:02:01 EST


On 6/21/20 5:55 PM, kernel test robot wrote:
> Greeting,
>
> FYI, we noticed a -33.4% regression of vm-scalability.throughput due to commit:
>
>
> commit: c0d0381ade79885c04a04c303284b040616b116e ("hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>
> in testcase: vm-scalability
> on test machine: 288 threads Intel(R) Xeon Phi(TM) CPU 7295 @ 1.50GHz with 80G memory
> with following parameters:
>
> runtime: 300s
> size: 8T
> test: anon-cow-seq-hugetlb
> cpufreq_governor: performance
> ucode: 0x11
>

Some performance regression is not surprising, as the change includes acquiring
and holding the i_mmap_rwsem (in read mode) during hugetlb page faults. A 33.4%
drop seems a bit high, but the test primarily exercises the hugetlb page fault
path and little else.

The reason for taking the i_mmap_rwsem is to prevent PMD unsharing from
invalidating the PMD we are operating on. This specific test case operates
on anonymous private mappings, where PMD sharing is not possible, so we can
skip acquiring the semaphore in this case. In fact, we should check all
mappings (even sharable ones) for the possibility of PMD sharing and only take
the semaphore if necessary. It will make the code a bit uglier, but it will
take care of some of these regressions. We still need to take the semaphore
when PMD sharing is possible, so I'm afraid a regression is unavoidable there.

I'll put together a patch.
--
Mike Kravetz