Re: [RFC PATCH V1 2/6] sched/numa: Add disjoint vma unconditional scan logic

From: Raghavendra K T
Date: Wed Sep 13 2023 - 02:22:14 EST


On 9/12/2023 1:20 PM, kernelt test robot wrote:


Hello,

kernel test robot noticed a -11.9% improvement of autonuma-benchmark.numa01_THREAD_ALLOC.seconds on:


commit: 1ef5cbb92bdb320c5eb9fdee1a811d22ee9e19fe ("[RFC PATCH V1 2/6] sched/numa: Add disjoint vma unconditional scan logic")
url: https://github.com/intel-lab-lkp/linux/commits/Raghavendra-K-T/sched-numa-Move-up-the-access-pid-reset-logic/20230829-141007
base: https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git 2f88c8e802c8b128a155976631f4eb2ce4f3c805
patch link: https://lore.kernel.org/all/87e3c08bd1770dd3e6eee099c01e595f14c76fc3.1693287931.git.raghavendra.kt@xxxxxxx/
patch subject: [RFC PATCH V1 2/6] sched/numa: Add disjoint vma unconditional scan logic

testcase: autonuma-benchmark
test machine: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 256G memory
parameters:

iterations: 4x
test: numa01_THREAD_ALLOC
cpufreq_governor: performance


hi, Raghu,

the reason there is a separate report for this commit besides
https://lore.kernel.org/all/202309102311.84b42068-oliver.sang@xxxxxxxxx/
is due to bisection nature, for one auto-bisect, we so far only could capture
one commit for performance change.

this auto-bisect is running on another test machine (Sapphire Rapids), and it
happened to choose autonuma-benchmark.numa01_THREAD_ALLOC.seconds as indicator
to do the bisect, it finally captured
"[RFC PATCH V1 2/6] sched/numa: Add disjoint vma unconditional"

and from
https://lore.kernel.org/all/acf254e9-0207-7030-131f-8a3f520c657b@xxxxxxx/
I noticed you care more about the performance impact of whole patch set,
so let me give a summary table as below.

firstly, let me give out how we apply your patch again:

68cfe9439a1ba (linux-review/Raghavendra-K-T/sched-numa-Move-up-the-access-pid-reset-logic/20230829-141007) sched/numa: Allow scanning of shared VMAs
af46f3c9ca2d1 sched/numa: Allow recently accessed VMAs to be scanned
167773d1ddb5f sched/numa: Increase tasks' access history
fc769221b2306 sched/numa: Remove unconditional scan logic using mm numa_scan_seq
1ef5cbb92bdb3 sched/numa: Add disjoint vma unconditional scan logic
2a806eab1c2e1 sched/numa: Move up the access pid reset logic
2f88c8e802c8b (tip/sched/core) sched/eevdf/doc: Modify the documented knob to base_slice_ns as well


we have below data on this test machine
(full table will be very big, if you want it, please let me know):

=========================================================================================
compiler/cpufreq_governor/iterations/kconfig/rootfs/tbox_group/test/testcase:
gcc-12/performance/4x/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/lkp-spr-r02/numa01_THREAD_ALLOC/autonuma-benchmark

commit:
2f88c8e802 ("(tip/sched/core) sched/eevdf/doc: Modify the documented knob to base_slice_ns as well")
2a806eab1c ("sched/numa: Move up the access pid reset logic")
1ef5cbb92b ("sched/numa: Add disjoint vma unconditional scan logic")
68cfe9439a ("sched/numa: Allow scanning of shared VMAs")


2f88c8e802c8b128 2a806eab1c2e1c9f0ae39dc0307 1ef5cbb92bdb320c5eb9fdee1a8 68cfe9439a1baa642e05883fa64
---------------- --------------------------- --------------------------- ---------------------------
%stddev %change %stddev %change %stddev %change %stddev
\ | \ | \ | \
271.01 +0.8% 273.24 -0.7% 269.00 -26.4% 199.49 ± 3% autonuma-benchmark.numa01.seconds
76.28 +0.2% 76.44 -11.7% 67.36 ± 6% -46.9% 40.49 ± 5% autonuma-benchmark.numa01_THREAD_ALLOC.seconds
8.11 -0.9% 8.04 -0.7% 8.05 -0.1% 8.10 autonuma-benchmark.numa02.seconds
1425 +0.7% 1434 -3.1% 1381 -30.1% 996.02 ± 2% autonuma-benchmark.time.elapsed_time



Thanks for this Summary too.

I think slight additional time overhead from first patch is coming
from additional logic that gets executed before we return from
is_vma_accessed() check as expected.

Regards
- Raghu