Re: [mm, thp] 85b9f46e8e: vm-scalability.throughput -8.7% regression
From: Huang, Ying
Date: Tue Oct 20 2020 - 20:42:04 EST
David Rientjes <rientjes@xxxxxxxxxx> writes:
> On Tue, 20 Oct 2020, Huang, Ying wrote:
>
>> >> =========================================================================================
>> >> compiler/cpufreq_governor/kconfig/rootfs/runtime/size/tbox_group/test/testcase/ucode:
>> >> gcc-9/performance/x86_64-rhel-8.3/debian-10.4-x86_64-20200603.cgz/300s/1T/lkp-skl-fpga01/lru-shm/vm-scalability/0x2006906
>> >>
>> >> commit:
>> >> dcdf11ee14 ("mm, shmem: add vmstat for hugepage fallback")
>> >> 85b9f46e8e ("mm, thp: track fallbacks due to failed memcg charges separately")
>> >>
>> >> dcdf11ee14413332 85b9f46e8ea451633ccd60a7d8c
>> >> ---------------- ---------------------------
>> >> fail:runs %reproduction fail:runs
>> >> | | |
>> >> 1:4 24% 2:4 perf-profile.calltrace.cycles-pp.sync_regs.error_entry.do_access
>> >> 3:4 53% 5:4 perf-profile.calltrace.cycles-pp.error_entry.do_access
>> >> 9:4 -27% 8:4 perf-profile.children.cycles-pp.error_entry
>> >> 4:4 -10% 4:4 perf-profile.self.cycles-pp.error_entry
>> >> %stddev %change %stddev
>> >> \ | \
>> >> 477291 -9.1% 434041 vm-scalability.median
>> >> 49791027 -8.7% 45476799 vm-scalability.throughput
>> >> 223.67 +1.6% 227.36 vm-scalability.time.elapsed_time
>> >> 223.67 +1.6% 227.36 vm-scalability.time.elapsed_time.max
>> >> 50364 ± 6% +24.1% 62482 ± 10% vm-scalability.time.involuntary_context_switches
>> >> 2237 +7.8% 2412 vm-scalability.time.percent_of_cpu_this_job_got
>> >> 3084 +18.2% 3646 vm-scalability.time.system_time
>> >> 1921 -4.2% 1839 vm-scalability.time.user_time
>> >> 13.68 +2.2 15.86 mpstat.cpu.all.sys%
>> >> 28535 ± 30% -47.0% 15114 ± 79% numa-numastat.node0.other_node
>> >> 142734 ± 11% -19.4% 115000 ± 17% numa-meminfo.node0.AnonPages
>> >> 11168 ± 3% +8.8% 12150 ± 5% numa-meminfo.node1.PageTables
>> >> 76.00 -1.6% 74.75 vmstat.cpu.id
>> >> 3626 -1.9% 3555 vmstat.system.cs
>> >> 2214928 ±166% -96.6% 75321 ± 7% cpuidle.C1.usage
>> >> 200981 ± 7% -18.0% 164861 ± 7% cpuidle.POLL.time
>> >> 52675 ± 3% -16.7% 43866 ± 10% cpuidle.POLL.usage
>> >> 35659 ± 11% -19.4% 28754 ± 17% numa-vmstat.node0.nr_anon_pages
>> >> 1248014 ± 3% +10.9% 1384236 numa-vmstat.node1.nr_mapped
>> >> 2722 ± 4% +10.6% 3011 ± 5% numa-vmstat.node1.nr_page_table_pages
>> >
>> > I'm not sure that I'm reading this correctly, but I suspect that this just
>> > happens because of NUMA: memory affinity will obviously impact
>> > vm-scalability.throughput quite substantially, but I don't think the
>> > bisected commit can be to blame. Commit 85b9f46e8ea4 ("mm, thp: track
>> > fallbacks due to failed memcg charges separately") simply adds new
>> > count_vm_event() calls in a couple areas to track thp fallback due to
>> > memcg limits separate from fragmentation.
>> >
>> > It's more likely a question about the testing methodology in general: for
>> > memory-intensive benchmarks, I suggest they be configured in a manner that
>> > gives us consistent memory access latency at the hardware level when
>> > running on a NUMA system.
>>
>> So you think it's better to bind processes to a NUMA node or CPU? But we
>> want to use this test case to capture NUMA/CPU placement/balancing issues
>> too.
>>
>
> No, because binding to a specific socket may cause other performance
> "improvements" or "degradations" depending on how fragmented local memory
> is, or whether or not it's under memory pressure. Is the system rebooted
> before testing so that we have a consistent state of memory availability
> and fragmentation across sockets?
Yes. The system is rebooted before each test (0day uses kexec to accelerate
rebooting).
>> 0day solves the problem in another way: we run the test case multiple
>> times, calculate the average and standard deviation, then compare.
>>
>>
>
> Depending on fragmentation or memory availability, any benchmark that
> assesses performance may be adversely affected if its results can be
> impacted by hugepage backing.
Best Regards,
Huang, Ying