Re: [lkp-robot] [mm, vmscan] 5e56dfbd83: fsmark.files_per_sec -11.1% regression

From: Michal Hocko
Date: Thu Feb 23 2017 - 02:36:18 EST


On Thu 23-02-17 09:27:34, Ye Xiaolong wrote:
> Hi, Michal
>
> On 02/07, Michal Hocko wrote:
> [snip]
> >Could you retest with a single NUMA node? I am not familiar enough with
> >the benchmark to judge whether it was set up properly for a NUMA machine.
>
> I've retested the commit with a single NUMA node via "numactl -m 0 fs_mark xxx",
> and it did help recover the performance.

Thanks for retesting!
>
> Here is the comparison:
>
> commit/compiler/cpufreq_governor/disk/filesize/fs/iterations/kconfig/md/nr_threads/rootfs/sync_method/tbox_group/test_size/testcase:
> 5e56dfbd837421b7fa3c6c06018c6701e2704917/gcc-6/performance/3HDD/4M/btrfs/1/x86_64-rhel-7.2/RAID5/64/debian-x86_64-2016-08-31.cgz/NoSync/ivb44/130G/fsmark
>
> (with a single NUMA node)   (2 NUMA nodes)
> --------------------------------------------------------------------
>        fail:runs %reproduction     fail:runs
>            |          |                |
>          %stddev   %change           %stddev
>              \        |                  \
>      57.60 ±  0%    -11.1%       51.20 ±  0%  fsmark.files_per_sec
>     607.84 ±  0%     +9.0%      662.24 ±  1%  fsmark.time.elapsed_time
>     607.84 ±  0%     +9.0%      662.24 ±  1%  fsmark.time.elapsed_time.max
>      14317 ±  6%    -12.2%       12568 ±  7%  fsmark.time.involuntary_context_switches
>       1864 ±  0%     +0.5%        1873 ±  0%  fsmark.time.maximum_resident_set_size
>      12425 ±  0%    +23.3%       15320 ±  3%  fsmark.time.minor_page_faults
>      33.00 ±  3%    -33.9%       21.80 ±  1%  fsmark.time.percent_of_cpu_this_job_got
>     203.49 ±  3%    -28.1%      146.31 ±  1%  fsmark.time.system_time
>     605701 ±  0%     +3.6%      627486 ±  0%  fsmark.time.voluntary_context_switches
>     307106 ±  2%    +20.2%      368992 ±  9%  interrupts.CAL:Function_call_interrupts
>     183040 ±  0%    +23.2%      225559 ±  3%  softirqs.BLOCK
>      12203 ± 57%   +236.4%       41056 ±103%  softirqs.NET_RX
>     186118 ±  0%    +21.9%      226922 ±  2%  softirqs.TASKLET
>      14317 ±  6%    -12.2%       12568 ±  7%  time.involuntary_context_switches
>      12425 ±  0%    +23.3%       15320 ±  3%  time.minor_page_faults
>      33.00 ±  3%    -33.9%       21.80 ±  1%  time.percent_of_cpu_this_job_got
>     203.49 ±  3%    -28.1%      146.31 ±  1%  time.system_time
>       3.47 ±  3%    -13.0%        3.02 ±  1%  turbostat.%Busy
>      99.60 ±  1%     -9.6%       90.00 ±  1%  turbostat.Avg_MHz
>      78.69 ±  1%     +1.7%       80.01 ±  0%  turbostat.CorWatt
>       3.56 ± 61%    -91.7%        0.30 ± 76%  turbostat.Pkg%pc2
>     207790 ±  0%     -8.2%      190654 ±  1%  vmstat.io.bo
>   30667691 ±  0%    +65.9%    50890669 ±  1%  vmstat.memory.cache
>   34549892 ±  0%    -58.4%    14378939 ±  4%  vmstat.memory.free
>       6768 ±  0%     -1.3%        6681 ±  1%  vmstat.system.cs
>  1.089e+10 ±  2%    +13.4%   1.236e+10 ±  3%  cpuidle.C1E-IVT.time
>   11475304 ±  2%    +13.4%    13007849 ±  3%  cpuidle.C1E-IVT.usage
>    2.7e+09 ±  6%    +13.2%   3.057e+09 ±  3%  cpuidle.C3-IVT.time
>    2954294 ±  6%    +14.3%     3375966 ±  3%  cpuidle.C3-IVT.usage
>   96963295 ± 14%    +17.5%   1.139e+08 ± 12%  cpuidle.POLL.time
>       8761 ±  7%    +17.6%       10299 ±  9%  cpuidle.POLL.usage
>   30454483 ± 0%     +66.4%    50666102 ±  1%  meminfo.Cached
>
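
As a reading aid for the table above: the %change column appears to be
the relative change from the left (single NUMA node) column to the right
(2 NUMA nodes) column. The sketch below recomputes it for three rows
using the printed means only. The helper name pct_change is made up for
this illustration and is not part of the lkp tooling; since the report
presumably computes from unrounded per-run data, other rows may differ
in the last digit.

```python
# Illustrative only: recompute %change from the printed means.
# pct_change is a hypothetical helper, not an lkp-tests function.

def pct_change(base, new):
    """Relative change from base to new, in percent."""
    return (new - base) / base * 100.0

# (single NUMA node, 2 NUMA nodes), values copied from the table
rows = {
    "fsmark.files_per_sec": (57.60, 51.20),
    "vmstat.memory.cache": (30667691, 50890669),
    "vmstat.memory.free": (34549892, 14378939),
}

for name, (single_node, two_nodes) in rows.items():
    print(f"{name}: {pct_change(single_node, two_nodes):+.1f}%")
```

For these three rows the recomputed values agree with the table to the
printed precision, which supports reading the left column as the
baseline.
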
> Do you see what's happening?

Not really. All I could see in the previous data was that the memory
locality was different (and better) with my patch, which I cannot
explain either because get_scan_count is always a per-node thing.
Moreover, the change shouldn't make any difference for normal GFP_KERNEL
requests on 64-bit systems because the reclaim index covers all zones,
so there is nothing to skip over.
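
The "nothing to skip over" argument can be illustrated with a toy model.
This is a simplified sketch and not kernel code: the zone list, the page
counts and the function name eligible_lru_size are all assumptions made
for illustration. It assumes that the scan count is based on the node
LRU size minus the pages sitting in zones above the reclaim index.

```python
# Toy model, NOT kernel code. Zone layout and page counts are made up.
ZONES = ["DMA", "DMA32", "NORMAL"]  # assumed 64-bit layout, lowest first

def eligible_lru_size(zone_lru_pages, reclaim_idx):
    """Node LRU size minus pages in zones above reclaim_idx."""
    total = sum(zone_lru_pages)
    skipped = sum(zone_lru_pages[reclaim_idx + 1:])
    return total - skipped

node_lru = [10, 5000, 200000]  # hypothetical per-zone LRU page counts

# A request that may use every zone up to NORMAL skips nothing.
normal_idx = ZONES.index("NORMAL")
print(eligible_lru_size(node_lru, normal_idx), sum(node_lru))

# A request restricted to DMA32 and below does see a smaller LRU.
dma32_idx = ZONES.index("DMA32")
print(eligible_lru_size(node_lru, dma32_idx))
```

In this model only a zone-restricted request sees a different LRU size,
which is why an unrestricted request on such a layout should not be
affected by counting eligible zones only.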

> Or is there anything we can do to improve fsmark benchmark setup to
> make it more reasonable?

Unfortunately, I am not an expert on this benchmark. Maybe Mel knows
better.
--
Michal Hocko
SUSE Labs