Re: [LKP] [SUNRPC] 0472e47660: fsmark.app_overhead 16.0% regression

From: Xing Zhengjun
Date: Mon Jul 08 2019 - 04:32:37 EST


Hi Trond,

I retested, and the regression can still be reproduced. I ran the test with the following parameters, changing only "nr_threads"; the results are below. They show that the more threads the test uses, the larger the regression. Could you help check this? Thanks.


In testcase: fsmark
on test machine: 40 threads Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz with 384G memory
with following parameters:

iterations: 20x
nr_threads: 1t
disk: 1BRD_48G
fs: xfs
fs2: nfsv4
filesize: 4M
test_size: 80G
sync_method: fsyncBeforeClose
cpufreq_governor: performance

test-description: fsmark is a file system benchmark that tests synchronous write workloads, for example mail server workloads.
test-url: https://sourceforge.net/projects/fsmark/

commit:
e791f8e938 ("SUNRPC: Convert xs_send_kvec() to use iov_iter_kvec()")
0472e47660 ("SUNRPC: Convert socket page send code to use iov_iter()")

e791f8e9380d945e 0472e476604998c127f3c80d291
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
       59.74           -0.7%      59.32        fsmark.files_per_sec (nr_threads= 1)
      114.06           -8.1%     104.83        fsmark.files_per_sec (nr_threads= 2)
      184.53          -13.1%     160.29        fsmark.files_per_sec (nr_threads= 4)
      257.05          -15.5%     217.22        fsmark.files_per_sec (nr_threads= 8)
      306.08          -15.5%     258.68        fsmark.files_per_sec (nr_threads=16)
      498.34          -22.7%     385.33        fsmark.files_per_sec (nr_threads=32)
      527.29          -22.6%     407.96        fsmark.files_per_sec (nr_threads=64)
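
For context, the two commits being compared here move the RPC socket send path away from the kernel_sendmsg()/kernel_sendpage() helpers and onto iov_iter-based sends through sock_sendmsg(). Below is a minimal sketch of the kvec side of that conversion (illustrative only, not the actual xprtsock code; the helper names are made up):

#include <linux/net.h>	/* kernel_sendmsg(), sock_sendmsg() */
#include <linux/uio.h>	/* struct kvec, iov_iter_kvec() */

/* Before: hand the kvec to kernel_sendmsg(), which builds the
 * iterator internally. */
static int send_kvec_old(struct socket *sock, void *buf, size_t len)
{
	struct msghdr msg = { .msg_flags = MSG_DONTWAIT };
	struct kvec kv = { .iov_base = buf, .iov_len = len };

	return kernel_sendmsg(sock, &msg, &kv, 1, len);
}

/* After: wrap the kvec in an iov_iter attached to the msghdr and
 * call sock_sendmsg() directly. */
static int send_kvec_new(struct socket *sock, void *buf, size_t len)
{
	struct msghdr msg = { .msg_flags = MSG_DONTWAIT };
	struct kvec kv = { .iov_base = buf, .iov_len = len };

	iov_iter_kvec(&msg.msg_iter, WRITE, &kv, 1, len);
	return sock_sendmsg(sock, &msg);
}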



On 5/31/2019 11:27 AM, Xing Zhengjun wrote:


On 5/31/2019 3:10 AM, Trond Myklebust wrote:
On Thu, 2019-05-30 at 15:20 +0800, Xing Zhengjun wrote:

On 5/30/2019 10:00 AM, Trond Myklebust wrote:
Hi Xing,

On Thu, 2019-05-30 at 09:35 +0800, Xing Zhengjun wrote:
Hi Trond,

On 5/20/2019 1:54 PM, kernel test robot wrote:
Greetings,

FYI, we noticed a 16.0% regression of fsmark.app_overhead due to commit:


commit: 0472e476604998c127f3c80d291113e77c5676ac ("SUNRPC: Convert socket page send code to use iov_iter()")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

in testcase: fsmark
on test machine: 40 threads Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz with 384G memory
with following parameters:

    iterations: 1x
    nr_threads: 64t
    disk: 1BRD_48G
    fs: xfs
    fs2: nfsv4
    filesize: 4M
    test_size: 40G
    sync_method: fsyncBeforeClose
    cpufreq_governor: performance

test-description: fsmark is a file system benchmark that tests synchronous write workloads, for example mail server workloads.
test-url: https://sourceforge.net/projects/fsmark/



Details are as below:
-------------------------------------------------------------------------->


To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        bin/lkp install job.yaml  # job file is attached in this email
        bin/lkp run job.yaml

=========================================================================================
compiler/cpufreq_governor/disk/filesize/fs2/fs/iterations/kconfig/nr_threads/rootfs/sync_method/tbox_group/test_size/testcase:
  gcc-7/performance/1BRD_48G/4M/nfsv4/xfs/1x/x86_64-rhel-7.6/64t/debian-x86_64-2018-04-03.cgz/fsyncBeforeClose/lkp-ivb-ep01/40G/fsmark

commit:
    e791f8e938 ("SUNRPC: Convert xs_send_kvec() to use iov_iter_kvec()")
    0472e47660 ("SUNRPC: Convert socket page send code to use iov_iter()")

e791f8e9380d945e 0472e476604998c127f3c80d291
---------------- ---------------------------
       fail:runs  %reproduction    fail:runs
           |             |             |
          :4            50%           2:4     dmesg.WARNING:at#for_ip_interrupt_entry/0x
         %stddev     %change         %stddev
             \          |                \
    15118573 ± 2%     +16.0%   17538083        fsmark.app_overhead
      510.93          -22.7%     395.12        fsmark.files_per_sec
       24.90          +22.8%      30.57        fsmark.time.elapsed_time
       24.90          +22.8%      30.57        fsmark.time.elapsed_time.max
      288.00 ± 2%     -27.8%     208.00        fsmark.time.percent_of_cpu_this_job_got
       70.03 ± 2%     -11.3%      62.14        fsmark.time.system_time


Do you have time to take a look at this regression?

From your stats, it looks to me as if the problem is increased NUMA
overhead. Pretty much everything else appears to be the same or
actually performing better than previously. Am I interpreting that
correctly?
The real regression is that the throughput (fsmark.files_per_sec) decreased
by 22.7%.

Understood, but I'm trying to make sense of why. I'm not able to
reproduce this, so I have to rely on your performance stats to
understand where the 22.7% regression is coming from. As far as I can
see, the only numbers in the stats you published that are showing a
performance regression (other than the fsmark number itself), are the
NUMA numbers. Is that a correct interpretation?

We re-tested the case yesterday, and the result was almost the same.
We will run more tests and also check the test case itself; if you need
more information, please let me know. Thanks.

If my interpretation above is correct, then I'm not seeing where this
patch would be introducing new NUMA regressions. It is just converting
from using one method of doing socket I/O to another. Could it perhaps
be a memory artefact due to your running the NFS client and server on
the same machine?
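
For concreteness, the page-data side of the conversion looks roughly like the sketch below (again illustrative only, not the actual patch; the helper names are made up). One difference worth noting is that kernel_sendpage() can use the socket's zero-copy sendpage path, whereas sock_sendmsg() on a bvec iterator may fall back to copying the page data:

#include <linux/bvec.h>	/* struct bio_vec */
#include <linux/net.h>	/* kernel_sendpage(), sock_sendmsg() */
#include <linux/uio.h>	/* iov_iter_bvec() */

/* Before: zero-copy transmit of page data via the socket's
 * sendpage operation. */
static int send_page_old(struct socket *sock, struct page *page,
			 int offset, size_t len)
{
	return kernel_sendpage(sock, page, offset, len,
			       MSG_DONTWAIT | MSG_MORE);
}

/* After: describe the page with a bio_vec, attach it to the msghdr
 * as an iov_iter, and let sock_sendmsg() transmit it. */
static int send_page_new(struct socket *sock, struct page *page,
			 int offset, size_t len)
{
	struct msghdr msg = { .msg_flags = MSG_DONTWAIT | MSG_MORE };
	struct bio_vec bvec = {
		.bv_page   = page,
		.bv_offset = offset,
		.bv_len    = len,
	};

	iov_iter_bvec(&msg.msg_iter, WRITE, &bvec, 1, len);
	return sock_sendmsg(sock, &msg);
}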

Apologies for pushing back a little, but I just don't have the
hardware available to test NUMA configurations, so I'm relying on
external testing for the above kind of scenario.

Thanks for looking at this. If you need more information, please let me
know.
Thanks
    Trond



--
Zhengjun Xing