Re: [blk] 8c5035dfbb: fio.read_iops -10.6% regression

From: Yu Kuai
Date: Sat Oct 08 2022 - 04:01:45 EST

Next message: kernel test robot: "[random] 1f9cc6d2c6: BUG:soft_lockup-CPU##stuck_for#s![trinity-c1:#]"
Previous message: yexingchen116: "[PATCH linux-next] PNP: replace !strcmp with sysfs_streq"
Next in thread: Yin Fengwei: "Re: [LKP] Re: [blk] 8c5035dfbb: fio.read_iops -10.6% regression"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Hi,

在 2022/10/08 10:50, kernel test robot 写道:

Greeting,

FYI, we noticed a -10.6% regression of fio.read_iops due to commit:

I don't know how this is working but I'm *sure* this commit won't affect
performance. Please take a look at the commit, only wbt initialization
is touched, which is done while creating the device:

device_add_disk
blk_register_queue
wbt_enable_default
wbt_init

And io path is the same with or without this commit.

By the way, wbt should only work for write.

Thanks,
Kuai

commit: 8c5035dfbb9475b67c82b3fdb7351236525bf52b ("blk-wbt: call rq_qos_add() after wb_normal is initialized")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

in testcase: fio-basic
on test machine: 192 threads 4 sockets Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz (Cascade Lake) with 192G memory
with following parameters:

runtime: 300s
nr_task: 8t
disk: 1SSD
fs: btrfs
rw: randread
bs: 2M
ioengine: sync
test_size: 256g
cpufreq_governor: performance

test-description: Fio is a tool that will spawn a number of threads or processes doing a particular type of I/O action as specified by the user.
test-url: https://github.com/axboe/fio

Details are as below:

=========================================================================================
bs/compiler/cpufreq_governor/disk/fs/ioengine/kconfig/nr_task/rootfs/runtime/rw/tbox_group/test_size/testcase:
2M/gcc-11/performance/1SSD/btrfs/sync/x86_64-rhel-8.3/8t/debian-11.1-x86_64-20220510.cgz/300s/randread/lkp-csl-2ap4/256g/fio-basic

commit:
f7de4886fe ("rnbd-srv: remove struct rnbd_dev")
8c5035dfbb ("blk-wbt: call rq_qos_add() after wb_normal is initialized")

f7de4886fe8f008a 8c5035dfbb9475b67c82b3fdb73
---------------- ---------------------------
%stddev %change %stddev
\ | \
0.03 ±106% +0.2 0.22 ± 80% fio.latency_20ms%
0.02 ± 33% -0.0 0.01 ± 12% fio.latency_4ms%
2508 -10.6% 2243 fio.read_bw_MBps
6717440 +17.6% 7897088 fio.read_clat_90%_us
6892202 +19.0% 8202922 fio.read_clat_95%_us
7602176 ± 4% +18.4% 9000277 ± 3% fio.read_clat_99%_us
6374238 +11.8% 7127450 fio.read_clat_mean_us
363825 ± 10% +74.9% 636378 ± 5% fio.read_clat_stddev
1254 -10.6% 1121 fio.read_iops
104.97 +11.8% 117.32 fio.time.elapsed_time
104.97 +11.8% 117.32 fio.time.elapsed_time.max
13731 +5.6% 14498 ± 4% fio.time.maximum_resident_set_size
116.00 -8.2% 106.50 fio.time.percent_of_cpu_this_job_got
1.998e+10 +11.4% 2.226e+10 cpuidle..time
3.27 ± 3% +4.6% 3.42 iostat.cpu.iowait
4.49 ± 68% -2.1 2.38 ±152% perf-profile.children.cycles-pp.number
4.49 ± 68% -2.5 1.98 ±175% perf-profile.self.cycles-pp.number
557763 +5.4% 587781 proc-vmstat.pgfault
25488 +3.1% 26274 proc-vmstat.pgreuse
2459048 -10.1% 2209482 vmstat.io.bi
184649 ± 5% -10.4% 165526 ± 7% vmstat.system.cs
111733 ± 30% +61.8% 180770 ± 21% numa-meminfo.node0.AnonPages
113221 ± 30% +60.2% 181416 ± 21% numa-meminfo.node0.Inactive(anon)
11301 ± 24% +164.5% 29888 ±117% numa-meminfo.node2.Active(file)
104911 ± 39% -80.5% 20456 ±100% numa-meminfo.node3.AnonHugePages
131666 ± 27% -67.9% 42297 ± 82% numa-meminfo.node3.AnonPages
132698 ± 26% -67.5% 43158 ± 81% numa-meminfo.node3.Inactive(anon)
27934 ± 30% +61.8% 45196 ± 21% numa-vmstat.node0.nr_anon_pages
28306 ± 30% +60.2% 45358 ± 21% numa-vmstat.node0.nr_inactive_anon
28305 ± 30% +60.2% 45357 ± 21% numa-vmstat.node0.nr_zone_inactive_anon
6291 ± 24% +68.0% 10567 ± 26% numa-vmstat.node2.workingset_nodes
32925 ± 27% -67.9% 10571 ± 82% numa-vmstat.node3.nr_anon_pages
33182 ± 26% -67.5% 10786 ± 81% numa-vmstat.node3.nr_inactive_anon
33182 ± 26% -67.5% 10786 ± 81% numa-vmstat.node3.nr_zone_inactive_anon
161.78 ± 4% -28.2% 116.10 ± 30% sched_debug.cfs_rq:/.runnable_avg.avg
161.46 ± 4% -28.2% 115.85 ± 30% sched_debug.cfs_rq:/.util_avg.avg
426382 +11.0% 473345 ± 6% sched_debug.cpu.clock.avg
426394 +11.0% 473357 ± 6% sched_debug.cpu.clock.max
426370 +11.0% 473331 ± 6% sched_debug.cpu.clock.min
426139 +10.9% 472586 ± 6% sched_debug.cpu.clock_task.avg
426368 +11.0% 473130 ± 6% sched_debug.cpu.clock_task.max
416196 +11.1% 462228 ± 6% sched_debug.cpu.clock_task.min
1156 ± 7% -10.8% 1031 ± 6% sched_debug.cpu.curr->pid.stddev
426372 +11.0% 473334 ± 6% sched_debug.cpu_clk
425355 +11.0% 472318 ± 6% sched_debug.ktime
426826 +11.0% 473787 ± 6% sched_debug.sched_clk
1.263e+09 -7.9% 1.164e+09 ± 3% perf-stat.i.branch-instructions
190886 ± 5% -10.8% 170290 ± 7% perf-stat.i.context-switches
1.979e+09 -8.8% 1.804e+09 ± 2% perf-stat.i.dTLB-loads
8.998e+08 -8.2% 8.257e+08 ± 2% perf-stat.i.dTLB-stores
6.455e+09 -8.0% 5.938e+09 ± 3% perf-stat.i.instructions
21.78 -8.4% 19.95 perf-stat.i.metric.M/sec
7045315 ± 4% -14.0% 6057863 ± 6% perf-stat.i.node-load-misses
2658563 ± 7% -21.9% 2077647 ± 12% perf-stat.i.node-loads
414822 ± 4% -12.9% 361455 ± 3% perf-stat.i.node-store-misses
1.251e+09 -7.8% 1.154e+09 ± 3% perf-stat.ps.branch-instructions
189082 ± 5% -10.7% 168849 ± 7% perf-stat.ps.context-switches
1.96e+09 -8.8% 1.789e+09 ± 2% perf-stat.ps.dTLB-loads
8.912e+08 -8.1% 8.187e+08 ± 2% perf-stat.ps.dTLB-stores
6.393e+09 -7.9% 5.888e+09 ± 3% perf-stat.ps.instructions
6978485 ± 4% -13.9% 6006510 ± 6% perf-stat.ps.node-load-misses
2633627 ± 7% -21.8% 2060033 ± 12% perf-stat.ps.node-loads
410822 ± 4% -12.8% 358289 ± 3% perf-stat.ps.node-store-misses

If you fix the issue, kindly add following tag
| Reported-by: kernel test robot <yujie.liu@xxxxxxxxx>
| Link: https://lore.kernel.org/r/202210081045.77ddf59b-yujie.liu@xxxxxxxxx

To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
sudo bin/lkp install job.yaml # job file is attached in this email
bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
sudo bin/lkp run generated-yaml-file

# if come across any failure that blocks the test,
# please remove ~/.lkp and /lkp dir to run from a clean state.

Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.

Next message: kernel test robot: "[random] 1f9cc6d2c6: BUG:soft_lockup-CPU##stuck_for#s![trinity-c1:#]"
Previous message: yexingchen116: "[PATCH linux-next] PNP: replace !strcmp with sysfs_streq"
Next in thread: Yin Fengwei: "Re: [LKP] Re: [blk] 8c5035dfbb: fio.read_iops -10.6% regression"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]