On 7/15/2020 7:04 PM, Ritesh Harjani wrote:
Hello Xing,
On 4/7/20 1:30 PM, kernel test robot wrote:
Greeting,
FYI, we noticed a -60.5% regression of stress-ng.fiemap.ops_per_sec due to commit:
commit: d3b6f23f71670007817a5d59f3fbafab2b794e8c ("ext4: move ext4_fiemap to use iomap framework")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
in testcase: stress-ng
on test machine: 96 threads Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz with 192G memory
with following parameters:
nr_threads: 10%
disk: 1HDD
testtime: 1s
class: os
cpufreq_governor: performance
ucode: 0x500002c
fs: ext4
I started looking into this issue. But with my unit testing, I didn't
find any perf issue with fiemap ioctl call. I haven't yet explored about
how stress-ng take fiemap performance numbers, it could be doing
something differently. But in my testing I just made sure to create a
file with large number of extents and used xfs_io -c "fiemap -v" cmd
to check how much time it takes to read all the entries in 1st
and subsequent iterations.
Setup comprised of qemu machine on x86_64 with latest linux branch.
1. created a file of 10G using fallocate. (this allocated unwritten
extents for this file).
2. Then I punched hole on every alternate block of file. This step took
a long time. And after sufficiently long time, I had to cancel it.
for i in $(seq 1 2 xxxxx); do echo $i; fallocate -p -o $(($i*4096)) -l 4096; done
3. Then issued fiemap call via xfs_io and took the time measurement.
time xfs_io -c "fiemap -v" bigfile > /dev/null
Perf numbers on latest default kernel build for above cmd.
1st iteration
==============
real 0m31.684s
user 0m1.593s
sys 0m24.174s
2nd and subsequent iteration
============================
real 0m3.379s
user 0m1.300s
sys 0m2.080s
4. Then I reverted all the iomap_fiemap patches and re-tested this.
With this the older ext4_fiemap implementation will be tested:-
1st iteration
==============
real 0m31.591s
user 0m1.400s
sys 0m24.243s
2nd and subsequent iteration (had to cancel it since it was taking more time then 15m)
============================
^C^C
real 15m49.884s
user 0m0.032s
sys 15m49.722s
I guess the reason why 2nd iteration with older implementation is taking
too much time is since with previous implementation we never cached
extent entries in extent_status tree. And also in 1st iteration the page
cache may get filled with lot of buffer_head entries. So maybe page
reclaims are taking more time.
While with the latest implementation using iomap_fiemap(), the call to query extent blocks is done using ext4_map_blocks(). ext4_map_blocks()
by default will also cache the extent entries into extent_status tree.
Hence during 2nd iteration, we will directly read the entries from extent_status tree and will not do any disk I/O.
-ritesh
Could you try stress-ng( https://kernel.ubuntu.com/~cking/stress-ng/) test cases? The tarballs can be get from https://kernel.ubuntu.com/~cking/tarballs/stress-ng/.
The command for this case you can try "stress-ng --timeout 1 --times --verify --metrics-brief --sequential 9 --class os --minimize --exclude spawn,exec,swap"
I re-test it on the v5.8-rc6, the regression still existed.
=========================================================================================
tbox_group/testcase/rootfs/kconfig/compiler/debug-setup/nr_threads/disk/testtime/fs/class/cpufreq_governor/ucode:
lkp-csl-2sp5/stress-ng/debian-10.4-x86_64-20200603.cgz/x86_64-rhel-8.3/gcc-9/test/10%/1HDD/1s/ext4/os/performance/0x5002f01
commit:
b2c5764262edded1b1cfff5a6ca82c3d61bb4a4a
d3b6f23f71670007817a5d59f3fbafab2b794e8c
v5.8-rc6
b2c5764262edded1 d3b6f23f71670007817a5d59f3f v5.8-rc6
---------------- --------------------------- ---------------------------
%stddev %change %stddev %change %stddev
\ | \ | \
20419 ± 3% -4.9% 19423 ± 4% +27.1% 25959 stress-ng.af-alg.ops
19655 ± 3% -5.7% 18537 ± 4% +27.8% 25111 stress-ng.af-alg.ops_per_sec
64.67 ± 5% -17.0% 53.67 ± 38% +22.2% 79.00 ± 9% stress-ng.chdir.ops
55.34 ± 3% -13.3% 47.99 ± 38% +26.4% 69.96 ± 10% stress-ng.chdir.ops_per_sec
64652 ± 7% -14.1% 55545 ± 11% -13.6% 55842 ± 6% stress-ng.chown.ops
64683 ± 7% -14.1% 55565 ± 11% -13.6% 55858 ± 6% stress-ng.chown.ops_per_sec
2805 ± 2% +0.6% 2820 ± 2% +130.0% 6452 stress-ng.clone.ops
2802 ± 2% +0.6% 2818 ± 2% +129.9% 6443 stress-ng.clone.ops_per_sec
34.67 +1.9% 35.33 ± 3% -9.6% 31.33 ± 3% stress-ng.copy-file.ops
22297 ± 23% +26.7% 28258 ± 2% +38.1% 30783 ± 14% stress-ng.dir.ops_per_sec
4743 ± 7% -12.6% 4147 ± 4% -19.9% 3801 ± 14% stress-ng.dirdeep.ops_per_sec
24499 ± 8% -9.3% 22223 ± 3% -19.8% 19647 ± 2% stress-ng.enosys.ops
24494 ± 8% -9.3% 22207 ± 3% -19.8% 19637 ± 2% stress-ng.enosys.ops_per_sec
76618 -63.8% 27756 ± 5% -55.5% 34092 ± 2% stress-ng.fiemap.ops
76042 -63.8% 27492 ± 5% -55.4% 33899 ± 2% stress-ng.fiemap.ops_per_sec
3816791 -6.2% 3581880 ± 2% -10.7% 3409584 ± 3% stress-ng.file-ioctl.ops
3817421 -6.2% 3582248 ± 2% -10.7% 3409544 ± 3% stress-ng.file-ioctl.ops_per_sec
27332 ± 17% +30.2% 35574 +37.2% 37506 ± 6% stress-ng.filename.ops
26222 ± 16% +28.6% 33722 +35.6% 35567 ± 6% stress-ng.filename.ops_per_sec
23431 ± 2% +0.5% 23559 +22.5% 28711 ± 4% stress-ng.fork.ops
23433 ± 2% +0.5% 23558 +22.5% 28709 ± 4% stress-ng.fork.ops_per_sec
48074 ± 15% -0.6% 47786 ± 9% -42.5% 27623 stress-ng.get.ops
48088 ± 15% -0.6% 47793 ± 9% -42.6% 27624 stress-ng.get.ops_per_sec
111379 -1.4% 109815 ± 2% -7.8% 102662 ± 4% stress-ng.getdent.ops
111398 -1.4% 109818 ± 2% -7.9% 102651 ± 4% stress-ng.getdent.ops_per_sec
226.67 ± 2% -1.0% 224.33 -15.1% 192.33 ± 6% stress-ng.inotify.ops
226.39 ± 2% -1.3% 223.53 -15.3% 191.82 ± 6% stress-ng.inotify.ops_per_sec
18.00 +9.3% 19.67 ± 2% +3.7% 18.67 ± 2% stress-ng.iomix.ops
4846 ± 10% +2.9% 4984 ± 7% -36.8% 3062 ± 7% stress-ng.kill.ops
4846 ± 10% +2.8% 4982 ± 7% -36.9% 3058 ± 7% stress-ng.kill.ops_per_sec
851324 ± 3% -9.9% 766988 ± 2% +2.3% 870703 stress-ng.lease.ops
850238 ± 3% -9.9% 766255 ± 2% +2.3% 870047 stress-ng.lease.ops_per_sec
134492 ± 14% -7.0% 125085 ± 16% -19.9% 107736 ± 3% stress-ng.mlock.ops_per_sec
1.38 +2.9% 1.42 ± 5% -20.2% 1.10 ± 12% stress-ng.mmapfork.ops_per_sec
1117721 ± 5% -9.2% 1015321 ± 20% -36.2% 713598 stress-ng.mq.ops
1117186 ± 5% -9.2% 1014593 ± 20% -36.2% 712387 stress-ng.mq.ops_per_sec
4584592 ± 2% -2.7% 4460470 ± 7% -12.0% 4036627 ± 8% stress-ng.msg.ops_per_sec
8179 ± 10% +7.3% 8772 ± 9% +24.0% 10144 stress-ng.opcode.ops
8177 ± 10% +7.2% 8770 ± 9% +24.0% 10141 stress-ng.opcode.ops_per_sec
5218 ± 3% +3.6% 5408 ± 2% +27.3% 6642 stress-ng.reboot.ops
5217 ± 3% +3.6% 5407 ± 2% +27.3% 6640 stress-ng.reboot.ops_per_sec
1724 ± 21% +27.9% 2205 ± 18% +54.3% 2660 ± 19% stress-ng.resources.ops
1217728 -31.4% 835808 ± 14% -18.7% 989423 ± 10% stress-ng.revio.ops
1217899 -31.4% 835846 ± 14% -18.8% 989313 ± 10% stress-ng.revio.ops_per_sec
752664 ± 9% -7.6% 695116 ± 10% -0.6% 748117 ± 6% stress-ng.seal.ops
752957 ± 9% -7.7% 695278 ± 10% -0.6% 748068 ± 6% stress-ng.seal.ops_per_sec
628.00 ± 83% -20.1% 502.00 ± 8% +176.2% 1734 stress-ng.seccomp.ops
628.22 ± 83% -20.1% 502.08 ± 8% +176.1% 1734 stress-ng.seccomp.ops_per_sec
1361741 ± 7% +4.0% 1416628 ± 8% +17.2% 1596549 ± 9% stress-ng.seek.ops
1361799 ± 7% +4.0% 1416578 ± 8% +17.2% 1596376 ± 9% stress-ng.seek.ops_per_sec
1537496 +1.2% 1556468 -9.9% 1385679 stress-ng.sigrt.ops
1532297 +1.3% 1551993 -9.9% 1381175 stress-ng.sigrt.ops_per_sec
1324589 ± 12% -26.6% 972626 ± 9% -14.8% 1128923 ± 4% stress-ng.sigsuspend.ops
1323957 ± 12% -26.6% 972084 ± 9% -14.8% 1128174 ± 4% stress-ng.sigsuspend.ops_per_sec
806.67 ± 13% +11.2% 896.67 ± 10% +31.3% 1059 ± 4% stress-ng.sockabuse.ops
806.73 ± 13% +11.1% 896.44 ± 10% +31.3% 1059 ± 4% stress-ng.sockabuse.ops_per_sec
6224635 ± 2% +0.8% 6274521 ± 2% -12.8% 5427883 ± 2% stress-ng.sockpair.ops
5383397 -1.9% 5279526 ± 2% -14.9% 4579084 ± 4% stress-ng.sockpair.ops_per_sec
22409 ± 6% -2.8% 21784 ± 5% -41.2% 13172 ± 9% stress-ng.sysbadaddr.ops
22414 ± 6% -2.8% 21784 ± 5% -41.3% 13167 ± 9% stress-ng.sysbadaddr.ops_per_sec
525.33 ± 13% +2.6% 539.00 ± 15% -42.1% 304.33 ± 25% stress-ng.sysfs.ops
524.98 ± 13% +2.6% 538.58 ± 15% -42.1% 303.79 ± 25% stress-ng.sysfs.ops_per_sec
3077 ± 12% +19.9% 3689 ± 4% +23.0% 3786 ± 4% stress-ng.sysinfo.ops
3009 ± 11% +22.6% 3688 ± 4% +25.7% 3783 ± 4% stress-ng.sysinfo.ops_per_sec
3001932 +1.2% 3038017 +15.7% 3474274 stress-ng.udp.ops
3002618 +1.2% 3037939 +15.7% 3473606 stress-ng.udp.ops_per_sec
184634 ± 12% -6.9% 171823 +24.5% 229788 ± 4% stress-ng.utime.ops
184678 ± 12% -7.0% 171836 +24.4% 229784 ± 4% stress-ng.utime.ops_per_sec
1835 ± 5% +2.2% 1875 ± 4% +8.2% 1986 ± 3% stress-ng.vforkmany.ops
1829 ± 5% +1.9% 1863 ± 4% +8.1% 1978 ± 3% stress-ng.vforkmany.ops_per_sec
14829 -1.0% 14682 ± 2% -20.1% 11848 ± 3% stress-ng.vm-segv.ops
14830 -1.0% 14682 ± 2% -29.7% 10433 ± 13% stress-ng.vm-segv.ops_per_sec
3527835 ± 3% +1.9% 3596613 ± 5% +495.6% 21010377 stress-ng.vm-splice.ops
3528845 ± 3% +1.9% 3597219 ± 5% +495.4% 21011045 stress-ng.vm-splice.ops_per_sec