On Wed 25-03-20 13:50:09, Xing Zhengjun wrote:
ping...
The issue still exists in v5.6-rc7.
So I have tried again to reproduce this so that I can look into the
regression. When observing what is actually happening in the system, I have
to say that this workfile (or actually its implementation in filebench) is
pretty dubious. The problem is that filebench first creates the files by
writing them through ordinary write(2), then immediately starts reading
the files with direct IO. So by the time the direct IO reads are running,
the system is still writing back the created files, and depending on how
the reads vs. writes get scheduled, you get different results. Also, a
direct IO read will first flush the range it is going to read from the
page cache, so to some extent this is actually a parallel small-ranged
fsync(2) benchmark. Finally, differences in how we achieve integrity of
direct IO reads against a dirty page cache are going to impact this benchmark.
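To make the pattern concrete, here is a minimal sketch of what each
filebench thread effectively does. This is only an illustration, not
filebench code; the path, buffer alignment, and size are invented:

#define _GNU_SOURCE            /* for O_DIRECT */
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	const char *path = "/mnt/test/stream0";  /* invented path */
	const size_t len = 1 << 20;              /* 1 MiB for illustration */
	char *buf;
	int fd;

	if (posix_memalign((void **)&buf, 4096, len))
		return 1;
	memset(buf, 'x', len);

	/* Phase 1: create the file through the page cache, as filebench
	 * does. There is no fsync(), so the data is still dirty when the
	 * read phase starts. */
	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0 || write(fd, buf, len) != (ssize_t)len)
		return 1;
	close(fd);

	/* Phase 2: read the same file back with O_DIRECT. To keep the
	 * read coherent with the dirty page cache, the kernel first
	 * writes back the range being read, so each read behaves like a
	 * small ranged fsync() racing with background writeback. */
	fd = open(path, O_RDONLY | O_DIRECT);
	if (fd < 0 || read(fd, buf, len) < 0)
		return 1;
	close(fd);
	free(buf);
	return 0;
}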
So overall I can now see why this commit makes a difference, but the workload
is IMHO largely irrelevant. What would make sense is to run filebench once,
then unmount & mount the fs to force the files to disk and clear the page
cache, and then run it again. Filebench will reuse the files in this case,
and parallel direct IO readers without page cache are then a sensible
workload. But I didn't see any difference in that (even with a rotating
disk) on my machines.
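Concretely, the flow I mean is something like this hypothetical harness
(the device /dev/sdb1, the mount point /mnt/test, and the workfile path
are assumptions, and it needs root):

#include <stdio.h>
#include <stdlib.h>
#include <sys/mount.h>

int main(void)
{
	/* First run creates the fileset through buffered writes. */
	if (system("filebench -f fivestreamreaddirect.f") != 0)
		return 1;

	/* Unmounting forces all dirty data to disk and drops the cached
	 * pages for the fileset. */
	if (umount("/mnt/test") != 0) {
		perror("umount");
		return 1;
	}
	if (mount("/dev/sdb1", "/mnt/test", "ext4", 0, NULL) != 0) {
		perror("mount");
		return 1;
	}

	/* Second run reuses the now fully written-back files, so the
	 * parallel O_DIRECT readers are measured without page cache
	 * interference. */
	return system("filebench -f fivestreamreaddirect.f") != 0;
}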
Honza
We ran a test per your suggestion: run "filebench" once during the setup
stage, then do a "sync", and after that run "filebench" again. In the
attached test result "compare", the "filebench.sum_bytes_mb/s" regression
disappears.
On 3/4/2020 4:15 PM, Xing Zhengjun wrote:
Hi Matthew,
We tested it in v5.6-rc4 and the issue still exists. Do you have time to
take a look at this? Thanks.
On 1/8/2020 10:31 AM, Rong Chen wrote:
On 1/8/20 1:28 AM, Jan Kara wrote:
On Tue 07-01-20 11:57:08, Theodore Y. Ts'o wrote:
On Tue, Jan 07, 2020 at 02:41:06PM +0100, Jan Kara wrote:
Hello,
On Tue 24-12-19 08:59:15, kernel test robot wrote:
FYI, we noticed a -20.2% regression of
filebench.sum_bytes_mb/s due to commit:
commit: b1b4705d54abedfd69dcdf42779c521aa1e0fbd3
("ext4: introduce direct I/O read using iomap
infrastructure")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git
master
in testcase: filebench
on test machine: 8 threads Intel(R) Core(TM) i7-4770
CPU @ 3.40GHz with 8G memory
with following parameters:
    disk: 1HDD
    fs: ext4
    test: fivestreamreaddirect.f
    cpufreq_governor: performance
    ucode: 0x27
I was trying to reproduce this but I failed with my test
VM. I had a SATA SSD as a backing store though, so maybe that's what makes
a difference. Maybe the new code results in somewhat more seeks because
the five threads which compete in submitting sequential IO end up being
more interleaved?
A "-20.2% regression" should be read as a "20.2% performance
improvement" in zero-day kernel speak.
Are you sure? I can see:
     58.30 ± 2%     -20.2%      46.53        filebench.sum_bytes_mb/s
which implies to me previously the throughput was 58 MB/s and after the
commit it was 46 MB/s?
Anyway, in my testing that commit made no difference in that benchmark
whatsoever (getting around 97 MB/s for each thread before and after the
commit).
Honza
We're sorry for the misunderstanding: "-20.2%" is the change of
filebench.sum_bytes_mb/s, and "regression" is LKP's characterization of
that change.
Best Regards,
Rong Chen
_______________________________________________
LKP mailing list -- lkp@xxxxxxxxxxxx
To unsubscribe send an email to lkp-leave@xxxxxxxxxxxx
--
Zhengjun Xing