Re: [BUG?] bcachefs performance: read is way too slow when a file has no overwrite.
From: David Wang
Date: Sat Sep 07 2024 - 06:35:28 EST
At 2024-09-07 01:38:11, "Kent Overstreet" <kent.overstreet@xxxxxxxxx> wrote:
>On Fri, Sep 06, 2024 at 11:43:54PM GMT, David Wang wrote:
>>
>> Hi,
>>
>> I noticed a very strange performance issue:
>> When running a `fio direct randread` test on a fresh bcachefs, the performance is very bad:
>> fio --randrepeat=1 --ioengine=libaio --direct=1 --name=test --bs=4k --iodepth=64 --size=1G --readwrite=randread --runtime=600 --numjobs=8 --time_based=1
>> ...
>> Run status group 0 (all jobs):
>> READ: bw=87.0MiB/s (91.2MB/s), 239B/s-14.2MiB/s (239B/s-14.9MB/s), io=1485MiB (1557MB), run=15593-17073msec
>>
>> But if the files already exist and have already been thoroughly overwritten, the read performance is about 850MB+/s,
>> almost 10 times better!
>>
>> This means that if I copy some files from somewhere else and only read them afterwards, I would get really bad performance.
>> (I copied files from another filesystem and ran the fio read test on those files; the performance is indeed bad.)
>> Copying some prepared files and then using them read-only is quite a normal usage scenario for lots of apps, I think.
>
>That's because checksums are at extent granularity, not block: if you're
>doing O_DIRECT reads that are smaller than the writes the data was
>written with, performance will be bad because we have to read the entire
>extent to verify the checksum.
>
>block granular checksums will come at some point, as an optional feature
>(most of the time you don't want them, and you'd prefer more compact
>metadata)
Hi, I ran further tests combining different write and read sizes, and the results
do not confirm the explanation for O_DIRECT.
Without O_DIRECT (fio --direct=0....), the average read bandwidth
improves, but with a very large standard deviation:
+--------------------+----------+----------+----------+----------+
| prepare-write\read | 1K       | 4K       | 8K       | 16K      |
+--------------------+----------+----------+----------+----------+
| 1K                 | 328MiB/s | 395MiB/s | 465MiB/s |          |
| 4K                 | 193MiB/s | 219MiB/s | 274MiB/s | 392MiB/s |
| 8K                 | 251MiB/s | 280MiB/s | 368MiB/s | 435MiB/s |
| 16K                | 302MiB/s | 380MiB/s | 464MiB/s | 577MiB/s |
+--------------------+----------+----------+----------+----------+
(Rows are write size when preparing the test files, and columns are read size for fio test.)
And with O_DIRECT, the result is:
+--------------------+-----------+-----------+----------+----------+
| prepare-write\read | 1K        | 4K        | 8K       | 16K      |
+--------------------+-----------+-----------+----------+----------+
| 1K                 | 24.1MiB/s | 96.5MiB/s | 193MiB/s |          |
| 4K                 | 14.4MiB/s | 57.6MiB/s | 116MiB/s | 230MiB/s |
| 8K                 | 24.6MiB/s | 97.6MiB/s | 192MiB/s | 309MiB/s |
| 16K                | 26.4MiB/s | 104MiB/s  | 206MiB/s | 402MiB/s |
+--------------------+-----------+-----------+----------+----------+
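For reference, the O_DIRECT numbers can also be read as request rates. The sketch below only divides each bandwidth in the table above by its read size; the function and array names are made up for illustration. Within each row the resulting figure changes little with read size (roughly 24.7K IOPS for the 1K-write row and about 14.7K for the 4K-write row, with the 8K-write/16K-read cell as the main exception). That would be consistent with a per-request cost dominating, but this is an inference from the table, not something measured separately.

```c
#include <assert.h>
#include <stdio.h>

/* Convert a bandwidth in MiB/s at a given read size in KiB to requests per second. */
static double mib_per_sec_to_iops(double mib_per_sec, double block_kib)
{
	assert(block_kib > 0);
	return mib_per_sec * 1024.0 / block_kib;
}

/* Sizes and O_DIRECT results copied from the table above; 0 marks the empty cell. */
static const double size_kib[4] = { 1, 4, 8, 16 };
static const double direct_mib[4][4] = {
	{ 24.1,  96.5, 193, 0   },	/* files prepared with 1K writes  */
	{ 14.4,  57.6, 116, 230 },	/* files prepared with 4K writes  */
	{ 24.6,  97.6, 192, 309 },	/* files prepared with 8K writes  */
	{ 26.4, 104,   206, 402 },	/* files prepared with 16K writes */
};

/* Print one line per prepare-write size, one IOPS column per read size. */
static void print_iops_table(void)
{
	for (int w = 0; w < 4; w++) {
		printf("write %2.0fK:", size_kib[w]);
		for (int r = 0; r < 4; r++) {
			if (direct_mib[w][r] > 0)
				printf(" %8.0f", mib_per_sec_to_iops(direct_mib[w][r], size_kib[r]));
			else
				printf(" %8s", "-");
		}
		printf("\n");
	}
}

#ifdef DEMO_MAIN
int main(void)
{
	print_iops_table();
	return 0;
}
#endif
```

Building with -DDEMO_MAIN gives a standalone program that prints the converted table.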
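code to prepare the test files:
#define _GNU_SOURCE /* needed for O_DIRECT */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#define KN 8 //<- adjust this for each row
char name[32];
/* O_DIRECT wants an aligned buffer; 4096 is an assumed safe alignment */
char buf[1024*KN] __attribute__((aligned(4096)));
int main() {
	int i, m = 1024*1024/KN, k, fd;
	for (i=0; i<8; i++) {
		sprintf(name, "test.%d.0", i);
		/* O_CREAT requires a mode argument */
		fd = open(name, O_CREAT|O_DIRECT|O_SYNC|O_TRUNC|O_WRONLY, 0644);
		if (fd < 0) { perror(name); return 1; }
		/* m writes of KN KiB each: 1GiB per file */
		for (k=0; k<m; k++)
			if (write(fd, buf, sizeof(buf)) < 0) { perror("write"); return 1; }
		close(fd);
	}
	return 0;
}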
Based on the results:
1. The row with prepare-write size 4K stands out.
When files were prepared with write size 4K, the subsequent
read performance is worse. (I double-checked the result,
but it is possible that I missed some contributing factor.)
2. Without O_DIRECT, read performance seems correlated with the difference
between read size and prepare-write size, but with O_DIRECT the correlation is not obvious.
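To make the comparison with the extent-checksum explanation concrete, here is a naive model of what it would predict. It assumes, purely for illustration, that every prepare-write became exactly one extent of that size and that a smaller read has to fetch the whole extent; whether bcachefs actually lays the files out this way is not known here, and the names are hypothetical. Under that assumption, 1K reads from 16K-written files would be amplified 16x while 1K reads from 1K-written files would not be amplified at all, yet the O_DIRECT table shows 26.4MiB/s and 24.1MiB/s for those two cells.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical model, not bcachefs code: assume every prepare-write became
 * exactly one extent of extent_kib, and that a read smaller than the extent
 * has to fetch the whole extent to verify its checksum.
 */
static double read_amplification(unsigned extent_kib, unsigned read_kib)
{
	assert(extent_kib > 0 && read_kib > 0);
	if (read_kib >= extent_kib)
		return 1.0;
	return (double)extent_kib / (double)read_kib;
}

/* Print the predicted amplification for the same sizes as the tables. */
static void print_amplification_table(void)
{
	static const unsigned sizes_kib[] = { 1, 4, 8, 16 };
	const int n = sizeof(sizes_kib) / sizeof(sizes_kib[0]);

	for (int w = 0; w < n; w++) {
		printf("write %2uK:", sizes_kib[w]);
		for (int r = 0; r < n; r++)
			printf(" %5.1fx", read_amplification(sizes_kib[w], sizes_kib[r]));
		printf("\n");
	}
}

#ifdef DEMO_MAIN
int main(void)
{
	print_amplification_table();
	return 0;
}
#endif
```

Building with -DDEMO_MAIN prints the predicted factors in the same row/column layout as the measured tables, so the two can be compared cell by cell.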
And, to mention it again, if I overwrite the files **thoroughly** with a fio write test
(using the same block size), the read performance afterwards is very good:
# overwrite the files with randwrite, block size 8k
$ fio --randrepeat=1 --ioengine=libaio --direct=1 --name=test --bs=8k --iodepth=64 --size=1G --readwrite=randwrite --runtime=300 --numjobs=8 --time_based=1
# test the read performance with randread, block size 8k
$ fio --randrepeat=1 --ioengine=libaio --direct=1 --name=test --bs=8k --iodepth=64 --size=1G --readwrite=randread --runtime=300 --numjobs=8 --time_based=1
...
Run status group 0 (all jobs):
READ: bw=964MiB/s (1011MB/s), 116MiB/s-123MiB/s (121MB/s-129MB/s), io=283GiB (303GB), run=300004-300005msec
FYI
David