Re: [BUG?] bcachefs performance: read is way too slow when a file has no overwrite.

From: David Wang
Date: Wed Sep 11 2024 - 22:40:20 EST



Hi,
At 2024-09-09 21:37:35, "Kent Overstreet" <kent.overstreet@xxxxxxxxx> wrote:
>On Sat, Sep 07, 2024 at 06:34:37PM GMT, David Wang wrote:

>>
>> Based on the result:
>> 1. The row with prepare-write size 4K stands out here.
>> When files were prepared with a 4K write size, the subsequent
>> read performance is worse. (I did double-check the result,
>> but it is possible that I missed some affecting factors.)
>
>On small blocksize tests you should be looking at IOPS, not MB/s.
>
>Prepare-write size is the column?
Each row corresponds to a specific prepare-write size, indicated by the first column.

>
>Another factor is that we do merge extents (including checksums); so if
>the prepare-write is done sequentially we won't actually be ending up
>with extents of the same size as what we wrote.
>
>I believe there's a knob somewhere to turn off extent merging (module
>parameter? it's intended for debugging).

I did some debugging: when performance is bad, the conditions
bvec_iter_sectors(iter) != pick.crc.uncompressed_size and
bvec_iter_sectors(iter) != pick.crc.live_size are "almost" always both true,
while when performance is good (after a "thorough" write), they are true only
a tiny fraction of the time (~350 out of 1000000 reads).

And when those conditions are true, "bounce" gets set and the code seems to
take a time-consuming path.

I suspect that merely reading could never change those conditions, but writing can?
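
For reference, the check I mean is roughly the following (paraphrased from my
reading of the extent read path in io_read.c -- a simplified sketch, not the
literal kernel source):

	/* When the read does not cover the whole checksummed extent, the
	 * data is read into a bounce buffer so the checksum can be
	 * verified over the full extent before the requested range is
	 * copied out.  (In the real code this is also gated on the extent
	 * actually having a checksum / being compressed.) */
	if (bvec_iter_sectors(iter) != pick.crc.uncompressed_size ||
	    bvec_iter_sectors(iter) != pick.crc.live_size)
		bounce = true;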

>
>> 2. Without O_DIRECT, read performance seems correlated with the difference
>> between read size and prepare-write size, but with O_DIRECT, the correlation is not obvious.
>
>So the O_DIRECT and buffered IO paths are very different (in every
>filesystem) - you're looking at very different things. They are both
>subject to the checksum granularity issue, but in buffered mode we round
>up reads to extent size when filling into the page cache.
>
>Big standard deviation (high tail latency?) is something we'd want to
>track down. There's a bunch of time_stats in sysfs, but they're mostly
>for the write paths. If you're trying to identify where the latencies
>are coming from, we can look at adding some new time stats to isolate.
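
In case it helps reproduce the numbers, my O_DIRECT read loop looks roughly
like this (a minimal sketch, not the exact harness; the prepare-write step is
omitted, and posix_memalign with 4K alignment is assumed for the O_DIRECT
buffer):

	#define _GNU_SOURCE
	#include <fcntl.h>
	#include <stdio.h>
	#include <stdlib.h>
	#include <time.h>
	#include <unistd.h>

	int main(int argc, char **argv)
	{
		if (argc < 3) {
			fprintf(stderr, "usage: %s <file> <read-size>\n", argv[0]);
			return 1;
		}
		const char *path = argv[1];
		size_t bs = atoi(argv[2]);	/* read size, e.g. 4096; must meet O_DIRECT alignment rules */
		int fd = open(path, O_RDONLY | O_DIRECT);
		if (fd < 0) { perror("open"); return 1; }

		void *buf;
		if (posix_memalign(&buf, 4096, bs))	/* O_DIRECT needs an aligned buffer */
			return 1;

		off_t off = 0;
		struct timespec t0, t1;
		for (;;) {
			clock_gettime(CLOCK_MONOTONIC, &t0);
			ssize_t r = pread(fd, buf, bs, off);
			clock_gettime(CLOCK_MONOTONIC, &t1);
			if (r <= 0)
				break;
			long ns = (t1.tv_sec - t0.tv_sec) * 1000000000L +
				  (t1.tv_nsec - t0.tv_nsec);
			printf("%ld\n", ns);	/* per-read latency in ns */
			off += r;
		}
		free(buf);
		close(fd);
		return 0;
	}

Dumping per-read latencies like this (rather than only MB/s) is what lets me
look at percentiles and see whether the large standard deviation comes from a
long tail.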