Re: fiemap is slow on btrfs on files with multiple extents
From: Dominique MARTINET
Date: Wed Sep 21 2022 - 03:30:50 EST
Filipe Manana wrote on Thu, Sep 01, 2022 at 02:25:12PM +0100:
> It took me a bit more than I expected, but here is the patchset to make fiemap
> (and lseek) much more efficient on btrfs:
>
> https://lore.kernel.org/linux-btrfs/cover.1662022922.git.fdmanana@xxxxxxxx/
>
> And also available in this git branch:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/fdmanana/linux.git/log/?h=lseek_fiemap_scalability
Thanks a lot!
Sorry for the slow reply; it took me a while to find time to get back to
my test setup.
There's still the odd behaviour that later cp runs are slower than the
first, but the improvement is so large that it matters much less -- I
haven't been able to reproduce the RCU stalls in qemu so I can't say for
sure, but they probably won't be a problem anymore.
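For context, the data/hole probing a sparse-aware copy performs can be
sketched with lseek(SEEK_DATA)/lseek(SEEK_HOLE) -- this is only a minimal
illustration of the interface the patchset also speeds up, not what cp
literally does (depending on the coreutils version, cp's hole detection
goes through FIEMAP or SEEK_HOLE):

```python
import errno
import os
import tempfile

def probe_layout(path):
    """Walk a file's data segments with lseek(SEEK_DATA)/lseek(SEEK_HOLE),
    the interface sparse-aware copies use to skip holes.
    Returns a list of (data_start, data_end) byte ranges."""
    segments = []
    size = os.path.getsize(path)
    fd = os.open(path, os.O_RDONLY)
    try:
        offset = 0
        while offset < size:
            try:
                data = os.lseek(fd, offset, os.SEEK_DATA)
            except OSError as e:
                if e.errno == errno.ENXIO:  # no data past this offset
                    break
                raise
            # next hole (or EOF) after the data we just found
            hole = os.lseek(fd, data, os.SEEK_HOLE)
            segments.append((data, hole))
            offset = hole
    finally:
        os.close(fd)
    return segments

# demo: 4 KiB of data, a hole, then 4 KiB more at the 1 MiB mark
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"x" * 4096)
    f.seek(1 << 20)
    f.write(b"y" * 4096)
    name = f.name
print(probe_layout(name))
os.remove(name)
```

On filesystems without hole reporting, the generic fallback simply
reports one data segment covering the whole file, so the demo may print
either one or two ranges.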
From a quick look with perf record/report, the difference still seems to
stem from fiemap (time spent there goes from 4.13% to 45.20%), so there
is still extra processing once the file is (at least partially) in
cache, but it has gotten much better.
(Tests were run on a laptop, so expect some inconsistency from thermal
throttling etc.)
/mnt/t/t # compsize bigfile
Processed 1 file, 194955 regular extents (199583 refs), 0 inline.
Type       Perc     Disk Usage   Uncompressed Referenced
TOTAL       15%         3.7G          23G          23G
none       100%         477M         477M         514M
zstd        14%         3.2G          23G          23G
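(For reference, extent counts like the ones compsize reports above come
from the same FIEMAP interface. Here is a rough sketch of a FIEMAP query
in the style of filefrag, assuming the 32-byte struct fiemap header from
<linux/fiemap.h>; filesystems without fiemap support return ENOTTY,
which is handled below:

```python
import errno
import fcntl
import os
import struct
import tempfile

FS_IOC_FIEMAP = 0xC020660B        # _IOWR('f', 11, struct fiemap)
FIEMAP_FLAG_SYNC = 0x1            # flush dirty data before mapping
FIEMAP_MAX_OFFSET = 0xFFFFFFFFFFFFFFFF

def mapped_extent_count(path):
    """Return the number of extents backing `path`, or None if the
    filesystem does not implement the FIEMAP ioctl.

    With fm_extent_count == 0 the kernel returns no extent records and
    only fills fm_mapped_extents with the total count."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # struct fiemap header: fm_start, fm_length, fm_flags,
        # fm_mapped_extents, fm_extent_count, fm_reserved
        hdr = bytearray(struct.pack("=QQIIII", 0, FIEMAP_MAX_OFFSET,
                                    FIEMAP_FLAG_SYNC, 0, 0, 0))
        try:
            fcntl.ioctl(fd, FS_IOC_FIEMAP, hdr)
        except OSError as e:
            if e.errno in (errno.ENOTTY, errno.EOPNOTSUPP):
                return None   # no fiemap on this filesystem
            raise
        return struct.unpack("=QQIIII", bytes(hdr))[3]
    finally:
        os.close(fd)

# demo: 1 MiB file, then ask how many extents back it
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"z" * (1 << 20))
    name = f.name
print(mapped_extent_count(name))
os.remove(name)
```

Each such call walks the full query path that extent_fiemap implements,
which is why these tools are a handy stress test for it.)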
/mnt/t/t # time cp bigfile /dev/null
real 0m 44.52s
user 0m 0.49s
sys 0m 32.91s
/mnt/t/t # time cp bigfile /dev/null
real 0m 46.81s
user 0m 0.55s
sys 0m 35.63s
/mnt/t/t # time cp bigfile /dev/null
real 1m 13.63s
user 0m 0.55s
sys 1m 1.89s
/mnt/t/t # time cp bigfile /dev/null
real 1m 13.44s
user 0m 0.53s
sys 1m 2.08s
For comparison, here's how it was on the 6.0-rc2 base your branch is built on:
/mnt/t/t # time cp atde-test /dev/null
real 0m 46.17s
user 0m 0.60s
sys 0m 33.21s
/mnt/t/t # time cp atde-test /dev/null
real 5m 35.92s
user 0m 0.57s
sys 5m 24.20s
If you're curious, the report blames set_extent_bit and
clear_state_bit as follows; get_extent_skip_holes is completely gone.
I wouldn't necessarily say this needs much more time spent on it, though.
45.20%--extent_fiemap
|
|--31.02%--lock_extent_bits
| |
| --30.78%--set_extent_bit
| |
| |--6.93%--insert_state
| | |
| | --0.70%--set_state_bits
| |
| |--4.25%--alloc_extent_state
| | |
| | --3.86%--kmem_cache_alloc
| |
| |--2.77%--_raw_spin_lock
| | |
| | --1.23%--preempt_count_add
| |
| |--2.48%--rb_next
| |
| |--1.13%--_raw_spin_unlock
| | |
| | --0.55%--preempt_count_sub
| |
| --0.92%--set_state_bits
|
--13.80%--__clear_extent_bit
|
--13.30%--clear_state_bit
|
|--3.48%--_raw_spin_unlock_irqrestore
|
|--2.45%--merge_state.part.0
| |
| --1.57%--rb_next
|
|--2.14%--__slab_free
| |
| --1.26%--cmpxchg_double_slab.constprop.0.isra.0
|
|--0.74%--free_extent_state
|
|--0.70%--kmem_cache_free
|
|--0.69%--btrfs_clear_delalloc_extent
|
--0.52%--rb_next
Thanks!
--
Dominique