Re: [PATCH AUTOSEL 5.11 03/12] btrfs: subpage: fix the false data csum mismatch error

From: David Sterba
Date: Mon Mar 08 2021 - 10:46:29 EST


On Sun, Mar 07, 2021 at 08:57:37AM -0500, Sasha Levin wrote:
> From: Qu Wenruo <wqu@xxxxxxxx>
>
> [ Upstream commit c28ea613fafad910d08f67efe76ae552b1434e44 ]
>
> [BUG]
> When running fstresss, we can hit strange data csum mismatch where the
> on-disk data is in fact correct (passes scrub).
>
> With some extra debug info added, we have the following traces:
>
> 0482us: btrfs_do_readpage: root=5 ino=284 offset=393216, submit force=0 pgoff=0 iosize=8192
> 0494us: btrfs_do_readpage: root=5 ino=284 offset=401408, submit force=0 pgoff=8192 iosize=4096
> 0498us: btrfs_submit_data_bio: root=5 ino=284 bio first bvec=393216 len=8192
> 0591us: btrfs_do_readpage: root=5 ino=284 offset=405504, submit force=0 pgoff=12288 iosize=36864
> 0594us: btrfs_submit_data_bio: root=5 ino=284 bio first bvec=401408 len=4096
> 0863us: btrfs_submit_data_bio: root=5 ino=284 bio first bvec=405504 len=36864
> 0933us: btrfs_verify_data_csum: root=5 ino=284 offset=393216 len=8192
> 0967us: btrfs_do_readpage: root=5 ino=284 offset=442368, skip beyond isize pgoff=49152 iosize=16384
> 1047us: btrfs_verify_data_csum: root=5 ino=284 offset=401408 len=4096
> 1163us: btrfs_verify_data_csum: root=5 ino=284 offset=405504 len=36864
> 1290us: check_data_csum: !!! root=5 ino=284 offset=438272 pg_off=45056 !!!
> 7387us: end_bio_extent_readpage: root=5 ino=284 before pending_read_bios=0
>
> [CAUSE]
> Normally we expect all submitted bio reads to only touch the range we
> specified, and under subpage context, it means we should only touch the
> range specified in each bvec.
>
> But in data read path, inside end_bio_extent_readpage(), we have page
> zeroing which only takes regular page size into consideration.
>
> This means for subpage if we have an inode whose content looks like below:
>
> 0 16K 32K 48K 64K
> |///////| |///////| |
>
> |//| = data needs to be read from disk
> | | = hole
>
> And i_size is 64K initially.
>
> Then the following race can happen:
>
> T1 | T2
> --------------------------------+--------------------------------
> btrfs_do_readpage() |
> |- isize = 64K; |
> | At this time, the isize is |
> | 64K |
> | |
> |- submit_extent_page() |
> | submit previous assembled bio|
> | assemble bio for [0, 16K) |
> | |
> |- submit_extent_page() |
> submit read bio for [0, 16K) |
> assemble read bio for |
> [32K, 48K) |
> |
> | btrfs_setsize()
> | |- i_size_write(, 16K);
> | Now i_size is only 16K
> end_io() for [0K, 16K) |
> |- end_bio_extent_readpage() |
> |- btrfs_verify_data_csum() |
> | No csum error |
> |- i_size = 16K; |
> |- zero_user_segment(16K, |
> PAGE_SIZE); |
> !!! We zeroed range |
> !!! [32K, 48K) |
> | end_io for [32K, 48K)
> | |- end_bio_extent_readpage()
> | |- btrfs_verify_data_csum()
> | ! CSUM MISMATCH !
> | ! As the range is zeroed now !
>
> [FIX]
> To fix the problem, make end_bio_extent_readpage() to only zero the
> range of bvec.
>
> The bug only affects subpage read-write support, as for full read-only
> mount we can't change i_size thus won't hit the race condition.

Please drop this patch from autosel because of the above, this is in a
feature that's in progress and does not affect regular filesystems.