Re: [PATCH] fs: Fix data race in btrfs_drop_extents

From: Filipe Manana
Date: Mon Dec 02 2024 - 06:37:37 EST


On Mon, Dec 2, 2024 at 3:36 AM haoran zheng <zhenghaoran154@xxxxxxxxx> wrote:
>
> I read the relevant code again and found that modify_tree is also used for
> judgment at line 331, but the judgment here may lead to the release of the
> path. Will this cause other problems such as unexpected path release?
> 331: if (recow || !modify_tree) {
> 332:         modify_tree = -1;
> 333:         btrfs_release_path(path);
> 334:         continue;
> 335: }

This is what I told you before: this is triggered when we have taken
a read lock (modify_tree == 0) but it turns out we need a write lock
(in order to drop extent items from the inode's root).
In that case we release the path and reacquire it with write
locks. That's the meaning of setting modify_tree to -1 and doing the
continue.
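
To make the retry concrete, here is a minimal userspace sketch of the
pattern, not the actual btrfs code. The struct drop_sim, the lock_mode
enum and drop_extents_sim() are hypothetical names made up for
illustration, the recow case from line 331 is left out, and "releasing
the path" is reduced to going around the loop again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical lock modes standing in for the btree path locking. */
enum lock_mode { LOCK_READ, LOCK_WRITE };

/* Hypothetical, simplified stand-in for the state the real function uses. */
struct drop_sim {
	uint64_t start;            /* start offset of the range being dropped */
	uint64_t disk_i_size;      /* possibly stale snapshot of disk_i_size */
	bool replace_extent;
	bool extents_in_range;     /* ground truth: are there items to drop? */
	int searches;              /* how many times the path was (re)acquired */
	enum lock_mode final_mode; /* lock mode held when the loop finished */
};

static void drop_extents_sim(struct drop_sim *s)
{
	/* -1 means "search with write locks", 0 means "read locks only". */
	int modify_tree = -1;

	/* Hint only: a stale disk_i_size just picks a suboptimal lock mode. */
	if (s->start >= s->disk_i_size && !s->replace_extent)
		modify_tree = 0;

	while (1) {
		enum lock_mode mode = modify_tree ? LOCK_WRITE : LOCK_READ;

		s->searches++;
		s->final_mode = mode;

		/* Nothing to drop: whichever lock we took was sufficient. */
		if (!s->extents_in_range)
			break;

		/* Items must be dropped but we only hold read locks: retry. */
		if (!modify_tree) {
			modify_tree = -1; /* models release path + search again */
			continue;
		}

		/* Write locks held: dropping the items would happen here. */
		break;
	}

	/* The loop can restart at most once. */
	assert(s->searches <= 2);
}
```

In this sketch a wrong guess in either direction only costs an extra
search or an unnecessary write lock. With start >= disk_i_size and items
to drop, the function searches twice and ends in LOCK_WRITE; with
nothing to drop it searches once and stays in LOCK_READ.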

Btw, before you make extraordinary claims in a change log such as
"unexpected results" and "disk data being overwritten", make sure you
actually have a good understanding, not just of this
function (which you clearly don't), but also of btrfs' metadata, the
btree data structure and its operations and the write path in general
(which you clearly don't have either).

Extraordinary claims demand a good explanation.

This time you found a harmless race (with KCSAN), but not long ago
you sent out another patch that was even more puzzling:

https://lore.kernel.org/linux-btrfs/20241101035133.925251-1-zhenghaoran@xxxxxxxxxxx/

You claimed a race on code that is run sequentially for a data
structure that is not shared amongst threads, so there's no way on
earth a race condition could happen there.
This lack of explanation demonstrated a total lack of understanding of
the btrfs logging code, and even worse the patch didn't even compile -
you didn't even test compilation, let alone exercise the code.

That sort of behaviour made me think you were a bot or one of those
trolls that often show up (like the infamous Nick Krause some years
ago).
Avoid that sort of behaviour please, otherwise no one will take you seriously.

Thanks.

>
> On Mon, Dec 2, 2024 at 11:13 AM haoran zheng <zhenghaoran154@xxxxxxxxx> wrote:
> >
> > Thanks for the explanation. I will fix the description and resubmit the patch.
> >
> > On Mon, Dec 2, 2024 at 1:39 AM Filipe Manana <fdmanana@xxxxxxxxxx> wrote:
> >>
> >> On Sun, Dec 1, 2024 at 11:26 AM Hao-ran Zheng <zhenghaoran154@xxxxxxxxx> wrote:
> >> >
> >> > A data race occurs when the function `insert_ordered_extent_file_extent()`
> >> > and the function `btrfs_inode_safe_disk_i_size_write()` are executed
> >> > concurrently. The function `insert_ordered_extent_file_extent()` does not
> >> > take a lock when reading inode->disk_i_size, so it races with
> >> > `btrfs_inode_safe_disk_i_size_write()` writing inode->disk_i_size.
> >> > This can affect the value of `modify_tree`,
> >> > leading to some unexpected results such as disk data being overwritten.
> >>
> >> How can that cause "disk data being overwritten"?
> >> And the results are not unexpected at all.
> >>
> >> The value of modify_tree is irrelevant from a correctness point of view.
> >> It's used for an optimization to avoid taking write locks on the btree
> >> in case we're doing a write at or beyond eof.
> >>
> >> If we end up taking a write lock when it's not needed, everything's
> >> fine - we just may unnecessarily block concurrent readers that need to
> >> access the same btree path (leaf and parent node).
> >>
> >> If we don't take a write lock and we need it, we will later figure
> >> that out and switch to a write lock.
> >>
> >> > The specific call stack that appears during testing is as follows:
> >> >
> >> > ============DATA_RACE============
> >> > btrfs_drop_extents+0x89a/0xa060 [btrfs]
> >> > insert_reserved_file_extent+0xb54/0x2960 [btrfs]
> >> > insert_ordered_extent_file_extent+0xff5/0x1760 [btrfs]
> >> > btrfs_finish_one_ordered+0x1b85/0x36a0 [btrfs]
> >> > btrfs_finish_ordered_io+0x37/0x60 [btrfs]
> >> > finish_ordered_fn+0x3e/0x50 [btrfs]
> >> > btrfs_work_helper+0x9c9/0x27a0 [btrfs]
> >> > process_scheduled_works+0x716/0xf10
> >> > worker_thread+0xb6a/0x1190
> >> > kthread+0x292/0x330
> >> > ret_from_fork+0x4d/0x80
> >> > ret_from_fork_asm+0x1a/0x30
> >> > ============OTHER_INFO============
> >> > btrfs_inode_safe_disk_i_size_write+0x4ec/0x600 [btrfs]
> >> > btrfs_finish_one_ordered+0x24c7/0x36a0 [btrfs]
> >> > btrfs_finish_ordered_io+0x37/0x60 [btrfs]
> >> > finish_ordered_fn+0x3e/0x50 [btrfs]
> >> > btrfs_work_helper+0x9c9/0x27a0 [btrfs]
> >> > process_scheduled_works+0x716/0xf10
> >> > worker_thread+0xb6a/0x1190
> >> > kthread+0x292/0x330
> >> > ret_from_fork+0x4d/0x80
> >> > ret_from_fork_asm+0x1a/0x30
> >> > =================================
> >> >
> >> > To address this issue, it is recommended to add locks when reading
> >> > inode->disk_i_size and setting the value of modify_tree to prevent
> >> > data inconsistency.
> >>
> >> Can also use data_race() here, as it's a harmless race.
> >>
> >> Also, please use a proper subject like for example:
> >>
> >> btrfs: fix data race when accessing the inode's disk_i_size at
> >> btrfs_drop_extents()
> >>
> >> Also please update the changelog with a proper analysis - saying it's
> >> a harmless race and why.
> >>
> >> Thanks.
> >>
> >> >
> >> > Signed-off-by: Hao-ran Zheng <zhenghaoran154@xxxxxxxxx>
> >> > ---
> >> > fs/btrfs/file.c | 2 ++
> >> > 1 file changed, 2 insertions(+)
> >> >
> >> > diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> >> > index 4fb521d91b06..189708e6e91a 100644
> >> > --- a/fs/btrfs/file.c
> >> > +++ b/fs/btrfs/file.c
> >> > @@ -242,8 +242,10 @@ int btrfs_drop_extents(struct btrfs_trans_handle *trans,
> >> >         if (args->drop_cache)
> >> >                 btrfs_drop_extent_map_range(inode, args->start, args->end - 1, false);
> >> >
> >> > +       spin_lock(&inode->lock);
> >> >         if (args->start >= inode->disk_i_size && !args->replace_extent)
> >> >                 modify_tree = 0;
> >> > +       spin_unlock(&inode->lock);
> >> >
> >> >         update_refs = (btrfs_root_id(root) != BTRFS_TREE_LOG_OBJECTID);
> >> >         while (1) {
> >> > --
> >> > 2.34.1
> >> >
> >> >