Re: [PATCH] fs: Fix data race in btrfs_drop_extents

From: haoran zheng
Date: Sun Dec 01 2024 - 22:36:44 EST


I read the relevant code again and found that modify_tree is also checked in
the condition at line 331, and taking that branch releases the path. Could
this cause other problems, such as an unexpected path release?

331:     if (recow || !modify_tree) {
332:             modify_tree = -1;
333:             btrfs_release_path(path);
334:             continue;
335:     }

On Mon, Dec 2, 2024 at 11:13 AM haoran zheng <zhenghaoran154@xxxxxxxxx> wrote:
>
> Thanks for the explanation. I will fix the description and resubmit the patch.
>
> On Mon, Dec 2, 2024 at 1:39 AM Filipe Manana <fdmanana@xxxxxxxxxx> wrote:
>>
>> On Sun, Dec 1, 2024 at 11:26 AM Hao-ran Zheng <zhenghaoran154@xxxxxxxxx> wrote:
>> >
>> > A data race occurs when the function `insert_ordered_extent_file_extent()`
>> > and the function `btrfs_inode_safe_disk_i_size_write()` are executed
>> > concurrently. The function `insert_ordered_extent_file_extent()` reads
>> > inode->disk_i_size without holding a lock, so it races with the write to
>> > inode->disk_i_size in `btrfs_inode_safe_disk_i_size_write()`. The race
>> > affects the value of `modify_tree`, leading to some unexpected results
>> > such as disk data being overwritten.
>>
>> How can that cause "disk data being overwritten"?
>> And the results are not unexpected at all.
>>
>> The value of modify_tree is irrelevant from a correctness point of view.
>> It's used for an optimization to avoid taking write locks on the btree
>> in case we're doing a write at or beyond eof.
>>
>> If we end up taking a write lock when it's not needed, everything's
>> fine - we just may unnecessarily block concurrent readers that need to
>> access the same btree path (leaf and parent node).
>>
>> If we don't take a write lock and we need it, we will later figure
>> that out and switch to a write lock.
>>
>> > The specific call stack that appears during testing is as follows:
>> >
>> > ============DATA_RACE============
>> > btrfs_drop_extents+0x89a/0xa060 [btrfs]
>> > insert_reserved_file_extent+0xb54/0x2960 [btrfs]
>> > insert_ordered_extent_file_extent+0xff5/0x1760 [btrfs]
>> > btrfs_finish_one_ordered+0x1b85/0x36a0 [btrfs]
>> > btrfs_finish_ordered_io+0x37/0x60 [btrfs]
>> > finish_ordered_fn+0x3e/0x50 [btrfs]
>> > btrfs_work_helper+0x9c9/0x27a0 [btrfs]
>> > process_scheduled_works+0x716/0xf10
>> > worker_thread+0xb6a/0x1190
>> > kthread+0x292/0x330
>> > ret_from_fork+0x4d/0x80
>> > ret_from_fork_asm+0x1a/0x30
>> > ============OTHER_INFO============
>> > btrfs_inode_safe_disk_i_size_write+0x4ec/0x600 [btrfs]
>> > btrfs_finish_one_ordered+0x24c7/0x36a0 [btrfs]
>> > btrfs_finish_ordered_io+0x37/0x60 [btrfs]
>> > finish_ordered_fn+0x3e/0x50 [btrfs]
>> > btrfs_work_helper+0x9c9/0x27a0 [btrfs]
>> > process_scheduled_works+0x716/0xf10
>> > worker_thread+0xb6a/0x1190
>> > kthread+0x292/0x330
>> > ret_from_fork+0x4d/0x80
>> > ret_from_fork_asm+0x1a/0x30
>> > =================================
>> >
>> > To address this issue, it is recommended to add locks when reading
>> > inode->disk_i_size and setting the value of modify_tree to prevent
>> > data inconsistency.
>>
>> You can also use data_race() here, as it's a harmless race.
>>
>> Also, please use a proper subject, for example:
>>
>> btrfs: fix data race when accessing the inode's disk_i_size at
>> btrfs_drop_extents()
>>
>> Also please update the changelog with a proper analysis - saying it's
>> a harmless race and why.
>>
>> Thanks.
>>
>> >
>> > Signed-off-by: Hao-ran Zheng <zhenghaoran154@xxxxxxxxx>
>> > ---
>> > fs/btrfs/file.c | 2 ++
>> > 1 file changed, 2 insertions(+)
>> >
>> > diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
>> > index 4fb521d91b06..189708e6e91a 100644
>> > --- a/fs/btrfs/file.c
>> > +++ b/fs/btrfs/file.c
>> > @@ -242,8 +242,10 @@ int btrfs_drop_extents(struct btrfs_trans_handle *trans,
>> >         if (args->drop_cache)
>> >                 btrfs_drop_extent_map_range(inode, args->start, args->end - 1, false);
>> >
>> > +       spin_lock(&inode->lock);
>> >         if (args->start >= inode->disk_i_size && !args->replace_extent)
>> >                 modify_tree = 0;
>> > +       spin_unlock(&inode->lock);
>> >
>> >         update_refs = (btrfs_root_id(root) != BTRFS_TREE_LOG_OBJECTID);
>> >         while (1) {
>> > --
>> > 2.34.1
>> >
>> >