Re: [PATCH] btrfs: wait for ordered extents before buffered write fallback in direct IO
From: Zhou, Yun
Date: Thu Jun 25 2026 - 02:47:36 EST
On 6/25/26 13:17, Qu Wenruo wrote:
On 2026/6/25 14:43, Qu Wenruo wrote:
On 2026/6/25 11:44, Yun Zhou wrote:
When btrfs_direct_write() falls back to buffered IO after a failed DIO
attempt, it may race with the asynchronous completion of DIO ordered
extents. This leads to a BUG_ON in insert_ordered_extent() due to
overlapping ordered extents in the per-inode rb-tree.
The race sequence is:
1. DIO creates an ordered extent via btrfs_dio_iomap_begin()
2. Page fault occurs (nofault=true), no bio is submitted (submitted=0)
3. btrfs_dio_iomap_end() truncates and finishes the OE asynchronously
via btrfs_finish_ordered_extent() which queues work
4. iomap returns 0, retry logic faults in pages and retries DIO
5. Second DIO attempt also fails, code reaches buffered: label
6. btrfs_buffered_write() dirties pages for the same range
btrfs_buffered_write()
|- copy_one_range()
   |- lock_and_cleanup_extent_if_needed()
      |- btrfs_start_ordered_extent()
So your explanation doesn't make sense. If there is a direct IO OE
remaining, we will wait for that OE to complete.
There is still something missing.
7. btrfs_fdatawrite_range() triggers writeback
8. run_delalloc_nocow() -> fallback_to_cow() -> cow_file_range()
tries to insert a new ordered extent for the same file offset
9. The DIO ordered extent hasn't been removed from the rb-tree yet
(btrfs_finish_ordered_io running async in workqueue) -> BUG_ON
Fix this by waiting for any pending ordered extents in the target range
before starting the buffered write.
Reported-by: syzbot+ba2afde329fc27e3f22e@xxxxxxxxxxxxxxxxxxxxxxxxx
Closes: https://syzkaller.appspot.com/bug?extid=ba2afde329fc27e3f22e
Fixes: acf9ed3a6c00 ("btrfs: retry faulting in the pages after a zero
sized short direct write")
And the fixes tag is also incorrect.
Without that commit, we would fall back directly to buffered write without
retrying to fault in the pages.
So by your explanation it will trigger the same problem, with or without
that commit.
Yes, my previous analysis does seem inaccurate. Commit acf9ed3a6c00 (which
added retries) merely widened the window for the issue to occur, but the
problem has actually existed since ff66fe666233 ("btrfs: fix incorrect
buffered IO fallback for append direct writes"), which introduced the
i_size revert on DIO short writes, causing
lock_and_cleanup_extent_if_need() to skip the OE check (since
start_pos >= reverted i_size). I will correct the commit message and the
Fixes tag in v2.
Thanks,
Yun