On Wed 28-04-21 16:51:58, Ye Bin wrote:
Assume that there is extent [10, 100] (ee_block=10 ee_len=91), call ext4_split_extent_at split at 50,
We got the following BUG_ON when running fsstress with IO fault injection:
[130747.323114] kernel BUG at fs/ext4/extents_status.c:762!
[130747.323117] Internal error: Oops - BUG: 0 [#1] SMP
......
[130747.334329] Call trace:
[130747.334553] ext4_es_cache_extent+0x150/0x168 [ext4]
[130747.334975] ext4_cache_extents+0x64/0xe8 [ext4]
[130747.335368] ext4_find_extent+0x300/0x330 [ext4]
[130747.335759] ext4_ext_map_blocks+0x74/0x1178 [ext4]
[130747.336179] ext4_map_blocks+0x2f4/0x5f0 [ext4]
[130747.336567] ext4_mpage_readpages+0x4a8/0x7a8 [ext4]
[130747.336995] ext4_readpage+0x54/0x100 [ext4]
[130747.337359] generic_file_buffered_read+0x410/0xae8
[130747.337767] generic_file_read_iter+0x114/0x190
[130747.338152] ext4_file_read_iter+0x5c/0x140 [ext4]
[130747.338556] __vfs_read+0x11c/0x188
[130747.338851] vfs_read+0x94/0x150
[130747.339110] ksys_read+0x74/0xf0
If the call to ext4_ext_insert_extent() fails but the new extent has already
been inserted, we just set "ex->ee_len = orig_ex.ee_len". This leads to
overlapping extents, which then triggers the BUG_ON when caching extents.

Thanks for the patch but I'm still not quite sure how overlapping extents
in the extent tree can lead to triggering BUG_ON(lblk + len - 1 < lblk) in
ext4_es_cache_extent(). Can you elaborate a bit more on how this happens?
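
For reference, here is a small user-space sketch of the arithmetic behind a check of the form BUG_ON(lblk + len - 1 < lblk). It assumes 32-bit unsigned logical block numbers, and it assumes that some caller derives a hole length by subtracting the end of the previous extent from the start of the next one. The helper names range_check_fires(), hole_len() and overlap_demo() are hypothetical and are not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Same shape as the quoted condition, evaluated in 32-bit unsigned math. */
static bool range_check_fires(uint32_t lblk, uint32_t len)
{
	return (uint32_t)(lblk + len - 1) < lblk;
}

/*
 * Assumed caller behaviour: the gap between two extents is computed as
 * next_start - prev_end, with no check that next_start >= prev_end.
 */
static uint32_t hole_len(uint32_t prev_end, uint32_t next_start)
{
	return next_start - prev_end;
}

static void overlap_demo(void)
{
	/* Sane layout: [10, 49] followed by [60, 100]; the gap begins at 50. */
	uint32_t ok = hole_len(50, 60);
	/* Overlapping layout: [10, 100] followed by [50, 100]; "gap" begins at 101. */
	uint32_t bad = hole_len(101, 50);

	assert(!range_check_fires(50, ok));
	assert(range_check_fires(101, bad));
	printf("sane gap len=%u, overlapping gap len=%u\n",
	       (unsigned)ok, (unsigned)bad);
}
```

With overlapping extents the subtraction wraps around to a huge length, so lblk + len - 1 wraps as well and ends up below lblk. Whether this is the exact path taken in the reported trace is not confirmed here.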
For the V1 patch, Ted suggested fixing the length only when "err != -EROFS". As if we don't
If the call to ext4_ext_insert_extent() fails, don't restore ex->ee_len to the old value.
This may leak blocks, but that can be fixed by fsck later.

I fail to see why EROFS is special here. Can you explain a bit please?

After fixing the above issue with the v2 patch, we still hit the same issue.
ext4_split_extent_at:
{
	......
	err = ext4_ext_insert_extent(handle, inode, ppath, &newex, flags);
	if (err == -ENOSPC && (EXT4_EXT_MAY_ZEROOUT & split_flag)) {
		......
		ext4_ext_try_to_merge(handle, inode, path, ex);  ->step(1)
		err = ext4_ext_dirty(handle, inode, path + path->p_depth);  ->step(2)
		if (err)
			goto fix_extent_len;
		......
	}
	......
fix_extent_len:
	ex->ee_len = orig_ex.ee_len;  ->step(3)
	......
}
If the merge in step(1) succeeded but dirtying the extent in step(2) failed, we
jump to the fix_extent_len label and restore ex->ee_len from orig_ex.ee_len. But
after the merge, "ex" may no longer point to the original extent, so the length
of a different extent gets overwritten. This triggers the same issue as before.
If step(2) fails, just return the error and don't restore ex->ee_len to the old value.
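
The stale-pointer effect described above can be illustrated with a simplified user-space model. This is only a sketch: struct toy_extent, struct toy_leaf and the toy_* functions are hypothetical stand-ins, and the model assumes the merge folds "ex" into its left neighbour and shifts the later entries down by one slot. The real ext4_ext_try_to_merge() is more involved.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct toy_extent {
	uint32_t block;	/* first logical block */
	uint16_t len;	/* number of blocks */
};

struct toy_leaf {
	struct toy_extent ext[8];
	size_t count;
};

/* Fold ext[idx] into ext[idx - 1] if they are contiguous, then close the gap. */
static bool toy_merge_left(struct toy_leaf *leaf, size_t idx)
{
	struct toy_extent *left, *cur;

	if (idx == 0 || idx >= leaf->count)
		return false;
	left = &leaf->ext[idx - 1];
	cur = &leaf->ext[idx];
	if (left->block + left->len != cur->block)
		return false;
	left->len += cur->len;
	memmove(cur, cur + 1, (leaf->count - idx - 1) * sizeof(*cur));
	leaf->count--;
	return true;
}

/* True if any extent runs past the start of the one that follows it. */
static bool toy_overlaps(const struct toy_leaf *leaf)
{
	for (size_t i = 1; i < leaf->count; i++)
		if (leaf->ext[i - 1].block + leaf->ext[i - 1].len > leaf->ext[i].block)
			return true;
	return false;
}

/* Model of step(1) to step(3) with step(2) assumed to fail. */
static void toy_stale_restore(struct toy_leaf *leaf, size_t idx)
{
	struct toy_extent *ex = &leaf->ext[idx];
	uint16_t orig_len = ex->len;
	bool dirty_failed = true;	/* step(2): assumed failure */

	toy_merge_left(leaf, idx);	/* step(1): slot idx may now hold a neighbour */
	if (dirty_failed)
		ex->len = orig_len;	/* step(3): written through the stale pointer */
}

/* Four extents: [0, 9], [10, 100], [200, 204], [210, 214]. */
static struct toy_leaf toy_example_leaf(void)
{
	struct toy_leaf leaf = {
		.ext = { {0, 10}, {10, 91}, {200, 5}, {210, 5} },
		.count = 4,
	};
	return leaf;
}
```

Calling toy_stale_restore(&leaf, 1) on the example leaf merges the second extent into the first, after which slot 1 holds the extent starting at block 200. Restoring the saved length of 91 there makes it run into the extent at block 210.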
Signed-off-by: Ye Bin <yebin10@xxxxxxxxxx>
---
fs/ext4/extents.c | 13 +++++--------
1 file changed, 5 insertions(+), 8 deletions(-)
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 77c84d6f1af6..d4aa24a09d8b 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -3238,15 +3238,12 @@ static int ext4_split_extent_at(handle_t *handle,
 		ex->ee_len = cpu_to_le16(ee_len);
 		ext4_ext_try_to_merge(handle, inode, path, ex);
 		err = ext4_ext_dirty(handle, inode, path + path->p_depth);
-		if (err)
-			goto fix_extent_len;
-
-		/* update extent status tree */
-		err = ext4_zeroout_es(inode, &zero_ex);
-
-		goto out;
-	} else if (err)
+		if (!err)
+			/* update extent status tree */
+			err = ext4_zeroout_es(inode, &zero_ex);
+	} else if (err && err != -EROFS) {
 		goto fix_extent_len;
+	}

 out:
 	ext4_ext_show_leaf(inode, path);

Honza