Re: [PATCH 03/20] ext4: fix double brelse() the buffer of the extents path

From: Baokun Li
Date: Tue Jul 30 2024 - 04:48:19 EST


On 2024/7/26 19:45, Ojaswin Mujoo wrote:
On Wed, Jul 10, 2024 at 12:06:37PM +0800, libaokun@xxxxxxxxxxxxxxx wrote:
From: Baokun Li <libaokun1@xxxxxxxxxx>

In ext4_ext_try_to_merge_up(), set path[1].p_bh to NULL after it has been
released, otherwise it may be released twice.

An example of what triggers this is as follows:

split2 map split1
|--------|-------|--------|

ext4_ext_map_blocks
ext4_ext_handle_unwritten_extents
ext4_split_convert_extents
// path->p_depth == 0
ext4_split_extent
// 1. do split1
ext4_split_extent_at
ext4_ext_insert_extent
ext4_ext_create_new_leaf
ext4_ext_grow_indepth
le16_add_cpu(&neh->eh_depth, 1)
ext4_find_extent
path->p_depth = 1
ext4_ext_try_to_merge
ext4_ext_try_to_merge_up
path->p_depth = 0
brelse(path[1].p_bh) ---> not set to NULL here
// 2. update path
ext4_find_extent
// 3. do split2
ext4_split_extent_at
ext4_ext_insert_extent
ext4_ext_create_new_leaf
ext4_ext_grow_indepth
le16_add_cpu(&neh->eh_depth, 1)
ext4_find_extent
path[0].p_bh = NULL;
path->p_depth = 1
read_extent_tree_block ---> return err
// path[1].p_bh is still the old value
ext4_free_ext_path
ext4_ext_drop_refs
// path->p_depth == 1
brelse(path[1].p_bh) ---> brelse a buffer twice
Hi Baokun,

If i'm not wrong, in this trace, we'll enter ext4_ext_insert_extent() with
gb_flags having EXT4_GET_BLOCKS_PRE_IO so we won't actually go for a
merge_up.

That being said, there seems to be a few places where we follow the
split-insert pattern and it might be possible that one of those call
sites might not be passing EXT4_GET_BLOCKS_PRE_IO and we'll the double
free issue you mentioned. I'll check and update if I see anything.
Hi Ojaswin,

You're right. I am very sorry for the confusion.

The trace here is wrong, this patch should actually be placed after the two
UAF patches. Here ext4_ext_try_to_merge() is called when trying zeroout in
ext4_split_extent_at(). It is called when trying zeroout with or without
EXT4_GET_BLOCKS_PRE_IO.The correct trace is as follows:

  split2    map    split1
|--------|-------|--------|

ext4_ext_map_blocks
 ext4_ext_handle_unwritten_extents
  ext4_split_convert_extents
   // path->p_depth == 0
   ext4_split_extent
     // 1. do split1
     ext4_split_extent_at
       |ext4_ext_insert_extent
       |  ext4_ext_create_new_leaf
       |    ext4_ext_grow_indepth
       |      le16_add_cpu(&neh->eh_depth, 1)
       |    ext4_find_extent
       |      // return -ENOMEM
       |// get error and try zeroout
       |path = ext4_find_extent
       |  path->p_depth = 1
       |ext4_ext_try_to_merge
       |  ext4_ext_try_to_merge_up
       |    path->p_depth = 0
       |    brelse(path[1].p_bh)  ---> not set to NULL here
       |// zeroout success
     // 2. update path
     ext4_find_extent
     // 3. do split2
     ext4_split_extent_at
       ext4_ext_insert_extent
         ext4_ext_create_new_leaf
           ext4_ext_grow_indepth
             le16_add_cpu(&neh->eh_depth, 1)
           ext4_find_extent
             path[0].p_bh = NULL;
             path->p_depth = 1
             read_extent_tree_block  ---> return err
             // path[1].p_bh is still the old value
 ext4_free_ext_path
  ext4_ext_drop_refs
   // path->p_depth == 1
   brelse(path[1].p_bh)  ---> brelse a buffer twice

I'll adjust the order of the patches and correct the trace in the next version.
On a separate note, isn't it a bit weird that we grow the tree indepth
(which includes allocation, marking buffer dirty etc) only to later
merge it up again and throwing all the changes we did while growing the
tree. Surely we could optimize this particular case somehow right?
Sorry that my trace misled you. It seems reasonable to try to merge extent
in error handling.
Finally got the following WARRNING when removing the buffer from lru:

============================================
VFS: brelse: Trying to free free buffer
WARNING: CPU: 2 PID: 72 at fs/buffer.c:1241 __brelse+0x58/0x90
CPU: 2 PID: 72 Comm: kworker/u19:1 Not tainted 6.9.0-dirty #716
RIP: 0010:__brelse+0x58/0x90
Call Trace:
<TASK>
__find_get_block+0x6e7/0x810
bdev_getblk+0x2b/0x480
__ext4_get_inode_loc+0x48a/0x1240
ext4_get_inode_loc+0xb2/0x150
ext4_reserve_inode_write+0xb7/0x230
__ext4_mark_inode_dirty+0x144/0x6a0
ext4_ext_insert_extent+0x9c8/0x3230
ext4_ext_map_blocks+0xf45/0x2dc0
ext4_map_blocks+0x724/0x1700
ext4_do_writepages+0x12d6/0x2a70
[...]
============================================

Fixes: ecb94f5fdf4b ("ext4: collapse a single extent tree block into the inode if possible")
Cc: stable@xxxxxxxxxx
Signed-off-by: Baokun Li <libaokun1@xxxxxxxxxx>
---
fs/ext4/extents.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 4d589d34b30e..657baf3991c1 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -1888,6 +1888,7 @@ static void ext4_ext_try_to_merge_up(handle_t *handle,
path[0].p_hdr->eh_max = cpu_to_le16(max_root);
brelse(path[1].p_bh);
+ path[1].p_bh = NULL;
Anyways, I agree that adding this here is the right thing to do:

Reviewed-by: Ojaswin Mujoo <ojaswin@xxxxxxxxxxxxx>

Thanks,
Ojaswin
Thanks for the review!
ext4_free_blocks(handle, inode, NULL, blk, 1,
EXT4_FREE_BLOCKS_METADATA | EXT4_FREE_BLOCKS_FORGET);
}
--
2.39.2

--
With Best Regards,
Baokun Li