Re: [PATCH 1/4] ext4: make ext4_es_cache_extent() support overwrite existing extents
From: Jan Kara
Date: Tue Nov 11 2025 - 05:41:04 EST
Hi!
On Thu 06-11-25 21:02:35, Zhang Yi wrote:
> On 11/6/2025 5:15 PM, Jan Kara wrote:
> > On Fri 31-10-25 14:29:02, Zhang Yi wrote:
> >> From: Zhang Yi <yi.zhang@xxxxxxxxxx>
> >>
> >> Currently, ext4_es_cache_extent() is used to load extents into the
> >> extent status tree when reading on-disk extent blocks. Since it may be
> >> called while moving or modifying the extent tree, it does not
> >> overwrite existing extents in the extent status tree and is only used
> >> for the initial loading.
> >>
> >> There are many other places in ext4 where on-disk extents are inserted
> >> into the extent status tree, such as in ext4_map_query_blocks().
> >> Currently, they call ext4_es_insert_extent() to perform the insertion,
> >> but they don't modify the extents, so ext4_es_cache_extent() would be a
> >> more appropriate choice. However, when ext4_map_query_blocks() inserts
> >> an extent, it may overwrite a short existing extent of the same type.
> >> Therefore, to prepare for the replacements, we need to extend
> >> ext4_es_cache_extent() to allow it to overwrite existing extents with
> >> the same type.
> >>
> >> In addition, since caching extents can be more lenient than modifying
> >> them and does not involve modifying reserved blocks, it is not
> >> necessary to ensure that the insertion succeeds as strictly as
> >> in the ext4_es_insert_extent() function.
> >>
> >> Signed-off-by: Zhang Yi <yi.zhang@xxxxxxxxxx>
> >
> > Thanks for writing this series! I think we can actually simplify things
> > even further. Extent status tree operations can be divided into three
> > groups:
> > 1) Lookups in es tree - protected only by i_es_lock.
> > 2) Caching of on-disk state into es tree - protected by i_es_lock and
> > i_data_sem (at least in read mode).
> > 3) Modification of existing state - protected by i_es_lock and i_data_sem
> > in write mode.
>
> Yeah.
>
> >
> > Now, because 2) is mutually exclusive with 3) thanks to i_data_sem, the
> > observation is that 2) should never see a real conflict, i.e., all
> > intersecting entries in the es tree have the same status; otherwise it is
> > a bug.
>
> While I was debugging, I observed two exceptions here.
>
> A. The first exception concerns delayed extents. Since there is no actual
> extent present in the on-disk extent tree, if a delayed extent
> already exists in the extent status tree and someone calls
> ext4_find_extent()->ext4_cache_extents() to cache an extent at the same
> location, a status mismatch will occur (an attempt to replace
> the delayed extent with a hole). This is not a bug.
> B. I also observed that ext4_find_extent()->ext4_cache_extents() is called
> during splitting and during conversion between the unwritten and written
> states (in most scenarios, EXT4_EX_NOCACHE is not set). However, because
> the extents are in an intermediate state during this processing, there can
> be cases where the statuses do not match. I did not analyze this scenario
> in detail, but since ext4_es_insert_extent() is called at the end of the
> processing to ensure the final state is correct, I don't think this is a
> practical issue either.
Thanks for bringing this up. I didn't think about these two cases. Case A
is easy to deal with, as you write below: a hole insertion can be deemed
compatible with an existing delalloc extent.
Case B is more difficult and I think I need to better understand the
details there to decide what to do. Extent splitting alone (as happens
e.g. with EXT4_GET_BLOCKS_PRE_IO) should keep the extent tree and the
extent status tree compatible. So it has to be something like the
EXT4_GET_BLOCKS_CONVERT case. There, indeed, after we call
ext4_ext_mark_initialized() we have an initialized extent on disk, but in
the extent status tree it is still unwritten. But I just didn't find a
place in the extent conversion path that would modify the extent state on
disk and then call ext4_find_extent(). Can you perhaps share a stack trace
where the extent incompatibility was hit from ext4_cache_extents()? Thanks!
Honza
--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR