Re: [PATCH] btrfs: wait for in-flight readahead BIOs on open_ctree() error

From: Teng Liu

Date: Mon Mar 30 2026 - 14:01:04 EST


On 2026-03-30 08:51, Qu Wenruo wrote:
>
>
> On 2026/3/30 08:36, Qu Wenruo wrote:
> >
> >
> > Even if you wait for all bios, it can still cause problems.
> >
> > As the bio counter is only for the btrfs bio layer, we still have
> > btrfs_bio::end_io called after btrfs_bio_counter_dec().
> >
> > And if the full fs_info has been freed, then at end_bbio_meta_read(), we
> > can still have problems as btrfs_validate_extent_buffer() will access eb
> > (bbio->private) and fs_info (eb->fs_info), which triggers use after
> > free.
> >
> > So using that bio counter is not going to solve all problems; it only
> > reduces the race window, thus masking the problem.
> >
> >
> > The following ideas come to mind, but none seems as simple as your
> > current one:
> >
> > 1) Introduce a dedicated counter for metadata readahead/reads
> >    This seems to be the simplest one among all.
> >    But its only use would be in error handling, so it may not be
> >    worth it.
> >
> > 2) Disable metadata readahead during open_ctree()
> >    This will delay the mount, especially for a large extent tree
> >    without the bgt feature.
> >
> > 3) Use buffer_tree xarray to iterate through all ebs
> >    Since this is only for error handling of open_ctree(), we're fine to
> >    do the full xarray iteration, and wait for any eb that has
> >    EXTENT_BUFFER_READING flag.
> >
> >    The problem is, we do not have a dedicated tag like
> >    PAGECACHE_TAG_(TOWRITE|DIRTY) to easily catch all dirty/writeback
> >    ebs.
> >    So the only option is to go through each eb and check their flags.
> >
> >    I think this is the one with minimal impact, but may cause much
> >    longer runtime during this error handling path.
> >
> > My personal preference is option 3).
>
> Or the 4th one, which is only an idea and I haven't yet verified:
>
> 4) Handle error from invalidate_inode_pages2()
>    Currently we just call invalidate_inode_pages2() on the btree inode
>    and expect it to return 0.
>
>    But if there is still an eb read pending, that function will return
>    -EBUSY, as try_release_extent_buffer() will find an eb whose refs is
>    not 0 and refuse to release that eb, which belongs to a folio.
>
>    That should be a good indicator of any pending metadata reads.
>
>    So if invalidate_inode_pages2() returns -EBUSY, we should wait and
>    retry until it returns 0.
>
>

Thanks! Yes, that makes sense: simply waiting on the bio counter doesn't
fix the problem here.

Among the options, I prefer option 3. Although it may be slower, it only
runs in the mount failure path, so the extra cost seems acceptable.

I am quite new to the btrfs codebase, so I don't know whether
`invalidate_inode_pages2()` would be a reliable solution. Maybe I
should start with option 3?
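
To check that I understand option 3 correctly, here is a rough userspace
model of the idea. It is not kernel code and not a proposed patch:
`struct fake_eb`, `FAKE_EB_READING`, `fake_eb_tick()` and
`wait_for_reading_ebs()` are names I made up for illustration, a plain
array stands in for the buffer_tree xarray, and a busy loop stands in for
sleeping until the read completes. In the kernel I assume this would be a
walk over the xarray that waits on each eb with EXTENT_BUFFER_READING set.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for EXTENT_BUFFER_READING; not the kernel's value. */
#define FAKE_EB_READING (1u << 0)

/* Hypothetical, heavily reduced model of an extent buffer. */
struct fake_eb {
	unsigned int flags;   /* FAKE_EB_READING is set while a read is in flight */
	int ticks_until_done; /* simulated time left before the read's end_io runs */
};

/*
 * Simulate one unit of time passing for a single eb: an in-flight read
 * makes progress, and the READING flag is cleared once it completes.
 */
static void fake_eb_tick(struct fake_eb *eb)
{
	if (!(eb->flags & FAKE_EB_READING))
		return;
	if (eb->ticks_until_done > 0)
		eb->ticks_until_done--;
	if (eb->ticks_until_done <= 0)
		eb->flags &= ~FAKE_EB_READING;
}

/*
 * Model of option 3: there is no tag that marks "reading" ebs, so visit
 * every eb, check its flags, and wait for each one that is still being
 * read. Returns how many ebs had to be waited on.
 */
static size_t wait_for_reading_ebs(struct fake_eb *ebs, size_t nr)
{
	size_t waited = 0;

	assert(ebs != NULL || nr == 0);

	for (size_t i = 0; i < nr; i++) {
		if (!(ebs[i].flags & FAKE_EB_READING))
			continue;
		waited++;
		/* Stand-in for sleeping until the read completes. */
		while (ebs[i].flags & FAKE_EB_READING)
			fake_eb_tick(&ebs[i]);
	}
	return waited;
}
```

With three ebs of which two are still being read, the first call to
wait_for_reading_ebs() returns 2 and leaves no eb with the flag set, and
a second call returns 0. The equivalent wait would have to finish before
fs_info is freed in the open_ctree() error path.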