Re: [PATCH] btrfs: fix data race when accessing the block_group's used field

From: David Sterba
Date: Wed Feb 26 2025 - 10:01:22 EST


On Tue, Feb 25, 2025 at 11:43:29AM +0000, Filipe Manana wrote:
> > > > > > > static inline u64 btrfs_block_group_used(struct btrfs_block_group *bg)
> > > > > > > {
> > > > > > > 	u64 ret;
> > > > > > > >
> > > > > > > 	spin_lock(&bg->lock);
> > > > > > > 	ret = bg->used;
> > > > > > > 	spin_unlock(&bg->lock);
> > > > > > > >
> > > > > > > 	return ret;
> > > > > > > }
> >
> > I understand that using the lock to protect block_group->used
> > in discard.c is feasible. In addition, I looked at the code
> > in block-group.c and found that some of the places where
> > block_group->used is accessed already hold the lock. There, it
> > seems inappropriate to call btrfs_block_group_used() to
> > obtain the value, because that would cause a deadlock.
>
> In places where we are reading it while holding the block group's
> spinlock, there's nothing that needs to be changed.
>
> Also we can't call it btrfs_block_group_used() since there's already
> an accessor function for struct btrfs_block_group_item with that name
> (defined through macros at accessors.h).
>
> I took a closer look at the cases in discard.c, and it's safe to use
> data_race() instead. Even if load/store tearing happens or we get stale
> values, nothing harmful happens; a few things may just be done later or
> unnecessarily, without side effects - like adding a non-empty block group
> to the list of unused block groups, which is fine since the cleaner
> kthread won't delete it if it's not empty, or delaying the discard.
>
> Either give it another name like btrfs_get_block_group_used() or directly
> use data_race() in discard.c - I don't much like either of them, the first
> because there's the similarly named accessor for block group items, the
> second due to spreading data_race(), but I don't see any more elegant
> alternative.

Agreed, this looks like a reasonable "solution". It's not really a bug, and
we can live with the data_race() annotation, especially since it filters out
known cases so that code analysis tools can find the real problems.
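
For reference, a minimal userspace sketch of the two options discussed above.
This is not the actual patch: the struct, the atomic_flag standing in for the
spinlock, the demo_* names and the no-op data_race() macro are all stand-ins
made up for illustration. In the kernel, data_race() only marks the access as
an intentional race for KCSAN and does not add any synchronization.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Stand-in for the kernel's data_race() annotation (assumption: here it
 * simply evaluates the expression, since there is no KCSAN in userspace).
 */
#define data_race(expr) (expr)

/* Hypothetical, heavily reduced stand-in for struct btrfs_block_group. */
struct demo_block_group {
	atomic_flag lock;	/* stand-in for the block group's spinlock */
	uint64_t used;		/* bytes used, normally updated under lock */
};

static inline void demo_bg_lock(struct demo_block_group *bg)
{
	/* Spin until the flag was previously clear, i.e. we own the lock. */
	while (atomic_flag_test_and_set_explicit(&bg->lock,
						 memory_order_acquire))
		;
}

static inline void demo_bg_unlock(struct demo_block_group *bg)
{
	atomic_flag_clear_explicit(&bg->lock, memory_order_release);
}

/*
 * Option 1: locked accessor. Must not be called with the lock already
 * held, otherwise it spins forever (the deadlock mentioned above).
 */
static inline uint64_t demo_get_block_group_used(struct demo_block_group *bg)
{
	uint64_t ret;

	demo_bg_lock(bg);
	ret = bg->used;
	demo_bg_unlock(bg);

	return ret;
}

/*
 * Option 2: lockless read, annotated as an intentional race. The result
 * may be stale, so it is only suitable where a wrong answer is harmless
 * and gets rechecked later under the lock.
 */
static inline bool demo_block_group_looks_unused(struct demo_block_group *bg)
{
	return data_race(bg->used) == 0;
}
```

The second form is only acceptable because the callers tolerate a stale
value, as explained in the quoted analysis of discard.c.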

> So a sample patch:
[...]

Looks good to me.