Re: [PATCH v3 0/4] btrfs: fix balance NULL derefs and chunk/bg mapping verification
From: David Sterba
Date: Tue Apr 21 2026 - 00:21:41 EST
On Wed, Mar 25, 2026 at 08:43:35AM +0800, ZhengYuan Huang wrote:
> This series fixes three NULL dereferences in btrfs balance paths and the
> underlying mount-time verification bug that lets the corresponding
> chunk/block-group inconsistency go undetected.
>
> The balance crashes happen when metadata corruption leaves a chunk present
> in the chunk tree but without a corresponding block group in the in-memory
> block group cache. In that case, balance reaches code paths that call
> btrfs_lookup_block_group() and dereference the returned pointer without
> checking for NULL.
>
> The first three patches harden the affected balance paths:
> - patch 1 fixes chunk_usage_filter()
> - patch 2 fixes chunk_usage_range_filter()
> - patch 3 fixes btrfs_may_alloc_data_chunk()
>
> They are kept separate because the affected code was introduced by
> different commits, which should also make backporting easier, as
> suggested by Qu Wenruo.
>
> The fourth patch fixes the mount-time verification side. Based on David
> Sterba's feedback, it now explicitly relies on the mount-time context and
> uses a lockless traversal of mapping_tree. check_chunk_block_group_mappings()
> is supposed to verify that every chunk has a matching block group, but its
> current iteration starts with btrfs_find_chunk_map(fs_info, 0, 1). If no
> chunk contains logical address 0, the lookup returns NULL immediately and
> the loop exits without checking any chunk at all. As a result, the
> corrupted mapping can survive mount and only crash later when balance
> reaches it.
>
> This series makes btrfs reject the inconsistency earlier at mount time,
> and also hardens the balance paths so the corruption is reported as
> -EUCLEAN instead of triggering NULL dereferences.
>
> [CHANGELOG]
> v3:
> - added a new patch to fix the same missing-block-group NULL dereference
>   in btrfs_may_alloc_data_chunk()
> - patch 1 and 2:
>   - changed the bool return flow to explicit int error propagation
>   - used ret2 for the nested filter return value in should_balance_chunk()
> - patch 4:
>   - reworked the changelog based on David Sterba's feedback
>   - clarified the mount-time context for the lockless mapping_tree traversal
Thanks for the v3 update, I've added the fixes to for-next. I've edited
the changelogs a bit in places where the explanations felt like they were
stating the obvious, but otherwise the problem descriptions were good.
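
For anyone reading the thread later, the iteration problem described for
patch 4 can be illustrated with a small user-space sketch. This is not the
kernel code: struct chunk_map, find_chunk() and the two walk functions are
hypothetical stand-ins, and the sorted array only models the mapping tree.
It shows why seeding the walk with a lookup at logical address 0 can end up
checking nothing, while walking the entries in order visits every chunk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a chunk map covering [start, start + len). */
struct chunk_map {
	uint64_t start;
	uint64_t len;
};

/* Sorted by start; note that no chunk contains logical address 0. */
static const struct chunk_map chunks[] = {
	{ 1048576, 4194304 },
	{ 5242880, 8388608 },
};
static const size_t nr_chunks = sizeof(chunks) / sizeof(chunks[0]);

/* Return the first chunk overlapping [logical, logical + len), or NULL. */
static const struct chunk_map *find_chunk(uint64_t logical, uint64_t len)
{
	for (size_t i = 0; i < nr_chunks; i++) {
		const struct chunk_map *m = &chunks[i];

		if (logical < m->start + m->len && m->start < logical + len)
			return m;
	}
	return NULL;
}

/*
 * Buggy pattern: the walk is seeded with a lookup at address 0. If no
 * chunk covers that address the lookup returns NULL and the loop body
 * never runs, so zero chunks get verified.
 */
size_t count_checked_buggy(void)
{
	size_t checked = 0;
	const struct chunk_map *m = find_chunk(0, 1);

	while (m) {
		checked++;
		m = find_chunk(m->start + m->len, 1);
	}
	return checked;
}

/*
 * Fixed pattern: walk every entry in order, independent of where the
 * first chunk starts. The assert stands in for the per-chunk check.
 */
size_t count_checked_fixed(void)
{
	size_t checked = 0;

	for (size_t i = 0; i < nr_chunks; i++) {
		assert(chunks[i].len > 0);
		checked++;
	}
	return checked;
}
```

With the two sample chunks above, count_checked_buggy() returns 0 and
count_checked_fixed() returns 2.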