Re: [PATCH] btrfs: disk-io: reject misaligned tree blocks in btree_csum_one_bio

From: David Sterba

Date: Tue Mar 31 2026 - 20:07:55 EST


On Wed, Mar 25, 2026 at 06:04:11PM +0800, ZhengYuan Huang wrote:
> [BUG]
> Running btrfs balance on a corrupt image can trigger a GPF, with KASAN
> reporting a wild memory access:
>
> BTRFS warning: tree block not nodesize aligned, start 6179131392 nodesize 16384, can be resolved by a full metadata balance
> Oops: general protection fault, probably for non-canonical address 0xe0009d1000000052: 0000 [#1] SMP KASAN NOPTI
> KASAN: maybe wild-memory-access in range [0x0005088000000290-0x0005088000000297]
> Hardware name: QEMU Ubuntu 24.04 PC v2, BIOS 1.16.3-debian-1.16.3-2
> RIP: 0010:get_unaligned_le64 include/linux/unaligned.h:28 [inline]
> RIP: 0010:btrfs_header_bytenr fs/btrfs/accessors.h:647 [inline]
> RIP: 0010:btree_csum_one_bio+0x175/0xfe0 fs/btrfs/disk-io.c:263
> Call Trace:
> <TASK>
> btrfs_bio_csum fs/btrfs/bio.c:511 [inline]
> btrfs_submit_chunk+0x138d/0x1750 fs/btrfs/bio.c:744
> btrfs_submit_bbio+0x20/0x40 fs/btrfs/bio.c:814
> write_one_eb+0x9ea/0xd30 fs/btrfs/extent_io.c:2239
> btree_write_cache_pages+0x836/0xdc0 fs/btrfs/extent_io.c:2342
> btree_writepages+0x163/0x1c0 fs/btrfs/disk-io.c:512
> do_writepages+0x255/0x5c0 mm/page-writeback.c:2604
> filemap_fdatawrite_wbc mm/filemap.c:389 [inline]
> filemap_fdatawrite_wbc+0xf2/0x150 mm/filemap.c:379
> __filemap_fdatawrite_range+0xd2/0x120 mm/filemap.c:422
> filemap_fdatawrite_range+0x2f/0x50 mm/filemap.c:440
> btrfs_write_marked_extents+0x13c/0x2d0 fs/btrfs/transaction.c:1157
> btrfs_write_and_wait_transaction+0xe5/0x250 fs/btrfs/transaction.c:1264
> btrfs_commit_transaction+0x28af/0x3d90 fs/btrfs/transaction.c:2533
> insert_balance_item.isra.0+0x392/0x3f0 fs/btrfs/volumes.c:3712
> btrfs_balance+0x1021/0x42b0 fs/btrfs/volumes.c:4582
> btrfs_ioctl_balance fs/btrfs/ioctl.c:3577 [inline]
> btrfs_ioctl+0x25cf/0x5b90 fs/btrfs/ioctl.c:5313
> ...
>
> [CAUSE]
> The corrupt image contains a tree block whose start address (6179131392)
> is page-aligned (4 KiB) but NOT nodesize-aligned (16 KiB):
>
> 6179131392 % 16384 == 4096

While you say it's a corrupted image, it feels like it was crafted to
have such an offset. The warning comes from 6d3a61945b0088 ("btrfs: warn
on tree blocks which are not nodesize aligned"), which tries to catch
problems with misaligned ebs.

As we'll eventually be moving to large folios, such misaligned blocks
will become a hard problem. That should settle whether this ought to be
a warning or an error.

As the commit and the warning message suggest running balance to fix
the alignment problem, I agree that a crash inside balance should be
fixed somehow. On the other hand, the misalignment should not happen at
all.

While we try to be cautious about recognizing old filesystems with
potential problems, we also have to stop at some point if that blocks a
new feature. The grace period has IMO been long enough.

If you have reproduced the problem by normal operations, then we should
look for a way to prevent it. If it comes from a crafted image, i.e. a
valid image where one block was shifted to become misaligned but is
otherwise valid, then I suggest turning the warning into an error and
rejecting the filesystem as early as possible.

> When alloc_extent_buffer() is called for such a block,
> check_eb_alignment() detects the nodesize misalignment, but only emits
> a one-time btrfs_warn() and returns false without failing the
> allocation. This allows the extent buffer to be created with a
> misaligned start.
>
> Later, during transaction commit triggered by balance, write_one_eb()
> submits the dirty extent buffer for writeback, and
> btree_csum_one_bio() is called to checksum it before I/O submission.
> That path calls btrfs_header_bytenr(eb), which expands via
> BTRFS_SETGET_HEADER_FUNCS to:
>
> folio_address(eb->folios[0]) + offset_in_page(eb->start)
>
> With a nodesize-misaligned start, eb->folios[0] does not correspond to
> a valid direct-mapped kernel address. folio_address() returns the
> garbage value 0x0005088000000260, and dereferencing +0x30 (the bytenr
> field offset in struct btrfs_header) triggers the GPF.
>
> [FIX]
> Add a WARN_ON_ONCE() nodesize alignment check at the beginning of
> btree_csum_one_bio() and return -EIO for misaligned tree blocks.
>
> btree_csum_one_bio() already guards against corrupted extent buffer
> state on the checksum path, and it also revalidates metadata on the
> write path. The alignment check follows that pattern and must happen
> before the first access to eb->folios[] via btrfs_header_bytenr().
>
> Fixes: 6d3a61945b00 ("btrfs: warn on tree blocks which are not nodesize aligned")
> Signed-off-by: ZhengYuan Huang <gality369@xxxxxxxxx>
> ---
> An alternative fix of promoting check_eb_alignment() from warn to error
> would prevent the misaligned eb from being created at all, but would
> break mount and repair workflows: users need to be able to read and
> inspect a filesystem containing legacy misaligned tree blocks in order
> to run "btrfs balance -m" and correct the alignment.

While I agree with that, I think we should start rejecting such
filesystems because of the large folio support, and because we have
hopefully gone through the grace period without new reports or
incidents.

If you have a crafted image, ideally a minimal one, I can add it to the
btrfs-progs fuzzed images so it can be verified as part of the test
suite.