Re: [EXTERNAL] Re: [PATCH v2] hfs: prevent MDB and bitmap buffer_head aliasing

From: Sam Sun

Date: Mon Jun 01 2026 - 10:18:00 EST


On Sat, May 30, 2026 at 7:23 AM Viacheslav Dubeyko <vdubeyko@xxxxxxxxxx> wrote:
>
> On Fri, 2026-05-29 at 14:29 -0700, Viacheslav Dubeyko wrote:
> > On Fri, 2026-05-29 at 18:34 +0800, Yue Sun wrote:
> > > hfs_mdb_commit() writes the volume bitmap while HFS_SB(sb)->mdb_bh is
> > > locked. A crafted image can set drVBMSt so that the bitmap block resolves
> > > to the same buffer_head as the MDB. When writeback later calls
> > > lock_buffer() for that bitmap block, the task tries to lock mdb_bh again
> > > and self-deadlocks in __lock_buffer().
> > >
> > > Reject images whose volume bitmap starts at or before the MDB during
> > > mount. Also guard the bitmap writeback path itself: if the bitmap block
> > > would resolve to mdb_bh, force the filesystem read-only and stop bitmap
> > > writeback before taking the buffer lock. This keeps the deadlock fix in
> > > the MDB commit path and reuses the existing bitmap size/writeback logic.
> > >
> > > Reported-by: Yue Sun <samsun1006219@xxxxxxxxx>
> > > Closes: https://urldefense.proofpoint.com/v2/url?u=https-3A__lore.kernel.org_all_CAEkJfYMB47v1yOWHB8q2dc8kf-3Duj-2DrLO-3D-2ByMyudwPguJ8Kd3jA-40mail.gmail.com_&d=DwIFaQ&c=BSDicqBQBDjDI9RkVyTcHQ&r=q5bIm4AXMzc8NJu1_RGmnQ2fMWKq4Y4RAkElvUgSs00&m=2ATrdgpvRdYnJfz2T53Ih3iQv6Y9wogU2Ba19prTYgchh1mh4KI7lo9TYLQnlu3X&s=S8XdWVW0SpxB05CfctrqCJtJEXqtWDfLNdtzPHIZAY8&e=
> > > Signed-off-by: Yue Sun <samsun1006219@xxxxxxxxx>
> > > ---
> > > Changes in v2:
> > > - Add a commit-time guard before locking bitmap buffer_heads.
> > > - Replace the mount-time byte-range check with a simple drVBMSt check.
> > > - Reuse the existing bitmap writeback size calculation.
> > >
> > > fs/hfs/mdb.c | 10 ++++++++++
> > > 1 file changed, 10 insertions(+)
> > >
> > > diff --git a/fs/hfs/mdb.c b/fs/hfs/mdb.c
> > > index a97cea35ca2e..53cd137892b5 100644
> > > --- a/fs/hfs/mdb.c
> > > +++ b/fs/hfs/mdb.c
> > > @@ -185,6 +185,11 @@ int hfs_mdb_get(struct super_block *sb)
> > > sb->s_flags |= SB_RDONLY;
> > > }
> > >
> > > + if (be16_to_cpu(mdb->drVBMSt) <= HFS_MDB_BLK) {
> >
> > Technically speaking, if we are checking whether the volume bitmap overlaps
> > the main MDB record, then we also need to check for overlap with the alternate
> > MDB record, the Catalog File, and the Extents Overflow File. However, it sounds
> > like we are trying to add FSCK logic here. :)
> >
> > > + pr_err("volume bitmap overlaps MDB\n");
> >
> > This situation means the volume is corrupted. It makes sense to recommend running FSCK.
> >
> > > + return -EIO;
> >
> > This error code is wrong because the read operation succeeded; what we have is
> > a corrupted volume. Even if the volume bitmap overlaps the MDB record, we cannot
> > reject the mount operation. We must mount in READ-ONLY mode because,
> > potentially, the rest of the metadata could be completely OK. We simply cannot
> > mount in RW mode.
> >
> > > + }
> > > +
> > > /* TRY to get the alternate (backup) MDB. */
> > > sect = part_start + part_size - 2;
> > > bh = sb_bread512(sb, sect, mdb2);
> > > @@ -341,6 +346,11 @@ void hfs_mdb_commit(struct super_block *sb)
> > > size = (HFS_SB(sb)->fs_ablocks + 7) / 8;
> > > ptr = (u8 *)HFS_SB(sb)->bitmap;
> > > while (size) {
> > > + if (unlikely(block == HFS_SB(sb)->mdb_bh->b_blocknr)) {
> > > + pr_err("volume bitmap overlaps MDB, forcing read-only\n");
> > > + sb->s_flags |= SB_RDONLY;
> > > + break;
> > > + }
> >
> > At this point, we have already written the main MDB and the alternate MDB to
> > the volume. Theoretically, if the volume bitmap is big enough, we could also
> > have partially processed the bitmap. Probably, we need to check for overlap at
> > the beginning of the method and reject the whole superblock commit.
> >
> > The initial issue was the deadlock. Could we implement a check that the
> > buffer_heads don't overlap before trying to lock them? Does that make sense to you?
> >
>
> By the way, I can see the deadlock in hfs_mdb_commit() even for a completely
> valid HFS volume during a generic/013 test run. This happens because of a
> folio-lock-related issue. I remember that somebody sent a patch related to this
> issue, but we never finished the discussion with a reasonable solution. I
> think we need to rework the locking scheme:
>
> diff --git a/fs/hfs/hfs_fs.h b/fs/hfs/hfs_fs.h
> index 97e8d1f96d6d..919798eda0f8 100644
> --- a/fs/hfs/hfs_fs.h
> +++ b/fs/hfs/hfs_fs.h
> @@ -64,6 +64,7 @@ struct hfs_inode_info {
> * The HFS-specific part of a Linux (struct super_block)
> */
> struct hfs_sb_info {
> + struct mutex mdb_lock; /* MDB operations lock */
> struct buffer_head *mdb_bh; /* The hfs_buffer
> holding the real
> superblock (aka VIB
> diff --git a/fs/hfs/mdb.c b/fs/hfs/mdb.c
> index a97cea35ca2e..94497c155d29 100644
> --- a/fs/hfs/mdb.c
> +++ b/fs/hfs/mdb.c
> @@ -291,9 +291,9 @@ void hfs_mdb_commit(struct super_block *sb)
> if (sb_rdonly(sb))
> return;
>
> - lock_buffer(HFS_SB(sb)->mdb_bh);
> if (test_and_clear_bit(HFS_FLG_MDB_DIRTY, &HFS_SB(sb)->flags)) {
> /* These parameters may have been modified, so write them back */
> + lock_buffer(HFS_SB(sb)->mdb_bh);
> mdb->drLsMod = hfs_mtime();
> mdb->drFreeBks = cpu_to_be16(HFS_SB(sb)->free_ablocks);
> mdb->drNxtCNID =
> @@ -304,6 +304,7 @@ void hfs_mdb_commit(struct super_block *sb)
> cpu_to_be32((u32)atomic64_read(&HFS_SB(sb)->file_count));
> mdb->drDirCnt =
> cpu_to_be32((u32)atomic64_read(&HFS_SB(sb)->folder_count));
> + unlock_buffer(HFS_SB(sb)->mdb_bh);
>
> /* write MDB to disk */
> mark_buffer_dirty(HFS_SB(sb)->mdb_bh);
> @@ -360,7 +361,6 @@ void hfs_mdb_commit(struct super_block *sb)
> size -= len;
> }
> }
> - unlock_buffer(HFS_SB(sb)->mdb_bh);
> }
>
> void hfs_mdb_close(struct super_block *sb)
> diff --git a/fs/hfs/super.c b/fs/hfs/super.c
> index a466c401f6bb..5d4caf3ddda6 100644
> --- a/fs/hfs/super.c
> +++ b/fs/hfs/super.c
> @@ -35,7 +35,11 @@ MODULE_LICENSE("GPL");
> static int hfs_sync_fs(struct super_block *sb, int wait)
> {
> is_hfs_cnid_counts_valid(sb);
> +
> + mutex_lock(&HFS_SB(sb)->mdb_lock);
> hfs_mdb_commit(sb);
> + mutex_unlock(&HFS_SB(sb)->mdb_lock);
> +
> return 0;
> }
>
> @@ -68,7 +72,9 @@ static void flush_mdb(struct work_struct *work)
>
> is_hfs_cnid_counts_valid(sb);
>
> + mutex_lock(&sbi->mdb_lock);
> hfs_mdb_commit(sb);
> + mutex_unlock(&sbi->mdb_lock);
> }
>
> void hfs_mark_mdb_dirty(struct super_block *sb)
> @@ -339,9 +345,13 @@ static int hfs_fill_super(struct super_block *sb, struct fs_context *fc)
> sb->s_op = &hfs_super_operations;
> sb->s_xattr = hfs_xattr_handlers;
> sb->s_flags |= SB_NOATIME | SB_NODIRATIME;
> + mutex_init(&sbi->mdb_lock);
> mutex_init(&sbi->bitmap_lock);
>
> + mutex_lock(&sbi->mdb_lock);
> res = hfs_mdb_get(sb);
> + mutex_unlock(&sbi->mdb_lock);
> +
> if (res) {
> if (!silent)
> pr_warn("can't find a HFS filesystem on dev %s\n",
>
> What do you think?
>
> Thanks,
> Slava.
>

Thanks for the detailed comments. I agree that my patch was too
narrow. It only handled the corrupted-layout case without fixing the
more general locking problem in hfs_mdb_commit(), including the case
you mentioned where a valid HFS volume can still deadlock during
xfstests.

I also agree with your main direction: the MDB buffer lock should not
be used as a transaction lock for the whole MDB commit. However, one
detail in the locking sketch worries me. After the mdb_bh lock is
narrowed, the HFS_FLG_ALT_MDB_DIRTY path still modifies the primary
MDB buffer through hfs_inode_write_fork():

	hfs_inode_write_fork(..., mdb->drXTExtRec, &mdb->drXTFlSize, NULL);
	hfs_inode_write_fork(..., mdb->drCTExtRec, &mdb->drCTFlSize, NULL);

hfs_inode_write_fork() writes through the extent-record and size
pointers passed to it, so these are still direct writes into
mdb_bh->b_data. The new HFS-level mdb_lock serializes HFS MDB commits,
but generic buffer writeback does not take that filesystem-private
mutex; it only synchronizes on the buffer_head lock. So I think these
primary MDB updates still need to be covered by lock_buffer(mdb_bh),
even when HFS_FLG_MDB_DIRTY itself is not set.

The approach I am considering is to split the fix into two parts:

1. Rework the locking in hfs_mdb_commit().
Add an HFS-level mdb_lock and take it inside hfs_mdb_commit(), so all
MDB commits are serialized at the HFS layer. Then narrow the mdb_bh
buffer lock so it only covers direct updates of the primary MDB
buffer, including the hfs_inode_write_fork() calls from the
HFS_FLG_ALT_MDB_DIRTY path. mark_buffer_dirty(mdb_bh) is also called
inside that short critical section. The alternate MDB sync and bitmap
writeback happen after mdb_bh has been unlocked, so lock_buffer() on a
bitmap block that shares mdb_bh no longer self-deadlocks.

2. Handle the corrupted volume-bitmap layout separately.
Instead of trying to do broader fsck-style validation, only check the
condition relevant to this report: whether the volume bitmap's range
of 512-byte HFS sectors overlaps the main MDB sector. Once mdb_bh is
no longer held during bitmap writeback, merely sharing a buffer_head
with the MDB is not a problem by itself; only a real sector overlap
would let bitmap writeback overwrite MDB data. If an overlap is found
during mount, warn and keep the filesystem read-only rather than
returning -EIO. In hfs_mdb_commit(), check the same condition before
clearing dirty bits or doing any writeback; if it holds, force
read-only and return. I also set a flag (HFS_FLG_VBM_OVERLAP) so that
a later remount-rw attempt also leaves the filesystem read-only.
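The ordering change in point 1 can be modeled in userspace to show why
it removes the self-deadlock when a bitmap block resolves to mdb_bh.
This is a hypothetical sketch: struct toy_bh, toy_lock(), commit_old()
and commit_new() are illustrative names, toy_lock() reports a would-be
self-deadlock instead of sleeping the way lock_buffer() does, and the
HFS-level mdb_lock is left out because it only serializes whole
commits against each other.

```c
#include <stdbool.h>

/* Hypothetical non-recursive buffer lock (not the kernel API). */
struct toy_bh {
	bool locked;
};

/* Returns false where the kernel's lock_buffer() would hang forever. */
static bool toy_lock(struct toy_bh *bh)
{
	if (bh->locked)
		return false;   /* already held by us: self-deadlock */
	bh->locked = true;
	return true;
}

static void toy_unlock(struct toy_bh *bh)
{
	bh->locked = false;
}

/* Current ordering: mdb_bh stays locked across bitmap writeback. */
static bool commit_old(struct toy_bh *mdb_bh, struct toy_bh *vbm_bh)
{
	bool ok;

	toy_lock(mdb_bh);
	/* ... update MDB fields, sync the alternate MDB ... */
	ok = toy_lock(vbm_bh);          /* bitmap writeback */
	if (ok)
		toy_unlock(vbm_bh);
	toy_unlock(mdb_bh);
	return ok;
}

/*
 * Proposed ordering: the mdb_bh critical section covers only the
 * primary MDB updates (fields, fork records, mark_buffer_dirty());
 * the alternate MDB and the bitmap are written after it is dropped.
 */
static bool commit_new(struct toy_bh *mdb_bh, struct toy_bh *vbm_bh)
{
	bool ok;

	toy_lock(mdb_bh);
	/* ... update MDB fields and fork records, mark dirty ... */
	toy_unlock(mdb_bh);

	/* ... sync the alternate MDB ... */
	ok = toy_lock(vbm_bh);          /* bitmap writeback */
	if (ok)
		toy_unlock(vbm_bh);
	return ok;
}
```

Passing the same buffer for both arguments models the aliasing case:
commit_old() fails at the bitmap step, while commit_new() completes.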
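The overlap test in point 2 is a half-open interval check in 512-byte
sectors. A standalone sketch, assuming HFS_MDB_BLK is 2 (the MDB sits
at byte offset 1024 of the partition) and mirroring the arithmetic of
the draft hfs_vbm_overlaps_mdb(); the MODEL_* constants,
vbm_overlaps_mdb() and sector_to_block() are illustrative names, not
kernel API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match the HFS on-disk layout used by fs/hfs. */
#define MODEL_HFS_SECTOR_SIZE  512u
#define MODEL_HFS_SECTOR_BITS  9u
#define MODEL_HFS_MDB_BLK      2u

typedef uint64_t sector_nr;

/*
 * Does [vbm_start, vbm_end) intersect [mdb_start, mdb_start + 1)?
 * All values are 512-byte sector numbers; the bitmap holds one bit
 * per allocation block.
 */
static bool vbm_overlaps_mdb(sector_nr part_start, uint16_t drVBMSt,
			     uint32_t fs_ablocks)
{
	uint32_t vbm_bytes = (fs_ablocks + 7) / 8;
	sector_nr mdb_start, vbm_start, vbm_end;

	if (!vbm_bytes)
		return false;
	mdb_start = part_start + MODEL_HFS_MDB_BLK;
	vbm_start = part_start + drVBMSt;
	vbm_end = vbm_start + (vbm_bytes + MODEL_HFS_SECTOR_SIZE - 1) /
			      MODEL_HFS_SECTOR_SIZE;
	return vbm_start < mdb_start + 1 && mdb_start < vbm_end;
}

/* Buffer block holding a sector, for blocksize_bits >= 9 (12 = 4 KiB). */
static sector_nr sector_to_block(sector_nr sect, unsigned int blocksize_bits)
{
	return sect >> (blocksize_bits - MODEL_HFS_SECTOR_BITS);
}
```

For example, a bitmap starting at sector 3 does not overlap the MDB
even though, with 4 KiB buffers, sectors 2 and 3 both land in buffer
block 0; a bitmap starting at sector 0 that spans three sectors
(12288 allocation blocks, 1536 bytes) does overlap it.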

Draft patches are attached to this email. What do you think?

Thanks,
Yue
--- draft patch 1: hfs: serialize MDB commits and narrow mdb_bh locking ---
diff --git a/fs/hfs/hfs_fs.h b/fs/hfs/hfs_fs.h
index ac0e83f77a0f..e4ed9071c498 100644
--- a/fs/hfs/hfs_fs.h
+++ b/fs/hfs/hfs_fs.h
@@ -124,6 +124,7 @@ struct hfs_sb_info {

int session, part;
struct nls_table *nls_io, *nls_disk;
+ struct mutex mdb_lock; /* serializes MDB updates */
struct mutex bitmap_lock;
unsigned long flags;
u16 blockoffset;
diff --git a/fs/hfs/mdb.c b/fs/hfs/mdb.c
index a97cea35ca2e..991013ae0c33 100644
--- a/fs/hfs/mdb.c
+++ b/fs/hfs/mdb.c
@@ -286,60 +286,74 @@ int hfs_mdb_get(struct super_block *sb)
*/
void hfs_mdb_commit(struct super_block *sb)
{
- struct hfs_mdb *mdb = HFS_SB(sb)->mdb;
+ struct hfs_sb_info *sbi = HFS_SB(sb);
+ struct hfs_mdb *mdb = sbi->mdb;
+ bool mdb_dirty, alt_dirty, bitmap_dirty;

if (sb_rdonly(sb))
return;

- lock_buffer(HFS_SB(sb)->mdb_bh);
- if (test_and_clear_bit(HFS_FLG_MDB_DIRTY, &HFS_SB(sb)->flags)) {
+ mutex_lock(&sbi->mdb_lock);
+ if (sb_rdonly(sb))
+ goto out;
+
+ mdb_dirty = test_and_clear_bit(HFS_FLG_MDB_DIRTY, &sbi->flags);
+ alt_dirty = test_and_clear_bit(HFS_FLG_ALT_MDB_DIRTY, &sbi->flags) &&
+ sbi->alt_mdb;
+ bitmap_dirty = test_and_clear_bit(HFS_FLG_BITMAP_DIRTY, &sbi->flags);
+
+ if (mdb_dirty || alt_dirty)
+ lock_buffer(sbi->mdb_bh);
+ if (mdb_dirty) {
/* These parameters may have been modified, so write them back */
mdb->drLsMod = hfs_mtime();
- mdb->drFreeBks = cpu_to_be16(HFS_SB(sb)->free_ablocks);
+ mdb->drFreeBks = cpu_to_be16(sbi->free_ablocks);
mdb->drNxtCNID =
- cpu_to_be32((u32)atomic64_read(&HFS_SB(sb)->next_id));
- mdb->drNmFls = cpu_to_be16(HFS_SB(sb)->root_files);
- mdb->drNmRtDirs = cpu_to_be16(HFS_SB(sb)->root_dirs);
+ cpu_to_be32((u32)atomic64_read(&sbi->next_id));
+ mdb->drNmFls = cpu_to_be16(sbi->root_files);
+ mdb->drNmRtDirs = cpu_to_be16(sbi->root_dirs);
mdb->drFilCnt =
- cpu_to_be32((u32)atomic64_read(&HFS_SB(sb)->file_count));
+ cpu_to_be32((u32)atomic64_read(&sbi->file_count));
mdb->drDirCnt =
- cpu_to_be32((u32)atomic64_read(&HFS_SB(sb)->folder_count));
-
- /* write MDB to disk */
- mark_buffer_dirty(HFS_SB(sb)->mdb_bh);
+ cpu_to_be32((u32)atomic64_read(&sbi->folder_count));
}

/* write the backup MDB, not returning until it is written.
* we only do this when either the catalog or extents overflow
* files grow. */
- if (test_and_clear_bit(HFS_FLG_ALT_MDB_DIRTY, &HFS_SB(sb)->flags) &&
- HFS_SB(sb)->alt_mdb) {
- hfs_inode_write_fork(HFS_SB(sb)->ext_tree->inode, mdb->drXTExtRec,
+ if (alt_dirty) {
+ hfs_inode_write_fork(sbi->ext_tree->inode, mdb->drXTExtRec,
&mdb->drXTFlSize, NULL);
- hfs_inode_write_fork(HFS_SB(sb)->cat_tree->inode, mdb->drCTExtRec,
+ hfs_inode_write_fork(sbi->cat_tree->inode, mdb->drCTExtRec,
&mdb->drCTFlSize, NULL);
+ }
+ if (mdb_dirty || alt_dirty) {
+ mark_buffer_dirty(sbi->mdb_bh);
+ unlock_buffer(sbi->mdb_bh);
+ }

- lock_buffer(HFS_SB(sb)->alt_mdb_bh);
- memcpy(HFS_SB(sb)->alt_mdb, HFS_SB(sb)->mdb, HFS_SECTOR_SIZE);
- HFS_SB(sb)->alt_mdb->drAtrb |= cpu_to_be16(HFS_SB_ATTRIB_UNMNT);
- HFS_SB(sb)->alt_mdb->drAtrb &= cpu_to_be16(~HFS_SB_ATTRIB_INCNSTNT);
- unlock_buffer(HFS_SB(sb)->alt_mdb_bh);
+ if (alt_dirty) {
+ lock_buffer(sbi->alt_mdb_bh);
+ memcpy(sbi->alt_mdb, sbi->mdb, HFS_SECTOR_SIZE);
+ sbi->alt_mdb->drAtrb |= cpu_to_be16(HFS_SB_ATTRIB_UNMNT);
+ sbi->alt_mdb->drAtrb &= cpu_to_be16(~HFS_SB_ATTRIB_INCNSTNT);
+ unlock_buffer(sbi->alt_mdb_bh);

- mark_buffer_dirty(HFS_SB(sb)->alt_mdb_bh);
- sync_dirty_buffer(HFS_SB(sb)->alt_mdb_bh);
+ mark_buffer_dirty(sbi->alt_mdb_bh);
+ sync_dirty_buffer(sbi->alt_mdb_bh);
}

- if (test_and_clear_bit(HFS_FLG_BITMAP_DIRTY, &HFS_SB(sb)->flags)) {
+ if (bitmap_dirty) {
struct buffer_head *bh;
sector_t block;
char *ptr;
int off, size, len;

- block = be16_to_cpu(HFS_SB(sb)->mdb->drVBMSt) + HFS_SB(sb)->part_start;
+ block = be16_to_cpu(sbi->mdb->drVBMSt) + sbi->part_start;
off = (block << HFS_SECTOR_SIZE_BITS) & (sb->s_blocksize - 1);
block >>= sb->s_blocksize_bits - HFS_SECTOR_SIZE_BITS;
- size = (HFS_SB(sb)->fs_ablocks + 7) / 8;
- ptr = (u8 *)HFS_SB(sb)->bitmap;
+ size = (sbi->fs_ablocks + 7) / 8;
+ ptr = (u8 *)sbi->bitmap;
while (size) {
bh = sb_bread(sb, block);
if (!bh) {
@@ -360,7 +374,9 @@ void hfs_mdb_commit(struct super_block *sb)
size -= len;
}
}
- unlock_buffer(HFS_SB(sb)->mdb_bh);
+
+out:
+ mutex_unlock(&sbi->mdb_lock);
}

void hfs_mdb_close(struct super_block *sb)
diff --git a/fs/hfs/super.c b/fs/hfs/super.c
index a4f2a2bfa6d3..60a93c31b3ff 100644
--- a/fs/hfs/super.c
+++ b/fs/hfs/super.c
@@ -339,6 +339,7 @@ static int hfs_fill_super(struct super_block *sb, struct fs_context *fc)
sb->s_op = &hfs_super_operations;
sb->s_xattr = hfs_xattr_handlers;
sb->s_flags |= SB_NODIRATIME;
+ mutex_init(&sbi->mdb_lock);
mutex_init(&sbi->bitmap_lock);

res = hfs_mdb_get(sb);

--- draft patch 2: hfs: detect volume bitmap overlap with MDB ---
diff --git a/fs/hfs/hfs_fs.h b/fs/hfs/hfs_fs.h
index e4ed9071c498..df6ed1298151 100644
--- a/fs/hfs/hfs_fs.h
+++ b/fs/hfs/hfs_fs.h
@@ -138,6 +138,7 @@ struct hfs_sb_info {
#define HFS_FLG_BITMAP_DIRTY 0
#define HFS_FLG_MDB_DIRTY 1
#define HFS_FLG_ALT_MDB_DIRTY 2
+#define HFS_FLG_VBM_OVERLAP 3

/* bitmap.c */
extern u32 hfs_vbm_search_free(struct super_block *sb, u32 goal, u32 *num_bits);
diff --git a/fs/hfs/mdb.c b/fs/hfs/mdb.c
index 991013ae0c33..4b6f1df5e4a8 100644
--- a/fs/hfs/mdb.c
+++ b/fs/hfs/mdb.c
@@ -85,6 +85,24 @@ bool is_hfs_cnid_counts_valid(struct super_block *sb)
return !corrupted;
}

+static bool hfs_vbm_overlaps_mdb(struct super_block *sb)
+{
+ struct hfs_sb_info *sbi = HFS_SB(sb);
+ sector_t mdb_start, vbm_start, vbm_end;
+ u32 vbm_bytes, vbm_sectors;
+
+ vbm_bytes = (sbi->fs_ablocks + 7) / 8;
+ if (!vbm_bytes)
+ return false;
+
+ mdb_start = sbi->part_start + HFS_MDB_BLK;
+ vbm_start = sbi->part_start + be16_to_cpu(sbi->mdb->drVBMSt);
+ vbm_sectors = DIV_ROUND_UP(vbm_bytes, HFS_SECTOR_SIZE);
+ vbm_end = vbm_start + vbm_sectors;
+
+ return vbm_start < mdb_start + 1 && mdb_start < vbm_end;
+}
+
/*
* hfs_mdb_get()
*
@@ -185,6 +203,12 @@ int hfs_mdb_get(struct super_block *sb)
sb->s_flags |= SB_RDONLY;
}

+ if (hfs_vbm_overlaps_mdb(sb)) {
+ pr_warn("volume bitmap overlaps MDB, running fsck.hfs is recommended. Mounting read-only.\n");
+ set_bit(HFS_FLG_VBM_OVERLAP, &HFS_SB(sb)->flags);
+ sb->s_flags |= SB_RDONLY;
+ }
+
/* TRY to get the alternate (backup) MDB. */
sect = part_start + part_size - 2;
bh = sb_bread512(sb, sect, mdb2);
@@ -296,6 +320,13 @@ void hfs_mdb_commit(struct super_block *sb)
mutex_lock(&sbi->mdb_lock);
if (sb_rdonly(sb))
goto out;
+ if (test_bit(HFS_FLG_VBM_OVERLAP, &sbi->flags) ||
+ hfs_vbm_overlaps_mdb(sb)) {
+ pr_err("volume bitmap overlaps MDB, forcing read-only\n");
+ set_bit(HFS_FLG_VBM_OVERLAP, &sbi->flags);
+ sb->s_flags |= SB_RDONLY;
+ goto out;
+ }

mdb_dirty = test_and_clear_bit(HFS_FLG_MDB_DIRTY, &sbi->flags);
alt_dirty = test_and_clear_bit(HFS_FLG_ALT_MDB_DIRTY, &sbi->flags) &&
diff --git a/fs/hfs/super.c b/fs/hfs/super.c
index 60a93c31b3ff..81de9607ab24 100644
--- a/fs/hfs/super.c
+++ b/fs/hfs/super.c
@@ -133,6 +133,10 @@ static int hfs_reconfigure(struct fs_context *fc)
pr_warn("filesystem is marked locked, leaving read-only.\n");
sb->s_flags |= SB_RDONLY;
fc->sb_flags |= SB_RDONLY;
+ } else if (test_bit(HFS_FLG_VBM_OVERLAP, &HFS_SB(sb)->flags)) {
+ pr_warn("volume bitmap overlaps MDB, running fsck.hfs is recommended. Leaving read-only.\n");
+ sb->s_flags |= SB_RDONLY;
+ fc->sb_flags |= SB_RDONLY;
}
}
return 0;