Re: [PATCH v3] hfs: avoid deadlock in hfs_mdb_commit()

From: Viacheslav Dubeyko

Date: Tue Jun 16 2026 - 02:20:09 EST

On Sat, 2026-06-13 at 13:47 +0800, Yue Sun wrote:
> From: Sam Sun <samsun1006219@xxxxxxxxx>
>
> hfs_mdb_commit() currently uses the primary MDB buffer lock as a
> broad
> commit lock. It keeps HFS_SB(sb)->mdb_bh locked while it may also
> write
> the alternate MDB and the volume bitmap.
>
> This can deadlock. A corrupted image can set drVBMSt so that the
> bitmap
> block maps to the same buffer_head as the primary MDB. The bitmap
> writeback path then calls lock_buffer() on the same buffer_head while
> mdb_bh is already locked by hfs_mdb_commit(). A similar locking
> problem
> can also happen on valid images when hfs_mdb_commit() performs I/O
> while
> mdb_bh remains locked.
>
> Do not use mdb_bh as the HFS commit serialization lock. Add an HFS-
> level
> mdb_lock and take it around hfs_mdb_commit() callers. Then keep
> mdb_bh
> locked only while updating the primary MDB buffer and marking it
> dirty.
>
> The catalog/extents fork records and sizes are fields in the primary
> MDB,
> so update them in the primary MDB write section, not in the alternate
> MDB
> section. The alternate MDB sync and bitmap writeback then run after
> mdb_bh has been unlocked.
>
> HFS_FLG_ALT_MDB_DIRTY is set together with HFS_FLG_MDB_DIRTY when the
> catalog or extents overflow files grow, so the primary MDB fork
> records
> are refreshed before the alternate MDB copies the primary MDB
> contents.
>
> Reported-by: Sam Sun <samsun1006219@xxxxxxxxx>
> Closes:
> https://lore.kernel.org/all/CAEkJfYMB47v1yOWHB8q2dc8kf=uj-rLO=+yMyudwPguJ8Kd3jA@xxxxxxxxxxxxxx/
> Suggested-by: Viacheslav Dubeyko <slava@xxxxxxxxxxx>
> Signed-off-by: Sam Sun <samsun1006219@xxxxxxxxx>
> ---
> Changes in v3:
> - Follow Viacheslav's suggestion to move hfs_inode_write_fork() into
> the primary MDB write section.
> - Rework the fix as MDB commit locking instead of corrupted-layout
> validation.
> - Add mdb_lock to serialize HFS MDB commits.
> - Keep mdb_bh locked only while updating the primary MDB buffer.
> - Leave alternate MDB sync and bitmap writeback outside the mdb_bh
> lock.
>
> Changes in v2:
> - Added a commit-time guard before locking volume bitmap
> buffer_heads.
> - Replaced the v1 bitmap byte-range check with a simple drVBMSt
> check.
> - Reused the existing bitmap writeback size calculation.
>
> v1:
> - Initial mount-time check rejecting volume bitmap/MDB aliasing.
>
> fs/hfs/hfs_fs.h | 1 +
> fs/hfs/mdb.c    | 15 ++++++++-------
> fs/hfs/super.c | 9 +++++++++
> 3 files changed, 18 insertions(+), 7 deletions(-)
>
> diff --git a/fs/hfs/hfs_fs.h b/fs/hfs/hfs_fs.h
> index ac0e83f77a0f..00651f42ba38 100644
> --- a/fs/hfs/hfs_fs.h
> +++ b/fs/hfs/hfs_fs.h
> @@ -66,6 +66,7 @@ struct hfs_inode_info {
> * The HFS-specific part of a Linux (struct super_block)
> */
> struct hfs_sb_info {
> + struct mutex mdb_lock; /* MDB operations
> lock */
> struct buffer_head *mdb_bh; /* The hfs_buffer
>    holding the real
>    superblock (aka
> VIB
> diff --git a/fs/hfs/mdb.c b/fs/hfs/mdb.c
> index a97cea35ca2e..c2f4bda6fd00 100644
> --- a/fs/hfs/mdb.c
> +++ b/fs/hfs/mdb.c
> @@ -291,8 +291,9 @@ void hfs_mdb_commit(struct super_block *sb)
> if (sb_rdonly(sb))
> return;
>
> - lock_buffer(HFS_SB(sb)->mdb_bh);
> if (test_and_clear_bit(HFS_FLG_MDB_DIRTY, &HFS_SB(sb)-
> >flags)) {
> + lock_buffer(HFS_SB(sb)->mdb_bh);
> +
> /* These parameters may have been modified, so write
> them back */
> mdb->drLsMod = hfs_mtime();
> mdb->drFreeBks = cpu_to_be16(HFS_SB(sb)-
> >free_ablocks);
> @@ -305,8 +306,14 @@ void hfs_mdb_commit(struct super_block *sb)
> mdb->drDirCnt =
> cpu_to_be32((u32)atomic64_read(&HFS_SB(sb)-
> >folder_count));
>
> + hfs_inode_write_fork(HFS_SB(sb)->ext_tree->inode,
> mdb->drXTExtRec,
> +      &mdb->drXTFlSize, NULL);
> + hfs_inode_write_fork(HFS_SB(sb)->cat_tree->inode,
> mdb->drCTExtRec,
> +      &mdb->drCTFlSize, NULL);
> +
> /* write MDB to disk */
> mark_buffer_dirty(HFS_SB(sb)->mdb_bh);
> + unlock_buffer(HFS_SB(sb)->mdb_bh);
> }
>
> /* write the backup MDB, not returning until it is written.
> @@ -314,11 +321,6 @@ void hfs_mdb_commit(struct super_block *sb)
> * files grow. */
> if (test_and_clear_bit(HFS_FLG_ALT_MDB_DIRTY, &HFS_SB(sb)-
> >flags) &&
>     HFS_SB(sb)->alt_mdb) {
> - hfs_inode_write_fork(HFS_SB(sb)->ext_tree->inode,
> mdb->drXTExtRec,
> -      &mdb->drXTFlSize, NULL);
> - hfs_inode_write_fork(HFS_SB(sb)->cat_tree->inode,
> mdb->drCTExtRec,
> -      &mdb->drCTFlSize, NULL);
> -
> lock_buffer(HFS_SB(sb)->alt_mdb_bh);
> memcpy(HFS_SB(sb)->alt_mdb, HFS_SB(sb)->mdb,
> HFS_SECTOR_SIZE);
> HFS_SB(sb)->alt_mdb->drAtrb |=
> cpu_to_be16(HFS_SB_ATTRIB_UNMNT);
> @@ -360,7 +362,6 @@ void hfs_mdb_commit(struct super_block *sb)
> size -= len;
> }
> }
> - unlock_buffer(HFS_SB(sb)->mdb_bh);
> }
>
> void hfs_mdb_close(struct super_block *sb)
> diff --git a/fs/hfs/super.c b/fs/hfs/super.c
> index a4f2a2bfa6d3..06ab904b3a42 100644
> --- a/fs/hfs/super.c
> +++ b/fs/hfs/super.c
> @@ -35,7 +35,11 @@ MODULE_LICENSE("GPL");
> static int hfs_sync_fs(struct super_block *sb, int wait)
> {
> is_hfs_cnid_counts_valid(sb);

I think that is_hfs_cnid_counts_valid() is better to check under lock.

> +
> + mutex_lock(&HFS_SB(sb)->mdb_lock);
> hfs_mdb_commit(sb);
> + mutex_unlock(&HFS_SB(sb)->mdb_lock);
> +
> return 0;
> }
>
> @@ -68,7 +72,9 @@ static void flush_mdb(struct work_struct *work)
>
> is_hfs_cnid_counts_valid(sb);
>
> + mutex_lock(&sbi->mdb_lock);
> hfs_mdb_commit(sb);
> + mutex_unlock(&sbi->mdb_lock);
> }
>
> void hfs_mark_mdb_dirty(struct super_block *sb)
> @@ -339,9 +345,12 @@ static int hfs_fill_super(struct super_block
> *sb, struct fs_context *fc)
> sb->s_op = &hfs_super_operations;
> sb->s_xattr = hfs_xattr_handlers;
> sb->s_flags |= SB_NODIRATIME;
> + mutex_init(&sbi->mdb_lock);
> mutex_init(&sbi->bitmap_lock);
>
> + mutex_lock(&sbi->mdb_lock);
> res = hfs_mdb_get(sb);
> + mutex_unlock(&sbi->mdb_lock);
> if (res) {
> if (!silent)
> pr_warn("can't find a HFS filesystem on dev
> %s\n",

I think we cannot completely eliminate the potential deadlock or race
condition until introduction of copy of MDB record in memory but not
accessing it in buffer head. Because, hfs_mdb_commit() uses the primary
MDB record's values during alternative MDB record and volume bitmap
processing. The fix looks much cleaner now. But we need to add more
functionality if we really would like to eliminate the issue.

Thanks,
Slava.