Re: [PATCH v2] hfs: prevent MDB and bitmap buffer_head aliasing

From: Viacheslav Dubeyko

Date: Thu Jun 04 2026 - 19:53:26 EST

On Thu, 2026-06-04 at 14:25 -0700, Viacheslav Dubeyko wrote:
> On Fri, 2026-06-05 at 02:02 +0800, Yue Sun wrote:
> > On Thu, Jun 4, 2026 at 12:25 AM <slava@xxxxxxxxxxx> wrote:
> > >
> > > Could you please take a deeper look into my diff that I've shared
> > > with
> > > you before? I don't see the point to review your suggestion if
> > > you
> > > completely ignored my diff. From my point of view you suggested
> > > completely the same approach but in more complicated manner.
> >
> > You are right. I am sorry for the confusion.
> >
> > I should have looked more carefully at your diff and based my reply
> > on
> > it. Adding an HFS-level MDB mutex and avoiding holding mdb_bh
> > across
> > the
> > whole hfs_mdb_commit() path is the right direction. My previous
> > reply
> > mixed that with extra cleanups and a separate corrupted-layout
> > check,
> > which made it look like I was proposing a different approach. That
> > was
> > my mistake.
> >
> > I am not very familiar with HFS internals, so please treat the
> > following
> > only as a small suggestion. After re-reading your diff, the only
> > detail
> > I am still unsure about is the HFS_FLG_ALT_MDB_DIRTY path. In that
> > path,
> > hfs_inode_write_fork() gets MDB fields as output buffers:
> >
> >         hfs_inode_write_fork(..., mdb->drXTExtRec,
> >                              &mdb->drXTFlSize, NULL);
> >         hfs_inode_write_fork(..., mdb->drCTExtRec,
> >                              &mdb->drCTFlSize, NULL);
>
>
> I've started to think that, maybe, the location of these
> hfs_inode_write_fork() calls is not correct. We are trying to save
> the
> current size of Catalog File and Extents Overflow File. OK, we need
> to
> update this if size is changed. But we can save the sizes during
> every
> call of hfs_mdb_commit(). It's logically incorrect to call these
> methods in the alternative MDB section because it is the content of
> primary MDB. I suggest to move these two hfs_inode_write_fork() to
> primary MDB writing section:
>
> lock_buffer(HFS_SB(sb)->mdb_bh);
> /* These parameters may have been modified, so write
> them back */
> mdb->drLsMod = hfs_mtime();
> mdb->drFreeBks = cpu_to_be16(HFS_SB(sb)-
> > free_ablocks);
> mdb->drNxtCNID =
> cpu_to_be32((u32)atomic64_read(&HFS_SB(sb)-
> > next_id));
> mdb->drNmFls = cpu_to_be16(HFS_SB(sb)->root_files);
> mdb->drNmRtDirs = cpu_to_be16(HFS_SB(sb)-
> >root_dirs);
> mdb->drFilCnt =
> cpu_to_be32((u32)atomic64_read(&HFS_SB(sb)-
> > file_count));
> mdb->drDirCnt =
> cpu_to_be32((u32)atomic64_read(&HFS_SB(sb)-
> > folder_count));
>
> hfs_inode_write_fork(HFS_SB(sb)->ext_tree->inode,
> mdb-
> > drXTExtRec,
>      &mdb->drXTFlSize, NULL);
> hfs_inode_write_fork(HFS_SB(sb)->cat_tree->inode,
> mdb-
> > drCTExtRec,
>      &mdb->drCTFlSize, NULL);
>
> /* write MDB to disk */
> mark_buffer_dirty(HFS_SB(sb)->mdb_bh);
> unlock_buffer(HFS_SB(sb)->mdb_bh);
>
> In this case, we don;'t need to overlap the locks. What do you think?
> Sounds like reasonable solution?
>
>

Frankly speaking, I don't like this approach of assigning the pointer
on bh->b_data to HFS_SB(sb)->mdb:

data = (void *)(__bh->b_data + __offset);

We manipulate by various fields of HFS_SB(sb)->mdb throughout of the
hfs_mdb_commit(). I think that we need to allocate buffer for struct
hfs_mdb *mdb during hfs_mdb_get() (and destroy in hfs_mdb_put()) and
synchronize its content with struct buffer_head *mdb_bh. Otherwise, we
never can resolve the problem of proper locking in hfs_mdb_commit().
What do you think?

Thanks,
Slava.

>
> > Since mdb points into mdb_bh->b_data, these calls seem to update
> > fields
> > in the primary MDB buffer. So I was thinking that we could keep
> > your
> > locking scheme, but move unlock_buffer(mdb_bh) slightly later,
> > after
> > these primary MDB updates and mark_buffer_dirty(mdb_bh). The
> > alternate
> > MDB copy/sync and the bitmap writeback would still happen after
> > mdb_bh
> > is unlocked, so the main deadlock fix remains the same.
> >
> > If this concern is not valid, or if the current HFS update rules
> > already
> > make your shorter window sufficient, please ignore this. I defer to
> > your
> > judgment here.
> >
> > Below is your diff with only that small adjustment.
> >
> > Thanks,
> > Yue
> >
> > diff --git a/fs/hfs/hfs_fs.h b/fs/hfs/hfs_fs.h
> > index ac0e83f77a0f..00651f42ba38 100644
> > --- a/fs/hfs/hfs_fs.h
> > +++ b/fs/hfs/hfs_fs.h
> > @@ -66,6 +66,7 @@ struct hfs_inode_info {
> > * The HFS-specific part of a Linux (struct super_block)
> > */
> > struct hfs_sb_info {
> > + struct mutex mdb_lock; /* MDB operations
> > lock */
> > struct buffer_head *mdb_bh; /* The hfs_buffer
> >    holding the
> > real
> >    superblock (aka
> > VIB
> > diff --git a/fs/hfs/mdb.c b/fs/hfs/mdb.c
> > index a97cea35ca2e..d81e3545a045 100644
> > --- a/fs/hfs/mdb.c
> > +++ b/fs/hfs/mdb.c
> > @@ -287,6 +287,7 @@ int hfs_mdb_get(struct super_block *sb)
> > void hfs_mdb_commit(struct super_block *sb)
> > {
> > struct hfs_mdb *mdb = HFS_SB(sb)->mdb;
> > + bool write_backup = false;
> >
> > if (sb_rdonly(sb))
> > return;
> > @@ -314,11 +315,17 @@ void hfs_mdb_commit(struct super_block *sb)
> > * files grow. */
> > if (test_and_clear_bit(HFS_FLG_ALT_MDB_DIRTY, &HFS_SB(sb)-
> > > flags) &&
> >     HFS_SB(sb)->alt_mdb) {
> > + write_backup = true;
> > hfs_inode_write_fork(HFS_SB(sb)->ext_tree->inode,
> > mdb->drXTExtRec,
> >      &mdb->drXTFlSize, NULL);
> > hfs_inode_write_fork(HFS_SB(sb)->cat_tree->inode,
> > mdb->drCTExtRec,
> >      &mdb->drCTFlSize, NULL);
> >
> > + mark_buffer_dirty(HFS_SB(sb)->mdb_bh);
> > + }
> > + unlock_buffer(HFS_SB(sb)->mdb_bh);
> > +
> > + if (write_backup) {
> > lock_buffer(HFS_SB(sb)->alt_mdb_bh);
> > memcpy(HFS_SB(sb)->alt_mdb, HFS_SB(sb)->mdb,
> > HFS_SECTOR_SIZE);
> > HFS_SB(sb)->alt_mdb->drAtrb |=
> > cpu_to_be16(HFS_SB_ATTRIB_UNMNT);
> > @@ -360,7 +367,6 @@ void hfs_mdb_commit(struct super_block *sb)
> > size -= len;
> > }
> > }
> > - unlock_buffer(HFS_SB(sb)->mdb_bh);
> > }
> >
> > void hfs_mdb_close(struct super_block *sb)
> > diff --git a/fs/hfs/super.c b/fs/hfs/super.c
> > index a4f2a2bfa6d3..06ab904b3a42 100644
> > --- a/fs/hfs/super.c
> > +++ b/fs/hfs/super.c
> > @@ -35,7 +35,11 @@ MODULE_LICENSE("GPL");
> > static int hfs_sync_fs(struct super_block *sb, int wait)
> > {
> > is_hfs_cnid_counts_valid(sb);
> > +
> > + mutex_lock(&HFS_SB(sb)->mdb_lock);
> > hfs_mdb_commit(sb);
> > + mutex_unlock(&HFS_SB(sb)->mdb_lock);
> > +
> > return 0;
> > }
> >
> > @@ -68,7 +72,9 @@ static void flush_mdb(struct work_struct *work)
> >
> > is_hfs_cnid_counts_valid(sb);
> >
> > + mutex_lock(&sbi->mdb_lock);
> > hfs_mdb_commit(sb);
> > + mutex_unlock(&sbi->mdb_lock);
> > }
> >
> > void hfs_mark_mdb_dirty(struct super_block *sb)
> > @@ -339,9 +345,12 @@ static int hfs_fill_super(struct super_block
> > *sb, struct fs_context *fc)
> > sb->s_op = &hfs_super_operations;
> > sb->s_xattr = hfs_xattr_handlers;
> > sb->s_flags |= SB_NODIRATIME;
> > + mutex_init(&sbi->mdb_lock);
> > mutex_init(&sbi->bitmap_lock);
> >
> > + mutex_lock(&sbi->mdb_lock);
> > res = hfs_mdb_get(sb);
> > + mutex_unlock(&sbi->mdb_lock);
> > if (res) {
> > if (!silent)
> > pr_warn("can't find a HFS filesystem on
> > dev
> > %s\n",