Re: ext4 regression in v5.9-rc2 from e7bfb5c9bb3d on ro fs with overlapped bitmaps

From: Josh Triplett
Date: Mon Oct 05 2020 - 20:32:25 EST


On Mon, Oct 05, 2020 at 10:36:39AM -0700, Darrick J. Wong wrote:
> On Mon, Oct 05, 2020 at 01:14:54AM -0700, Josh Triplett wrote:
> > Ran into an ext4 regression when testing upgrades to 5.9-rc kernels:
> >
> > Commit e7bfb5c9bb3d ("ext4: handle add_system_zone() failure in
> > ext4_setup_system_zone()") breaks mounting of read-only ext4 filesystems
> > with intentionally overlapping bitmap blocks.
> >
> > On an always-read-only filesystem explicitly marked with
> > EXT4_FEATURE_RO_COMPAT_SHARED_BLOCKS, prior to that commit, it's safe to
> > point all the block and inode bitmaps to a single block
>
> LOL, WHAT?
>
> I didn't know shared blocks applied to fs metadata. I thought that
> "shared" only applied to file extent maps being able to share physical
> blocks.

The flag isn't documented very well yet, but since it specifically
forces the filesystem to always be mounted read-only, the bitmaps really
shouldn't matter at all. (In an ideal world, a permanently read-only
filesystem should be able to omit all the bitmaps entirely. Pointing
them all to a single disk block is the next best thing.)

> Could /somebody/ please document the ondisk format changes that are
> associated with this feature?

I pretty much had to sort it out by looking at a combination of
e2fsprogs and the kernel, and a lot of experimentation, until I ended up
with something that the kernel was completely happy with without a
single complaint.

I'd be happy to write up a summary of the format.

I'd still really like to see this patch applied for 5.9, to avoid having
filesystems that an old kernel can mount but a new one can't. This still
seems like a regression to me.

> > of all 1s,
> > because a read-only filesystem will never allocate or free any blocks or
> > inodes.
>
> All 1s? So the inode bitmap says that every inode table slot is in use,
> even if the inode record itself says it isn't?

Yes.

> What does e2fsck -n
> think about that kind of metadata inconsistency?

If you set up the rest of the metadata consistently with it (for
instance, 0 free blocks and 0 free inodes), you'll only get a single
complaint, from the e2fsck equivalent of block_validity. See
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=956509 for details on
that; with that fixed, e2fsck wouldn't complain at all. The kernel,
prior to 5.9-rc2, doesn't have a single complaint, whether at mount,
unmount, or read of every file and directory on the filesystem.

The errors you got in your e2fsck run came because you just overrode the
bitmaps, but didn't make the rest of the metadata consistent with that.
I can provide a sample filesystem if that would help.

- Josh