Re: ext4 regression in v5.9-rc2 from e7bfb5c9bb3d on ro fs with overlapped bitmaps

From: Josh Triplett
Date: Tue Oct 06 2020 - 01:04:47 EST


On Mon, Oct 05, 2020 at 11:18:34PM -0400, Theodore Y. Ts'o wrote:
> What Josh is proposing I'm pretty sure would also break "e2fsck -E
> unshare_blocks", so that's another reason not to accept this as a
> valid format change.

The kernel already accepted this as a valid mountable filesystem format,
without a single error or warning of any kind, and has done so stably
for years.

> As far as I'm concerned, contrib/e2fsdroid is the canonical definition
> of how to create valid file systems with shared_blocks.

I'm not trying to create a problem here; I'm trying to address a whole
family of problems. I was generally under the impression that mounting
existing root filesystems fell under the scope of the kernel<->userspace
or kernel<->existing-system boundary, as defined by what the kernel
accepts and existing userspace has used successfully, and that upgrading
the kernel should work with existing userspace and systems. If there's
some other rule that applies for filesystems, I'm not aware of that.
(I'm also not trying to suggest that every random corner case of what
the kernel *could* accept needs to be the format definition, but rather,
cases that correspond to existing userspace.)

It wouldn't be *impossible* to work around this, this time; it may be
possible to adapt the existing userspace to work on the new and old
kernels. My concern is, if a filesystem format accepted by previous
kernels can be rejected by future kernels, what stops a future kernel
from further changing the format definition or its strictness
(co-evolving with one specific userspace) and causing further
regressions?

I don't *want* to rely on what apparently turned out to be an
undocumented bug in the kernel's validator. That's why I was trying to
fix the issue in what seemed like the right way, by detecting the
situation and turning off the validator. That seemed like it would fully
address the issue. If it would help, I could also supply a tiny filesystem
image for regression testing.

I'm trying to figure out what solution you'd like to see here, as long
as it isn't "any userspace that isn't e2fsdroid can be broken at will".
I'd be willing to work to adapt the userspace bits I have to work around
the regression, but I'd like to get this on the radar so this doesn't
happen again.

- Josh Triplett