Re: ext4 regression in v5.9-rc2 from e7bfb5c9bb3d on ro fs with overlapped bitmaps
From: Theodore Y. Ts'o
Date: Tue Oct 06 2020 - 09:35:49 EST
On Mon, Oct 05, 2020 at 10:03:06PM -0700, Josh Triplett wrote:
>
> I'm not trying to create a problem here; I'm trying to address a whole
> family of problems. I was generally under the impression that mounting
> existing root filesystems fell under the scope of the kernel<->userspace
> or kernel<->existing-system boundary, as defined by what the kernel
> accepts and existing userspace has used successfully, and that upgrading
> the kernel should work with existing userspace and systems. If there's
> some other rule that applies for filesystems, I'm not aware of that.
> (I'm also not trying to suggest that every random corner case of what
> the kernel *could* accept needs to be the format definition, but rather,
> cases that correspond to existing userspace.)
I'm not opposed to the kernel side change; it's *this time*. I'm more
interested in killing off the tool that generated the malformed file
system in the first place. As I keep pointing out, things aren't
going to go well if "e2fsck -E unshare_blocks" is applied to it. So
users who use this unofficial tool to create this file system is can
run into at least this corner case, if not others, and that will
result in, as the UI designers like to say, "a poor user experience".
We had a similar issue with Android. Many years ago, Andy Rubin was
originally quite allergic to the GPL, and had tried to promulgate the
rule, "no GPL in Android Userspace". This is why bionic is used as
libc, and this resulted in Android engineers (I think before the
Google acquisition, but I'm not 100% sure), creating an unofficial,
"unauthorized" make_ext4fs which was a BSD-licensed version of mke2fs.
Unfortuantely, it created file systems which the kernel would never
complain about, but which, 50% of the time, would result in a file
system which under some circumstances, would get corrupted (even more)
when e2fsck attempted to repair the file system. So if a user had a
bit flip caused by an eMMC hiccup, e2fsck could end up making things
worse.
Worse, make_ext4fs had over time, grown extra functionality, such as
pre-setting the SELinux xattrs, such that you couldn't just replace it
with mke2fs. It took *years* to fix the problem, and that's why
contrib/e2fsdroid exists today. We finally, a few years ago, were
able to retire make_ext4fs and replace it with the combination of
mke2fs and e2fsdroid.
So that's why I really don't like it when there are "unauthorized",
unofficial tools creating file systems out there which we are now
obliged to support. Even if it's OK as far as the kernel is
concerned, unless you're planning on forking and/or reimplementing all
of e2fsprogs, and doing so correctly, that way is going to cause
headaches for file system developers.
As far as I'm concerned, it's not just about on-disk file system
format, it's also about the official user space tools. If you create
a file system which the kernel is happy with, but which wasn't created
using the official user space tools, file systems are so full of state
and permutations of how things should be done that the opportunities
for mischief are huge.
And what's especially aggravating is when it's done for petty reasons
--- whether it's trying to sae an extra 0.0003% of storage, or because
some VP was allergic to the GPL, it's stupid stuff.
> I don't *want* to rely on what apparently turned out to be an
> undocumented bug in the kernel's validator. That's why I was trying to
> fix the issue in what seemed like the right way, by detecting the
> situation and turning off the validator. That seemed like it would fully
> address the issue. If it would help, I could also supply a tiny filesystem
> image for regression testing.
I'm OK with working around the problem, and we're lucky that it's this
simple.... this time around.
But can we *please* take your custom tool out back and shoot it in the
head? Like make_ext4fs, it's just going to cause more headaches down
the line.
And perhaps we need to make a policy that makes it clear that for file
systems, it's not just about "whatever the kernel happens to accept".
It should also be, "was it generated using an official version of the
userspace tools", at least as a consideration. Yes, we can try to
make the kernel more strict, and that's a good thing to do, but
inevitably, as we make the kernel more strict, we can potentially
break other unffocial tools out there, and it's going to make it a lot
harder to be able to do backwards compatible format enhancements to
the file system.
- Ted