Re: [PATCH] fix writing to the filesystem after unmount

From: Christian Brauner
Date: Thu Sep 07 2023 - 12:32:15 EST


> I think we've got too deep down into "how to fix things" but I'm not 100%

We did.

> sure what the "bug" actually is. In the initial posting Mikulas writes "the
> kernel writes to the filesystem after unmount successfully returned" - is
> that really such a big issue? Anybody else can open the device and write to
> it as well. Or even mount the device again. So userspace that relies on
> this is kind of flaky anyway (and always has been).

Yeah, agreed.

> namespaces etc. I'm not sure such behavior brings much value...

It would in any case mean complicating our code for little gain imho.
And as I showed in my initial reply the current patch would hang on any
bind-mount unmount. IOW, any container. And Al correctly points out
issues with exit(), close() and friends on top of that.

But I also hate the idea of waiting on the last umount because that can
also lead to new unexpected behavior, e.g., when the system is shut down
and systemd goes on to unmount everything and then suddenly just hangs
where before it was able to make progress.

And returning EBUSY is tricky as well, as we would somehow need a way to
refcount in a manner that lets us differentiate between
last-"user-visible"-superblock-reference and
last-active-superblock-reference, which would complicate things even more.
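
To make that distinction concrete, here is a toy userspace model. It is
only a sketch: struct toy_sb and the toy_*() helpers are made up for
illustration and do not correspond to the kernel's struct super_block or
to any existing refcounting code.

```c
#include <errno.h>

/* Toy model only; hypothetical names, not the kernel's data structures. */
struct toy_sb {
	int user_visible;	/* mounts that userspace can see */
	int active;		/* everything keeping the sb alive, e.g. a freeze */
};

static void toy_mount(struct toy_sb *sb)
{
	sb->user_visible++;
	sb->active++;
}

static void toy_freeze(struct toy_sb *sb)
{
	sb->active++;
}

static void toy_thaw(struct toy_sb *sb)
{
	sb->active--;
}

/*
 * Drop one mount. Returns -EBUSY when this is the last user-visible
 * reference but something else (here: a freeze) still keeps the
 * superblock active. Otherwise drops both references and returns 0.
 */
static int toy_umount(struct toy_sb *sb)
{
	if (sb->user_visible == 1 && sb->active > 1)
		return -EBUSY;
	sb->user_visible--;
	sb->active--;
	return 0;
}
```

Even in this toy every path that takes or drops a reference has to know
which of the two counters it affects, which is the extra complexity I'd
like to avoid.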

I propose we clearly document that unmounting a frozen filesystem will
mean that the superblock stays active at least until the filesystem is
unfrozen.

And if userspace wants to make sure to not recycle such a frozen
superblock they can now use FSCONFIG_CMD_CREATE_EXCL to detect that.
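
Roughly, and only as a sketch: the helper names below are made up, the
EX_* constants are local copies of what I believe the uapi values to be
(so the snippet builds against older headers), and I'm assuming the
exclusive create fails with EBUSY when a matching superblock already
exists. Check your kernel headers before relying on any of that.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <unistd.h>
#include <sys/syscall.h>

/* Fallbacks for older headers; assumed to match the generic syscall table. */
#ifndef SYS_fsopen
#define SYS_fsopen 430
#endif
#ifndef SYS_fsconfig
#define SYS_fsconfig 431
#endif

/* Local copies of enum fsconfig_command values (assumption, see above). */
#define EX_FSCONFIG_SET_STRING		1
#define EX_FSCONFIG_CMD_CREATE_EXCL	8

/*
 * Try to create a brand-new superblock for 'fstype' on 'source'.
 * Returns a filesystem context fd on success or a negative errno.
 */
static int create_sb_exclusive(const char *fstype, const char *source)
{
	int fd = (int)syscall(SYS_fsopen, fstype, 0);
	int err;

	if (fd < 0)
		return -errno;
	if (syscall(SYS_fsconfig, fd, EX_FSCONFIG_SET_STRING,
		    "source", source, 0) < 0 ||
	    syscall(SYS_fsconfig, fd, EX_FSCONFIG_CMD_CREATE_EXCL,
		    NULL, NULL, 0) < 0) {
		err = errno;
		close(fd);
		return -err;
	}
	return fd;
}

/* Assumed meaning of -EBUSY: an existing superblock would have been reused. */
static int sb_would_be_reused(int ret)
{
	return ret == -EBUSY;
}
```

On success the returned fd would then go to fsmount(2) and move_mount(2)
as usual; on -EBUSY the caller knows the old (possibly still frozen)
superblock is still around.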

What might be useful is to extend fanotify. Right now we have
fsnotify_sb_delete() which lets you detect that a superblock has been
destroyed (generic_shutdown_super()). It could be useful to also get
notified when a superblock is frozen and unfrozen?