Re: [PATCH] fix writing to the filesystem after unmount

From: Jan Kara
Date: Fri Sep 08 2023 - 06:22:27 EST


On Fri 08-09-23 11:29:40, Zdenek Kabelac wrote:
> On 08. 09. 23 at 9:32, Jan Kara wrote:
> > On Thu 07-09-23 14:04:51, Mikulas Patocka wrote:
> > >
> > > On Thu, 7 Sep 2023, Christian Brauner wrote:
> > >
> > > > > I think we've got too deep down into "how to fix things" but I'm not 100%
> > > > We did.
> > > >
> > > > > sure what the "bug" actually is. In the initial posting Mikulas writes "the
> > > > > kernel writes to the filesystem after unmount successfully returned" - is
> > > > > that really such a big issue?
> > > I think it's an issue if the administrator writes a script that unmounts a
> > > filesystem and then copies the underlying block device somewhere. Or a
> > > script that unmounts a filesystem and runs fsck afterwards. Or a script
> > > that unmounts a filesystem and runs mkfs on the same block device.
> > Well, e.g. e2fsprogs use O_EXCL open so they will detect that the filesystem
> > hasn't been unmounted properly and complain. Which is exactly what should
> > IMHO happen.
> >
> > > > > Anybody else can open the device and write to it as well. Or even
> > > > > mount the device again. So userspace that relies on this is kind of
> > > > > flaky anyway (and always has been).
> > > It's admin's responsibility to make sure that the filesystem is not
> > > mounted multiple times when he touches the underlying block device after
> > > unmount.
> > What I wanted to suggest is that we should provide means how to make sure
> > block device is not being modified and educate admins and tool authors
> > about them. Because just doing "umount /dev/sda1" and thinking this means
> > that /dev/sda1 is unused now simply is not enough in today's world for
> > multiple reasons and we cannot solve it just in the kernel.
> >
>
> /me just wondering how do you then imagine i.e. safe removal of USB drive
> when user shall not expect unmount really unmounts filesystem?

Well, currently you click some "Eject / safely remove / whatever" button
and then you get a "wait" dialog until everything is done after which
you're told the stick is safe to remove. What I imagine is that the "wait"
dialog needs to stay up while there are any openers of the device (or at
minimum any exclusive ones), not just until the umount(2) syscall has
returned. And yes, the
kernel doesn't quite make that easy - the best you can currently probably
do is to try opening the device with O_EXCL and if that fails, you know
there's some other exclusive open.
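
As a rough illustration of the O_EXCL probe described above, here is a
minimal sketch in Python. It assumes Linux, where opening a block device
with O_EXCL (and without O_CREAT) fails with EBUSY while somebody else holds
the device exclusively. The function names, the timeout and the polling
interval are made up for this example, and the `opener` parameter exists
only so the logic can be exercised without a real block device.

```python
import errno
import os
import time


def exclusively_held(device, opener=os.open):
    """Return True if some other opener holds `device` exclusively.

    Hypothetical helper: it only detects *exclusive* openers (such as a
    filesystem that is still attached to the device), not plain ones.
    """
    try:
        fd = opener(device, os.O_RDONLY | os.O_EXCL)
    except OSError as err:
        if err.errno == errno.EBUSY:
            return True
        raise  # missing device, no permission, ...: let the caller decide
    os.close(fd)
    return False


def wait_until_released(device, timeout=30.0, interval=0.5, opener=os.open):
    """Poll until no exclusive opener is left; return False on timeout."""
    deadline = time.monotonic() + timeout
    while exclusively_held(device, opener):
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
    return True
```

An "eject" tool could keep its "wait" dialog up until
wait_until_released() returns True. Note that this is still only a
best-effort check: nothing stops another exclusive opener from appearing
right after the probe succeeds.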

> IMHO  - unmount should detect some very suspicious state of block device if
> it cannot correctly proceed - i.e. reporting 'warning/error' on such
> commands...

You seem to be concentrating too much on the simple case of a desktop with
a USB stick you just copy data to & from. :) The trouble is, as Al wrote
elsewhere in this thread, that a filesystem unmount can for example be the
result of an exit(2) or close(2) system call if you set things up in a nasty
way. Do you want exit(2) to fail because the block device is frozen?
Umount(2) has to work for all its users and changing the behavior has nasty
corner cases. So does the current behavior, I agree, but improving the
situation for one use case while breaking another isn't really a way
forward...

> Main problem is - if the 'unmount' is successful in this case - the last
> connection userspace had to this filesystem is lost - and user cannot get rid
> of such filesystem anymore for a system.

Well, the filesystem (struct super_block to be exact) is invisible in
/proc/mounts (or whatever), that is true. But it is still very much
associated with that block device and if you do 'mount <device>
<mntpoint>', you'll get it back. But yes, the filesystem will not go away
until all references to it are dropped and you cannot easily find who holds
those references and how to get rid of them.

> I'd likely propose in this particular state of unmounting of a frozen
> filesystem to just proceed - and drop the frozen state together with release
> filesystem and never issue any ioctl from such filesystem to the device
> below - so it would not be a 100% valid unmount - but since the freeze
> should be nearly equivalent of having a proper 'unmount' being done - it
> shouldn't be causing any harm either - and all resources associated could
> be 'released'. IMHO it's correct to 'drop' frozen state for filesystem
> that is not going to exist anymore  (assuming it's the last  such user)

This option was also discussed in the past and it has nasty consequences as
well. Cleanly shutting down a filesystem usually needs to write to the
underlying device so either you allow the filesystem to write to the device
on umount, breaking the assumptions of the user who froze the fs, or you'd
have to implement special handling of this case in every filesystem to avoid
the writes (and put up with the fact that the filesystem will appear as
uncleanly shut down on the next mount). Not particularly nice either...

Honza
--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR