Re: [PATCH 3/6] fs: distinguish between user initiated freeze and kernel initiated freeze

From: Jan Kara
Date: Thu May 25 2023 - 10:14:38 EST


On Mon 22-05-23 16:42:00, Darrick J. Wong wrote:
> How about this as an alternative patch? Kernel and userspace freeze
> state are stored in s_writers; each type cannot block the other (though
> you still can't have nested kernel or userspace freezes); and the freeze
> is maintained until /both/ freeze types are dropped.
>
> AFAICT this should work for the two other usecases (quiescing pagefaults
> for fsdax pmem pre-removal; and freezing fses during suspend) besides
> online fsck for xfs.
>
> --D
>
> From: Darrick J. Wong <djwong@xxxxxxxxxx>
> Subject: fs: distinguish between user initiated freeze and kernel initiated freeze
>
> Userspace can freeze a filesystem using the FIFREEZE ioctl or by
> suspending the block device; this state persists until userspace thaws
> the filesystem with the FITHAW ioctl or resuming the block device.
> Since commit 18e9e5104fcd ("Introduce freeze_super and thaw_super for
> the fsfreeze ioctl") we only allow the first freeze command to succeed.
>
> The kernel may decide that it is necessary to freeze a filesystem for
> its own internal purposes, such as suspends in progress, filesystem fsck
> activities, or quiescing a device prior to removal. Userspace thaw
> commands must never break a kernel freeze, and kernel thaw commands
> shouldn't undo userspace's freeze command.
>
> Introduce a couple of freeze holder flags and wire it into the
> sb_writers state. One kernel and one userspace freeze are allowed to
> coexist at the same time; the filesystem will not thaw until both are
> lifted.
>
> Inspired-by: Luis Chamberlain <mcgrof@xxxxxxxxxx>
> Signed-off-by: Darrick J. Wong <djwong@xxxxxxxxxx>

Yes, this is exactly how I'd imagine it. Thanks for writing the patch!

I'd just note that this would need rebasing on top of Luis' patches 1 and
2. Also:

> + if (sbw->frozen == SB_FREEZE_COMPLETE) {
> + switch (who) {
> + case FREEZE_HOLDER_KERNEL:
> + if (sbw->freeze_holders & FREEZE_HOLDER_KERNEL) {
> + /*
> + * Kernel freeze already in effect; caller can
> + * try again.
> + */
> + deactivate_locked_super(sb);
> + return -EBUSY;
> + }
> + if (sbw->freeze_holders & FREEZE_HOLDER_USERSPACE) {
> + /*
> + * Share the freeze state with the userspace
> + * freeze already in effect.
> + */
> + sbw->freeze_holders |= who;
> + deactivate_locked_super(sb);
> + return 0;
> + }
> + break;
> + case FREEZE_HOLDER_USERSPACE:
> + if (sbw->freeze_holders & FREEZE_HOLDER_USERSPACE) {
> + /*
> + * Userspace freeze already in effect; tell
> + * the caller we're busy.
> + */
> + deactivate_locked_super(sb);
> + return -EBUSY;
> + }
> + if (sbw->freeze_holders & FREEZE_HOLDER_KERNEL) {
> + /*
> + * Share the freeze state with the kernel
> + * freeze already in effect.
> + */
> + sbw->freeze_holders |= who;
> + deactivate_locked_super(sb);
> + return 0;
> + }
> + break;
> + default:
> + BUG();
> + deactivate_locked_super(sb);
> + return -EINVAL;
> + }
> + }

Can't this be simplified to:

BUG_ON(who & ~(FREEZE_HOLDER_USERSPACE | FREEZE_HOLDER_KERNEL));
BUG_ON(!(!(who & FREEZE_HOLDER_USERSPACE) ^
!(who & FREEZE_HOLDER_KERNEL)));
retry:
if (sb->s_writers.freeze_holders & who)
return -EBUSY;
/* Already frozen by someone else? */
if (sb->s_writers.freeze_holders & ~who) {
sb->s_writers.freeze_holders |= who;
return 0;
}

Now the only remaining issue with the code is that the two different
holders can be attempting to freeze the filesystem at once and in that case
one of them has to wait for the other one instead of returning -EBUSY as
would happen currently. This can happen because we temporarily drop
s_umount in freeze_super() due to lock ordering issues. I think we could
do something like:

if (!sb_unfrozen(sb)) {
up_write(&sb->s_umount);
wait_var_event(&sb->s_writers.frozen,
sb_unfrozen(sb) || sb_frozen(sb));
down_write(&sb->s_umount);
goto retry;
}

and then sprinkle wake_up_var(&sb->s_writers.frozen) at appropriate places
in freeze_super().

BTW, when reading this code, I've spotted attached cleanup opportunity but
I'll queue that separately so that is JFYI.

> +#define FREEZE_HOLDER_USERSPACE (1U << 1) /* userspace froze fs */
> +#define FREEZE_HOLDER_KERNEL (1U << 2) /* kernel froze fs */

Why not start from 1U << 0? And bonus points for using BIT() macro :).

Honza
--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR