Re: [RFC PATCH 4/4] vfs: add filesystem freeze/thaw callbacks for power management

From: Jan Kara
Date: Thu Mar 27 2025 - 14:21:31 EST


On Thu 27-03-25 10:06:13, James Bottomley wrote:
> Introduce a freeze function, which iterates superblocks in reverse
> order freezing filesystems. The indicator a filesystem is freezable
> is either possessing a s_bdev or a freeze_super method. So this can
> be used in efivarfs, whether the freeze is for hibernate is also
> passed in via the new FREEZE_FOR_HIBERNATE flag.
>
> Thawing is done opposite to freezing (so superblock traversal in
> regular order) and the whole thing is plumbed into power management.
> The original ksys_sync() is preserved so the whole freezing step is
> optional (if it fails we're no worse off than we are today) so it
> doesn't inhibit suspend/hibernate if there's a failure.
>
> Signed-off-by: James Bottomley <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx>

OK, I've seen you are setting the new FREEZE_FOR_HIBERNATE flag but I didn't
find anything using that flag. What do you plan to use it for? Does you
efivars usecase need it? I find passing down this detail about the caller
down to all filesystems a bit awkward. Isn't it possible to extract the
information "hibernate is ongoing" from PM subsystem?

> +/*
> + * Kernel freezing and thawing is only done in the power management
> + * subsystem and is thus single threaded (so we don't have to worry
> + * here about multiple calls to filesystems_freeze/thaw().
> + */
> +
> +static int freeze_flags;

Frankly, the global variable to propagate flags is pretty ugly... If we
really have to propagate some context into the iterator callback, rather do
it explicitly like iterate_supers() does it.

> +static void filesystems_freeze_callback(struct super_block *sb)
> +{
> + /* errors don't fail suspend so ignore them */
> + if (sb->s_op->freeze_super)
> + sb->s_op->freeze_super(sb, FREEZE_MAY_NEST
> + | FREEZE_HOLDER_KERNEL
> + | freeze_flags);
> + else if (sb->s_bdev)
> + freeze_super(sb, FREEZE_MAY_NEST | FREEZE_HOLDER_KERNEL
> + | freeze_flags);

Style nit - braces around above blocks would be IMHO appropriate.

> + else {
> + pr_info("Ignoring filesystem %s\n", sb->s_type->name);
> + return;
> + }
> +
> + pr_info("frozen %s, now syncing block ...", sb->s_type->name);
> + sync_blockdev(sb->s_bdev);
> + pr_info("done.");
> +}

Generally this callback is not safe because it can race with filesystem
unmount and calling ->freeze_super() after the filesystem's ->put_super()
was called may have all sorts of interesting effects (freeze_super() itself
will just bail with a warning, which is better but not great either).

The cleanest way I see how to make the iteration safe is to grab active sb
reference (like grab_super() does it) for the duration of freeze_super()
calls. Another possibility would be to grab sb->s_umount rwsem exclusively
as Luis does it in his series but that requires a bit of locking surgery
and ->freeze_super() handlers make this particularly nasty these days so I
think active sb reference is going to be nicer these days.

Honza
--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR