Re: [PATCH] fs/block_dev.c: return the right error in thaw_bdev()
From: Mateusz Guzik
Date: Wed Oct 05 2016 - 02:48:01 EST
On Tue, Oct 04, 2016 at 11:06:15AM +0200, Jan Kara wrote:
> On Tue 04-10-16 10:53:40, Pierre Morel wrote:
> > When triggering thaw-filesystems via magic sysrq, the system enters a
> > loop in do_thaw_one(), as thaw_bdev() still returns success if
> > bd_fsfreeze_count == 0. To fix this, let thaw_bdev() always return
> > error (and simplify the code a bit at the same time).
> >
>
> The patch looks good.
>
Now that I had a closer look, while the patch indeed gets rid of the
infinite loop, the functionality itself does not work properly.
Note I'm not familiar with this code, so chances are I got details
wrong. I also don't know the original reasoning.
The current state is that you can freeze by calling either freeze_super
or freeze_bdev. The latter bumps bd_fsfreeze_count and calls
freeze_super. freeze_super does NOT modify bd_fsfreeze_count.
freeze_bdev is used by device mapper, xfs and e2fs.
freeze_super is used by the FIFREEZE ioctl.
This disparity leads to breakage.
Quick look suggests device freezing has its users and sb is optionally
present. So the fix would consist on making all freezers call the bdev
variant.
The j sysrq ends up not working in 2 ways.
1. It ends up calling do_thaw_one -> thaw_bdev. It will look at the
bd_fsfreeze_count counter and error out. But if the user froze the fs
with the ioctl, the counter is not modified and the fs in question ends
up not being thawed.
2. (assuming 1 is fixed) Since freezing through freeze_bdev supports
nesting, multiple invocations of the sysrq may be needed to actually
thaw.
Now, looking at *_bdev functions:
> struct super_block *freeze_bdev(struct block_device *bdev)
> {
> struct super_block *sb;
> int error = 0;
>
> mutex_lock(&bdev->bd_fsfreeze_mutex);
> if (++bdev->bd_fsfreeze_count > 1) {
No limit is put in place so in principle this will eventually turn negative.
> /*
> * We don't even need to grab a reference - the first call
> * to freeze_bdev grab an active reference and only the last
> * thaw_bdev drops it.
> */
> sb = get_super(bdev);
> if (sb)
> drop_super(sb);
> mutex_unlock(&bdev->bd_fsfreeze_mutex);
The code assumes nobody thaws the fs from under the consumer.
That is, code doing sb = freeze_bdev(b); ...; thaw_bdev(b, sb); risks
use-after-free if the user thawed the fs.
I did not check if current consumers hold the object in different ways
and thus avoiding the bug.
> return sb;
> }
>
> sb = get_active_super(bdev);
> if (!sb)
> goto out;
> if (sb->s_op->freeze_super)
> error = sb->s_op->freeze_super(sb);
> else
> error = freeze_super(sb);
This differs from the ioctl version, which has this check:
> if (sb->s_op->freeze_fs == NULL && sb->s_op->freeze_super == NULL)
> return -EOPNOTSUPP;
For a filesystem with freeze_fs == NULL, freeze_super will return 0 and
freeze_bdev will end up returnig 0.
I don't know if this turns into a real problem.
> int thaw_bdev(struct block_device *bdev, struct super_block *sb)
[snip]
> if (sb->s_op->thaw_super)
> error = sb->s_op->thaw_super(sb);
> else
> error = thaw_super(sb);
> if (error) {
> bdev->bd_fsfreeze_count++;
> mutex_unlock(&bdev->bd_fsfreeze_mutex);
> return error;
> }
The fact that someone else can thaw makes it very fishy to pass super
blocks around in the first place. In particular:
kernel freeze -> user thaw -> mount -> user freeze -> kernel thaw
Are we guaranteed that the sb returned for the kernel freeze is the same
which was used for user freeze?
That said, rough proposed fix is as follows:
- store the pointer to the sb used for freezing in bdev
- drop the sb argument from thaw_bdev
- optionally return a referenced sb from freeze_bdev, otherwise just
return NULL on success
- convert all freeze consumers to use *_bdev variants
- somewhat cosmetic, introduce nesting limit for freezes (say 2048 or
whatever, just to have something)
The ioctl used use freeze_bdev but this was changed with 18e9e5104fcd
"Introduce freeze_super and thaw_super for the fsfreeze ioctl" which
cites btrfs as the reason for freeze_super as opposed to freeze_bdev.
CC'ing the author for comments.
--
Mateusz Guzik