Re: [PATCH 3/5 resend] VFS: Fix s_umount thaw/write deadlock

From: Jan Kara
Date: Wed Dec 07 2011 - 18:17:02 EST


On Tue 06-12-11 06:35:44, Christoph Hellwig wrote:
> On Mon, Dec 05, 2011 at 12:54:47PM -0800, Kamal Mostafa wrote:
> > diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> > index 73c3992..dbeaede 100644
> > --- a/fs/fs-writeback.c
> > +++ b/fs/fs-writeback.c
> > @@ -554,6 +554,14 @@ static long writeback_sb_inodes(struct super_block *sb,
> > while (!list_empty(&wb->b_io)) {
> > struct inode *inode = wb_inode(wb->b_io.prev);
> >
> > + if (inode->i_sb == sb && vfs_is_frozen(sb)) {
> > + /*
> > + * Inode is on a frozen superblock; skip it for now.
> > + */
> > + redirty_tail(inode, wb);
> > + continue;
> > + }
> > +
>
> We make sure to not dirty any new inodes after the first phase of the
> freeze, so this should be a BUG_ON/WARN_ON.
This is not really true in presence of mmaped writes. To block mmaped
writes on a frozen filesystem, we need some synchronization between
page_mkwrite() and freezing code. Currently, to avoid any additional
locking overhead, we set page dirty and *then* check for filesystem being
frozen. Only this order can make sure either the page is written (and
write-protected) or the frozen check triggers and we wait... (see the
comment in block_page_mkwrite()). The nasty sideeffect of this is that
there can be dirty pages & inodes on a frozen filesystem. We are blocked in
the page fault of these pages so user cannot write any data to these pages
but still they are marked dirty.

Alternatively we could have a different mechanism (rw semaphore?) to
synchronize page faults and freezing but I'd hate the overhead for the case
almost noone cares about...

> While we're at it: can anyway suggest a less cumbersome name than
> writeback_inodes_sb_if_idle? I'd go for
> try_to_writeback_inodes_sb(_nr).
Sounds good.

> > index 35f4b0e..47983d9 100644
> > --- a/fs/quota/quota.c
> > +++ b/fs/quota/quota.c
> > @@ -47,7 +47,7 @@ static int check_quotactl_permission(struct super_block *sb, int type, int cmd,
> >
> > static void quota_sync_one(struct super_block *sb, void *arg)
> > {
> > - if (sb->s_qcop && sb->s_qcop->quota_sync)
> > + if (sb->s_qcop && sb->s_qcop->quota_sync && !vfs_is_frozen(sb))
> > sb->s_qcop->quota_sync(sb, *(int *)arg, 1);
> > }
>
> Again, this should be a BUG_ON/WARN_ON. We shouldn't redirty quotas
> after the freeze.
You cannot really turn the check into WARN_ON because we iterate over all
superblocks and only in ->quota_sync() check whether something is dirty.
But the check seems to be unnecessary, that is correct.

> > @@ -368,8 +368,18 @@ SYSCALL_DEFINE4(quotactl, unsigned int, cmd, const char __user *, special,
> > goto out;
> > }
> >
> > + /*
> > + * It's not clear which quota functions may write to the file
> > + * system (all?). Check for a frozen fs and bail out now.
> > + */
> > + if (vfs_is_frozen(sb)) {
> > + ret = -EBUSY;
> > + goto out_drop_super;
> > + }
>
> How about spending the three minutes to figure it out?
> Q_GETFMT/Q_GETINFO/Q_XGETQSTAT and Q_GETQUOTA are the obvious read-only
> candidates.
Q_GETQUOTA can actually cause filesystem modification (reservation of
space in quota file) but the others are read-only. Also after some thought
I'd prefer that quotactl(8) just blocks to be consistent with how other
syscalls behave...

Honza
--
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/