Re: linux-next: build failure after merge of the vfs tree
From: Al Viro
Date: Tue Jan 03 2012 - 09:45:52 EST
On Tue, Jan 03, 2012 at 02:39:42PM +0100, Jan Kara wrote:
> Thanks Stephen! Al, how shall we resolve this? You wrote you can provide
> a VFS helper like get_super() which will also guarantee that the fs is
> unfrozen. That could be used in quotactl_block() and fsync_bdev(). If you
> plan to do this for 3.3 then I can just remove the quota fix and let you
> do it.
I started digging in that area and I really don't like what I'm seeing.
The sget() race fix from Aug 2010 (the MS_BORN one) did not cover all cases.
The thing is, we can get hit with this:
1) mount(2) does sget(), etc. and fails very late in the game - with
->s_root already allocated. For some filesystems such failure exits are
possible.
2) something crawling through the superblock list finds our new
sb before we realize it's doomed. Tries to grab s_umount, gets blocked.
3) in the meanwhile *another* mount(2) does sget() that catches
the same sb and decides to pick it. ->s_active is grabbed, we get blocked
on attempt to get ->s_umount exclusive.
4) the original mount(2) gets to the failure point and does
deactivate_locked_super(). ->s_active is decremented, ->s_umount unlocked.
However, because of (3) ->s_active does not reach 0 yet. Guy stuck in (2)
gets to run. ->s_root is non-NULL here. And the fs is not in good shape...
5) sget() from (3) gets to ->s_umount, notices that MS_BORN hadn't
been set and does deactivate_locked_super(). Now ->s_active is 0 and
we get around to shutting the sucker down. ->kill_sb() gets called, ->s_root
is dropped, etc. - the whole nine yards. Caller of sget() had been saved from
the race. However, whoever had been blocked in (2) and got to run in (4) still got hit.
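To make the sequence above concrete, here is a minimal userspace replay of steps (1)-(5). It is a model, not kernel code: struct model_sb, model_deactivate() and replay_race() are made-up names, the fields only stand in for ->s_active, ->s_root and MS_BORN, and ->s_umount is reduced to the order in which the steps run.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only; the fields stand in for the ones named in the mail. */
struct model_sb {
	int  s_active;   /* active references */
	bool has_root;   /* stands in for ->s_root != NULL */
	bool born;       /* stands in for MS_BORN in ->s_flags */
	bool shut_down;  /* set once the ->kill_sb() stand-in has run */
};

struct race_result {
	bool walker_saw_doomed_sb; /* step 4: walker passed the ->s_root test */
	bool sget_caller_saved;    /* step 5: second mount noticed !born */
	int  final_s_active;
	bool shut_down;
};

/* Stand-in for deactivate_locked_super(): drop a ref, shut down on zero. */
static void model_deactivate(struct model_sb *sb)
{
	assert(sb->s_active > 0);
	if (--sb->s_active == 0) {
		sb->has_root = false;
		sb->shut_down = true;
	}
}

static struct race_result replay_race(void)
{
	struct race_result r = { false, false, 0, false };

	/* (1) first mount: sget() done, root allocated, sb never marked born */
	struct model_sb sb = {
		.s_active = 1, .has_root = true, .born = false, .shut_down = false,
	};

	/* (2) a list walker finds sb and blocks on s_umount: no state change */

	/* (3) second mount's sget() picks the same sb and grabs ->s_active */
	sb.s_active++;

	/* (4) first mount fails late; the count only drops to 1, so no shutdown */
	model_deactivate(&sb);
	/* the walker now gets s_umount and checks nothing but the root */
	if (sb.has_root)
		r.walker_saw_doomed_sb = !sb.born;

	/* (5) second mount gets s_umount, sees the sb was never born, backs out */
	if (!sb.born) {
		r.sget_caller_saved = true;
		model_deactivate(&sb);
	}

	r.final_s_active = sb.s_active;
	r.shut_down = sb.shut_down;
	return r;
}
```

In this model replay_race() ends with the superblock shut down and the sget() caller saved, yet the walker has already been let through on a superblock that was never born.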
IOW, MS_BORN check is needed in the places that go through the superblock
list, grab ->s_umount and check ->s_root. That will close the hole for
good.
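A rough sketch of what such a check could look like in a list walker, again as a userspace model rather than the actual patch. MODEL_MS_BORN, struct model_sb, model_sb_usable() and model_iterate_supers() are hypothetical names, the flag bit is arbitrary, and the s_umount handling is only marked in comments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MS_BORN (1UL << 29) /* arbitrary bit, for this model only */

struct model_sb {
	unsigned long s_flags;
	const void   *s_root; /* NULL once the fs has been shut down */
};

typedef void (*model_sb_fn)(struct model_sb *sb, void *arg);

/* The root test alone is not enough; the sb must also have been born. */
static bool model_sb_usable(const struct model_sb *sb)
{
	return sb->s_root != NULL && (sb->s_flags & MODEL_MS_BORN);
}

/* Calls fn on every usable superblock, returns how many were visited. */
static size_t model_iterate_supers(struct model_sb *sbs, size_t n,
				   model_sb_fn fn, void *arg)
{
	size_t visited = 0;

	assert(fn != NULL);
	for (size_t i = 0; i < n; i++) {
		/* a real walker would take s_umount here ... */
		if (model_sb_usable(&sbs[i])) {
			fn(&sbs[i], arg);
			visited++;
		}
		/* ... and drop it here */
	}
	return visited;
}

static void count_cb(struct model_sb *sb, void *arg)
{
	(void)sb;
	++*(size_t *)arg;
}

static size_t demo_walk(void)
{
	int root = 0; /* any non-NULL address serves as a root here */
	struct model_sb sbs[] = {
		{ MODEL_MS_BORN, &root }, /* fully set up */
		{ 0,             &root }, /* late-failing mount: root, not born */
		{ MODEL_MS_BORN, NULL  }, /* already shut down */
	};
	size_t calls = 0;
	size_t visited = model_iterate_supers(sbs, 3, count_cb, &calls);

	assert(calls == visited);
	return visited;
}
```

In demo_walk() only the first entry is handed to the callback; the one with a root but no born flag is the case from step (4), and it gets skipped.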
We also have a problem in the get_active_super() caller; again, it's a missing
MS_BORN check (in freeze_super(), after getting ->s_umount).
I went through the ->mount() instances; most of them can't fail with non-NULL
->s_root at all or, if they do, leave the superblock in basically usable
shape. However, some might be b0rken; among other things, ext4 and minixfs
*definitely* can leak root dentry on late failure exits. Still doing RTFS...
Another fun question - can ->statfs() ever wait for fs to be thawed? If so,
we have another problem like the one spotted by Mikulas - in ustat(2). And
if not, we'd damn better document that requirement.