Re: [PATCH 03/19] fs: release anon dev_t in deactivate_locked_super
From: Al Viro
Date: Mon Oct 09 2023 - 17:58:23 EST
On Mon, Oct 02, 2023 at 08:46:46AM +0200, Christoph Hellwig wrote:
> On Tue, Sep 26, 2023 at 10:25:15PM +0100, Al Viro wrote:
> > Before your patch: foo_kill_super() calls kill_anon_super(),
> > which calls kill_super_notify(), which removes the sucker from
> > the list, then frees ->s_fs_info. After your patch:
> > removal from the lists happens via the call of kill_super_notify()
> > *after* both of your methods had been called, while freeing
> > ->s_fs_info happens from the method call. IOW, you've restored
> > the situation prior to "super: ensure valid info". The whole
> > point of that commit had been to make sure that we have nothing
> > in the lists with ->s_fs_info pointing to a freed object.
> >
> > It's not about free_anon_bdev(); that part is fine - it's the
> > "we can drop the weird second call site of kill_super_notify()"
> > thing that is broken.
>
> The point has been to only release the anon dev_t after
> kill_super_notify, to prevent two of them beeing reused.
>
> Which we do as the free_anon_bdev is done directly in
> deactivate_locked_super. The new ->free_sb for non-block file systems
> frees resources, but none of them matter for sget.
We keep talking past each other... Let me try again:
at the tip of your branch you have
static struct file_system_type ubifs_fs_type = {
.name = "ubifs",
.owner = THIS_MODULE,
.mount = ubifs_mount,
.free_sb = ubifs_free_sb,
};
static void ubifs_free_sb(struct super_block *s)
{
kfree(s->s_fs_info);
}
static struct dentry *ubifs_mount(struct file_system_type *fs_type, int flags,
const char *name, void *data)
{
...
sb = sget(fs_type, sb_test, sb_set, flags, c);
...
}
static int sb_test(struct super_block *sb, void *data)
{
struct ubifs_info *c1 = data;
struct ubifs_info *c = sb->s_fs_info;
return c->vi.cdev == c1->vi.cdev;
}
See the problem? Mainline has
static void kill_ubifs_super(struct super_block *s)
{
struct ubifs_info *c = s->s_fs_info;
kill_anon_super(s);
kfree(c);
}
and
void kill_anon_super(struct super_block *sb)
{
dev_t dev = sb->s_dev;
generic_shutdown_super(sb);
kill_super_notify(sb);
free_anon_bdev(dev);
}
That removes the superblock from the list of instances before its
->s_fs_info is freed. In your branch removal happens here:
if (fs->shutdown_sb)
fs->shutdown_sb(s);
generic_shutdown_super(s);
if (fs->free_sb)
fs->free_sb(s);
kill_super_notify(s);
That comes *after* ubifs_free_sb() has freed ->s_fs_info. And there's
nothing to stop ubifs_mount() (on a completely unrelated device) to get
called right at that moment. Doing the sget() call quoted above. Now,
in sget() we have
hlist_for_each_entry(old, &type->fs_supers, s_instances) {
if (!test(old, data))
and that will hit sb_test(old, data), with old being a superblock still
in ->fs_supers, but with ->s_fs_info already freed. So in sb_test()
we have c equal to old->s_fs_info and
return c->vi.cdev == c1->vi.cdev;
is a bloody use after free.
Here we are unlikely to get fucked over - it's a plain fetch from freed
object. If you look at e.g. nfs, you'll see a lot more than that -
pointer chasing from freed (and possibly reused) object. The only
difference is that there you have sget_fc() instead of sget() - same
loop anyway.
The bottom line: in the form it is posted, your series reintroduces the
class of UAF that had been added by taking removal from the instances
list out of generic_shutdown_super() and then papered over by adding
that kill_super_notify() into kill_anon_super().
And frankly, I believe that the root cause is the insistence that
list removal should happen after generic_shutdown_super(). Sure, you
want the superblock to serve as bdev holder, which leads to fun
with -EBUSY if mount comes while umount still hadn't closed the
device. I suspect that it would make a lot more sense to
introduce an intermediate state - "held, but will be released
in a short while". You already have something similar, but
only for the entire disk ->bd_claiming stuff.
Add a new primitive (will_release_bdev()), so that attempts to
claim the sucker will wait until it gets released instead of
failing with -EBUSY. And do *that* before generic_shutdown_super()
when unmounting something that is block-based. Allows to bring
the list removal back where it used to be, no UAF at all...
IMO that direction is a lot more promising.