Re: [PATCH] kernfs: fix use-after-free in __kernfs_remove

From: Tejun Heo
Date: Mon Sep 12 2022 - 17:39:09 EST


Hello,

On Mon, Sep 12, 2022 at 11:24:52PM +0200, Christian A. Ehrhardt wrote:
> Sorry, no patch (yet). But here's the whole story of the initial
> syzkaller report (form top to bottom). I'm not sure where we go wrong
> but I think several places could do better here:
>
> The code in net/9p/client.c creates one kmem-cache per client session.
> All of these kmem caches use the same name ("9p-fcall-cache").
> Is it ok to create several kmem caches with the same name? My feeling is
> that this is somewhat unexpected but should probably be allowed.

I don't think that's supported. kmem_caches are exposed in sysfs and having
the same name is gonna cause issues.

> In the setup in question slab caches are not merged. Thus the slub
> code uses the kmem cache name directly to create the sysfs directory for
> the slub cache. If we allow multiple kmem caches with the same name the
> slub code should somehow make the sysfs names unique (e.g. by adding a
> serial numer to the name), right?

I think we're just in uncharted terriotory. Maybe some things will work
while others don't. Nobody really thought about or tested this usage.

> Before creating the sysfs directory the slub code uses sysfs_remove_link
> (aka kernfs_remove_by_name) with the following comment:
> "If we have a leftover link remove it." In fact these "leftover"s
> are the sysfs files of an active kmem cache with the same name.

Hahahaha

> Additionally, sysfs_remove_link() looks like a bit of a misnomer
> as it removes whatever exists under the given name. This may be a
> symlink but it can be an entire directory hierarchy, too.
> Is this intentional? At least it's been like that forever.

It's a historical artifact. The backend implementation has changed while the
wrapping sysfs interface remained the same.

> If kmem cache creation is done in parallel we can now have
> concurrent invocations of kernfs_remove_by_name_ns() for the same
> parent and the same name. This is what eventually causes the race.
>
> The race is possible as kernfs_remove_by_name_ns() looks up the
> name of the target in its parent but does not acquire a ref count
> on the target before calling __kernfs_remove(). __kernfs_remove()
> may drop the kernfs_rwsem in kernfs_drain(). Thus a concurrent call
> to __kernfs_remove can complete the removal except for the nodes
> that the first instance of __kernfs_remove() holds refs for.
> As explained above no ref is held on the root of the removed tree.
> This causes the use-after-free that KASAN sees and complains about.
>
> For kernfs_remove it is reasonable to expect the caller to hold
> some kind of reference to prevent this type of race and from a quick
> check, the callers seem to get this correct. The only exception that
> I could find is kernfs_remove_by_name_ns. This is easy to fix if
> kernfs_remove_by_name_ns() hold a reference on the root node across
> the call to __kernfs_remove().
> IMHO this should be done. Should I sent a patch?

So, no matter what, I think it'd be a good idea to make remove_by_name hold
onto the kn it's removing, so please send a patch to do so. For the rest of
the situation, I think the right thing to do would be updating 9p so that it
doesn't create multiple kmem_caches with the same name.

Thanks.

--
tejun