Re: [PATCH] Devcoredump: fix use-after-free issue when releasing devcd device

From: Mukesh Ojha
Date: Tue Oct 31 2023 - 08:46:30 EST




On 10/31/2023 2:29 PM, Johannes Berg wrote:
On Tue, 2023-10-31 at 16:29 +0800, Yu Wang wrote:

In this case, the device is temporarily added for dump only, so we need to
delete it when dump is completed.
The other users doesn't add/delete the device like this.

For good reason, I guess? I think this is probably a bad idea.

The whole point of this was to actually know which device created the
coredump? If you make one up on the fly that's ... pointless? Surely you
must have _some_ device that already exists?

Passing device name to be user space looks to be the reason.

Looks like here, it is trying to do what devcoredump is already doing, like dev_coredump creates a custom device and deletes it after either
5 min or based on user space action. Same might being called from caller
of devcoredumpm().
devcd->free() should not be assumed to delete custom device
as devcd has reference to your device..and it can not be freed unless
devcoredump put reference to your device..

What is the assumption behind this comment 89-92 ?
sysfs_delete_link() is assuming the device is still there and
deleting the link..why is this needed if this is vulnerable..

82 static void devcd_dev_release(struct device *dev)
83 {
84 struct devcd_entry *devcd = dev_to_devcd(dev);
85
86 devcd->free(devcd->data);
87 module_put(devcd->owner);
88
89 /*
90 * this seems racy, but I don't see a notifier or such on
91 * a struct device to know when it goes away?
92 */
93 if (devcd->failing_dev->kobj.sd)
94 sysfs_delete_link(&devcd->failing_dev->kobj, &dev->kobj,
95 "devcoredump");
96
97 put_device(devcd->failing_dev);
98 kfree(devcd);


-Mukesh

It removes the device when the @free function has been called, I think
the @free function should be considered as a completion signal, and so
we need to put @free at the end of falling-device-related-clean-up in
devcoredump framework (the change in the patch).

That may be true for the case you have, but really, I wouldn't consider
that a bug now? >
Besides, we do hav<e put_device() at the end, and I'm not sure I see how
the device can be removed before put_device()?

Can parts of the device there disappear before we release all the
references? Could that mean the scenario is also possible without your
contorted device creation and removal, just by unplugging the device
while a coredump exists in sysfs?



Yes, I know we don't need to care about the dump data, but as mentioned
upon, the device is locally and temporarily created for this one-time dump
only, we need to free it when dump is completed, so introduce this completion.
Refer to drivers/remoteproc/remoteproc_coredump.c.

I still don't think this is right ... Even the code there is awful, it
potentially blocks for 5 minutes? I'm not sure we should encourage that.

Note that you could also argue exactly the other way around - what if
the *free* function requires access to the device? It gets arbitrary
data after all.


Regarding NULL for the struct module pointer, looks it's allowed for this
API (<remoteproc_coredump.c> also pass NULL)? But yes, it's not nice indeed,
we need this to get a reference of the calling module for safety.
Will correct in the next patch set.

Well, it's not really allowed to literally write NULL - maybe we should
have a macro that fills in THIS_MODULE. But THIS_MODULE can be NULL at
runtime if it's in built-in code ... so we can't catch it.

But actually if you do have the completion you wouldn't care about this
specific case, but ...

johannes