Re: [PATCH] Devcoredump: fix use-after-free issue when releasing devcd device

From: Johannes Berg
Date: Tue Oct 31 2023 - 04:59:57 EST


On Tue, 2023-10-31 at 16:29 +0800, Yu Wang wrote:
>
> In this case, the device is temporarily added for dump only, so we need to
> delete it when dump is completed.
> The other users doesn't add/delete the device like this.

For good reason, I guess? I think this is probably a bad idea.

The whole point of this was to actually know which device created the
coredump? If you make one up on the fly that's ... pointless? Surely you
must have _some_ device that already exists?

> It removes the device when the @free function has been called, I think
> the @free function should be considered as a completion signal, and so
> we need to put @free at the end of falling-device-related-clean-up in
> devcoredump framework (the change in the patch).

That may be true for the case you have, but really, I wouldn't consider
that a bug now?

Besides, we do hav<e put_device() at the end, and I'm not sure I see how
the device can be removed before put_device()?

Can parts of the device there disappear before we release all the
references? Could that mean the scenario is also possible without your
contorted device creation and removal, just by unplugging the device
while a coredump exists in sysfs?

> Yes, I know we don't need to care about the dump data, but as mentioned
> upon, the device is locally and temporarily created for this one-time dump
> only, we need to free it when dump is completed, so introduce this completion.
> Refer to drivers/remoteproc/remoteproc_coredump.c.

I still don't think this is right ... Even the code there is awful, it
potentially blocks for 5 minutes? I'm not sure we should encourage that.

Note that you could also argue exactly the other way around - what if
the *free* function requires access to the device? It gets arbitrary
data after all.


> Regarding NULL for the struct module pointer, looks it's allowed for this
> API (<remoteproc_coredump.c> also pass NULL)? But yes, it's not nice indeed,
> we need this to get a reference of the calling module for safety.
> Will correct in the next patch set.

Well, it's not really allowed to literally write NULL - maybe we should
have a macro that fills in THIS_MODULE. But THIS_MODULE can be NULL at
runtime if it's in built-in code ... so we can't catch it.

But actually if you do have the completion you wouldn't care about this
specific case, but ...

johannes