Re: [PATCH] Devcoredump: fix use-after-free issue when releasing devcd device

From: Yu Wang
Date: Tue Oct 31 2023 - 05:41:52 EST

On 10/31/2023 3:39 PM, Greg KH wrote:
> On Tue, Oct 31, 2023 at 03:15:12PM +0800, Yu Wang wrote:
>>
>>
>> On 10/27/2023 7:12 PM, Greg KH wrote:
>>> On Thu, Oct 26, 2023 at 10:55:21PM -0700, Yu Wang wrote:
>>>> With the sample code below, a use-after-free issue may be hit when
>>>> releasing the devcd device.
>>>>
>>>> struct my_coredump_state {
>>>> 	struct completion dump_done;
>>>> 	...
>>>> };
>>>>
>>>> static void my_coredump_free(void *data)
>>>> {
>>>> 	struct my_coredump_state *dump_state = data;
>>>> 	...
>>>> 	complete(&dump_state->dump_done);
>>>> }
>>>>
>>>> static void my_dev_release(struct device *dev)
>>>> {
>>>> 	kfree(dev);
>>>> }
>>>>
>>>> static void my_coredump(void)
>>>> {
>>>> 	struct my_coredump_state dump_state;
>>>> 	struct device *new_device =
>>>> 		kzalloc(sizeof(*new_device), GFP_KERNEL);
>>>>
>>>> 	...
>>>> 	new_device->release = my_dev_release;
>>>> 	device_initialize(new_device);
>>>> 	...
>>>> 	device_add(new_device);
>>>> 	...
>>>> 	init_completion(&dump_state.dump_done);
>>>> 	dev_coredumpm(new_device, NULL, &dump_state, datalen, GFP_KERNEL,
>>>> 		      my_coredump_read, my_coredump_free);
>>>> 	wait_for_completion(&dump_state.dump_done);
>>>> 	device_del(new_device);
>>>> 	put_device(new_device);
>>>> }
>>>>
>>>> In the devcoredump framework, devcd_dev_release() is called when
>>>> the devcd device is released. It calls the free() callback first
>>>> and then tries to delete the symlink in the sysfs directory of the
>>>> failing device. Even though it checks 'devcd->failing_dev->kobj.sd'
>>>> beforehand, nothing guarantees that it is still valid when it is
>>>> accessed in kernfs_find_ns(); see the diagram below:
>>>>
>>>> Thread A waits for 'dump_state.dump_done' at #A-1-2 after
>>>> calling dev_coredumpm().
>>>> When thread B calls devcd->free() at #B-2-1, it wakes up
>>>> thread A at #A-1-2, which then calls device_del() to
>>>> delete the device.
>>>> If #B-2-2 happens before #A-3-1 but #B-4 happens after #A-4,
>>>> a use-after-free occurs when 'devcd->failing_dev->kobj.sd' is
>>>> accessed.
>>>>
>>>> #A-1-1: dev_coredumpm()
>>>> #A-1-2: wait_for_completion(&dump_state.dump_done)
>>>> #A-1-3: device_del()
>>>> #A-2: kobject_del()
>>>> #A-3-1: sysfs_remove_dir() --> set kobj->sd=NULL
>>>> #A-3-2: kernfs_put()
>>>> #A-4: kmem_cache_free() --> free kobj->sd
>>>>
>>>> #B-1: devcd_dev_release()
>>>> #B-2-1: devcd->free(devcd->data)
>>>> #B-2-2: check devcd->failing_dev->kobj.sd
>>>> #B-2-3: sysfs_delete_link()
>>>> #B-3: kernfs_remove_by_name_ns()
>>>> #B-4: kernfs_find_ns() --> access devcd->failing_dev->kobj.sd
>>>>
>>>> To fix this issue, perform the operations on devcd->failing_dev
>>>> before calling the free() callback in devcd_dev_release().
>>>>
>>>> Signed-off-by: Yu Wang <quic_yyuwang@xxxxxxxxxxx>
>>>> ---
>>>> drivers/base/devcoredump.c | 5 ++---
>>>> 1 file changed, 2 insertions(+), 3 deletions(-)
>>>
>>> Also, what commit id does this fix?
>>
>> Thanks for your comment :)
>> Do you mean the commit that introduced this issue? It dates from the initial version of devcoredump.c.
>
> Ok, but then what in-kernel code has the above pattern to cause this
> "problem"? Why not fix that up?
>
We use this API as follows:
<Create a device> -> <submit a dump on it and wait for completion> -> <remove the device>.
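
A minimal userspace sketch of this create / submit-and-wait / remove sequence, assuming POSIX threads. Everything in it is hypothetical and not a kernel API: struct completion_model stands in for struct completion, struct fake_dev for the failing device (its sd pointer plays the role of kobj.sd), release_thread() for devcd_dev_release() with the proposed ordering, and run_dump_sequence() for the caller. Because the releasing thread finishes with failing_dev before it signals completion, the waiter can tear the device down immediately without racing.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for the kernel's struct completion. */
struct completion_model {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

static void completion_model_init(struct completion_model *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = false;
}

static void completion_model_complete(struct completion_model *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_signal(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

static void completion_model_wait(struct completion_model *c)
{
	pthread_mutex_lock(&c->lock);
	while (!c->done)
		pthread_cond_wait(&c->cond, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

/* Hypothetical stand-in for the failing device. */
struct fake_dev {
	bool link_present; /* the "devcoredump" symlink in its sysfs dir */
	int *sd;           /* plays the role of kobj.sd; freed on removal */
};

struct dump_state {
	struct completion_model dump_done;
	struct fake_dev *failing_dev;
};

/* Thread B: devcd_dev_release() with failing_dev handled before free(). */
static void *release_thread(void *arg)
{
	struct dump_state *st = arg;

	if (st->failing_dev->sd)
		st->failing_dev->link_present = false; /* sysfs_delete_link() */
	/* free() callback last: in the real flow the caller's stack frame
	 * (and st with it) may be gone once this returns. */
	completion_model_complete(&st->dump_done);
	return NULL;
}

/* Thread A: create device -> submit dump and wait -> remove device.
 * Returns true if the symlink was gone before the device was removed. */
static bool run_dump_sequence(void)
{
	struct fake_dev dev = { .link_present = true, .sd = malloc(sizeof(int)) };
	struct dump_state st = { .failing_dev = &dev };
	pthread_t b;
	bool link_removed;

	if (!dev.sd)
		return false;
	completion_model_init(&st.dump_done);
	if (pthread_create(&b, NULL, release_thread, &st) != 0) {
		free(dev.sd);
		return false;
	}
	completion_model_wait(&st.dump_done);
	/* device_del() right away, as in the usage described above */
	link_removed = !dev.link_present;
	free(dev.sd);
	dev.sd = NULL;
	pthread_join(b, NULL);
	pthread_mutex_destroy(&st.dump_done.lock);
	pthread_cond_destroy(&st.dump_done.cond);
	return link_removed;
}
```

Calling run_dump_sequence() should always return true here, because the mutex inside the completion orders thread B's symlink removal before thread A's teardown.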

The difference from the in-kernel users is that the time between <submit a dump on it and wait for completion>
and <remove the device> is very short, which causes a race between sysfs_delete_link() and device_del().
I think the devcoredump framework should also handle this case.
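
To illustrate why the ordering inside devcd_dev_release() matters, here is a deterministic, single-threaded model of the interleaving from the patch description. Everything in it (struct race_model, the model_* helpers and the two release functions) is hypothetical. The one assumption is a worst-case preemption point between the kobj.sd check (#B-2-2) and its use (#B-4), at which thread A is allowed to finish device_del() if it has already been woken.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the reported interleaving; not kernel code. */
struct race_model {
	bool woken;     /* thread A released by complete() in free() */
	bool sd_live;   /* failing_dev->kobj.sd still points to live memory */
	bool link_gone; /* "devcoredump" symlink removed */
	bool uaf;       /* kobj.sd was dereferenced after being freed */
};

/* #B-2-1: the free() callback only wakes thread A. */
static void model_free_cb(struct race_model *m)
{
	m->woken = true;
}

/* Worst-case preemption: if awake, A completes #A-1-3 .. #A-4. */
static void model_let_thread_a_run(struct race_model *m)
{
	if (m->woken)
		m->sd_live = false;
}

/* #B-2-2 .. #B-4 with a preemption point between check and use. */
static void model_delete_link(struct race_model *m)
{
	bool sd_seen = m->sd_live;   /* #B-2-2: check kobj.sd */

	model_let_thread_a_run(m);   /* thread A may run here */
	if (!sd_seen)
		return;
	if (!m->sd_live)
		m->uaf = true;       /* #B-4 touches freed memory */
	else
		m->link_gone = true;
}

/* Original order: free() first, then the failing_dev operations. */
static void release_original_order(struct race_model *m)
{
	model_free_cb(m);
	model_delete_link(m);
}

/* Proposed order: failing_dev operations first, then free(). */
static void release_patched_order(struct race_model *m)
{
	model_delete_link(m);
	model_free_cb(m);
	model_let_thread_a_run(m);
}
```

With the original order the check sees a live kobj.sd, thread A then frees it, and the later access is flagged as a use-after-free. With the proposed order the preemption point is harmless because thread A has not been woken yet; the free() callback is the last step that can release it.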

> thanks,
>
> greg k-h