Re: Possible race at user mode helper in request_firmware
From: Ming Lei
Date: Wed Mar 16 2016 - 04:15:42 EST
Hi Chandra,
On Wed, Mar 16, 2016 at 3:02 PM, Chandra Sekhar Lingutla
<clingutla@xxxxxxxxxxxxxx> wrote:
> Hi Ming,
>
>
> On 03/16/2016 07:36 AM, Ming Lei wrote:
>>
>> On Tue, Mar 15, 2016 at 11:09 PM, Ming Lei <ming.lei@xxxxxxxxxxxxx> wrote:
>>>
>>> Hi,
>>>
>>> On Mon, Mar 14, 2016 at 8:00 PM, <clingutla@xxxxxxxxxxxxxx> wrote:
>>>>
>>>> Hello,
>>>>
>>>> I see possible race in _request_firmware_load function, on which I
>>>> wanted to
>>>> take your opinion.
>>>>
>>>> When system is going to low power mode, device_cache_fw_images() is
>>>> called
>>>> from pm notifier which schedules asynchronous workers to cache devices
>>>> firmware.
>>>> If more than 2 async requests falls under user helper mode, then there
>>>> is a
>>>> use after free of "firmware" directory kobject.
>>>
>>>
>>> If two requests are for getting same firmware image, only one can be sent
>>> out and be completed with user helper, please see
>>> fw_lookup_and_allocate_buf(),
>>> so the race shouldn't have been triggered in this case.
>>>
>>> So I guess your test must be involved with at least two different
>>> requests.
>>>
>>>>
>>>> The race will hit, when one asynchronous worker calls device_del() while
>>>> other asynchronous worker calls get_device_parent() in
>>>> _request_firmware_load().
>>>> After loading firmware, first worker calls device_del(), and "firmware"
>>>> directory ref count is 1, so cleanup_glue_dir() calls kobject_put() with
>>>> gdp_mutex lock held.
>>>> Meanwhile, second async worker calls device_add(), and
>>>> get_device_parent()
>>>> is still able to find the firmware directory kobject in ksets after
>>>> getting
>>>> gdp_mutex.
>>>> This kobject gets added as parent to the second device kobject.
>>>> By the time of accessing the parent kobject, its ref count was 0 and
>>>> prints
>>>> below warning stack followed by crash.
>>>
>>>
>>> The trick here is just that we don't use 'struct device' in
>>> device_cache_fw_images(),
>>> then the firmware device's parent is null during calling device_add()
>>> from
>>> _request_firmware_load(), but this should be allowed by driver core.
>>>
>>> From firmware class view, both device_add()/device_del() is run with
>>> pair,
>>> and driver core still complains during get_device_parent(), so something
>>> inside driver core should be be wrong, IMO.
>>
>>
>> Thinking about the issue further, I believe it is a false positive which
>> is
>> caused by CONFIG_DEBUG_KOBJECT_RELEASE, follows the story:
>>
>> 1) device_add() path
>>
>> get_device_parent()
>> ......
>> mutex_lock(&gdp_mutex);
>>
>> /* find our class-directory at the parent and reference
>> it */
>> spin_lock(&dev->class->p->glue_dirs.list_lock);
>> list_for_each_entry(k, &dev->class->p->glue_dirs.list,
>> entry)
>> if (k->parent == parent_kobj) {
>> kobj = kobject_get(k);
>> break;
>> }
>> spin_unlock(&dev->class->p->glue_dirs.list_lock);
>> if (kobj) {
>> mutex_unlock(&gdp_mutex);
>> return kobj;
>> }
>> k = class_dir_create_and_add(dev->class, parent_kobj);
>> mutex_unlock(&gdp_mutex);
>> ......
>>
>> 2) device_del() path
>>
>> cleanup_device_parent()
>> ->cleanup_glue_dir()
>> ->mutex_lock(&gdp_mutex);
>> ->kobject_put(glue_dir);
>> ->kobject_cleanup()
>> ->kobject_del()
>> ->kobj_kset_leave()
>> ->mutex_unlock(&gdp_mutex);
>>
>> So gdp_mutex is held wrt. both looking up/adding the parent kobject
>> and removing the parent kobject. Actually with this lock, both glur_dir
>> kobject's referece counting and joining/leaving kset are run atomically.
>>
>> The race you found is triggered by CONFIG_DEBUG_KOBJECT_RELEASE,
>> which causes kobject_cleanup() to be run in workqueue context and the
>> lock of gdp_mutex can't be hold in that situation anymore. That is the
>> root
>> cause for your race.
>>
>> That also said DEBUG_KOBJECT_RELEASE might cause trouble in
>> some situations.
>>
>> Thanks,
>> Ming
>>
> I agree that, CONFIG_DEBUG_KOBJECT_RELEASE can cause the race, as it delays
> call to kobject_cleanup() which deletes kobj entry in glue_dir list.
>
> I see the race even after disabling CONFIG_DEBUG_KOBJECT_RELEASE.
>
> The race is between kobject_put() and kobject_get() for the "firmware"
> directory.
Could you provide the kernel warning or sort of log about the race by
disabling CONFIG_DEBUG_KOBJECT_RELEASE?
>
> In device_del() path, I see the race after the call of
> cleanup_device_parent().
> cleanup_device_parent() calls kobject_put(glue_dir) with gdp_mutex lock
> holding,
> So ideally, to call kobject_put/kobject_get(glue_dir) we need to acquire
> gdp_mutex.
I don't think so, it is the responsibility of kobject_get() caller to make sure
the kobject isn't released yet.
>
> but kobject_del(kobj) will not hold the gdp_mutex lock while calling
> kobject_put(kobj->parent).
>
> Here is the race:
>
> Consider 2 async works calls request_firmware() for 2 different binaries
> (fw1.bin, fw2.bin)
> Async work1 got scheduled first, and it falls in user helper mode so it
> creates "firmware" dir
> and then creates "fw1.bin" directory to load data from user.
> After load completion or timeout, it starts cleanup dir's, meantime async
> work2 got scheduled and
> it also falls in user helper mode, and story starts here:
>
> PATH1 req fw for "fw1.bin" PATH2
> req fw for "fw2.bin"
> _request_firmware_load()
> device_del(dev1)
>
> ->cleanup_device_parent()
>
> mutex_lock(&gdp_mutex);
> _request_firmware_load("fw2.bin")
> kobject_put(kobj->parent); --> firmware ref ->1
> ->device_add(dev2)
> mutex_unlock(&gdp_mutex);
> ->get_device_parent()
> ->kobject_del()
> mutex_lock(&gdp_mutex)
> kobj_kset_leave(kobj); -->del "fw1.bin" kobj in kset
> list_for_each_entry(k, &dev->class->p->glue_dirs.list, entry)
> kobject_put(kobj->parent); /* No lock :)*/
> if (k->parent == parent_kobj) {
>
> ->kref_put() --> firmware ref ->0
> kobj = kobject_get(k); -> k is "firmware"
> dir
> ->kboject_release()
> break;
>
> ->kobject_del()
> }
>
> /*here deletes firmware entry in kset*/
> mutex_unlock(&gdp_mutex)
>
> ->dev2->kobj.parent = kobj -> kobj is released already
Sorry, I can't understand the above because it is shown as
mess in my email editor.
>
>
>
> PATH 1 is started removing the kobject of "fw1.bin" and its parent
> "firmware"
> .
> cleanup_device_parent(dev1) makes firmware ref count to 1, by calling
> kobject_put(firmware) with gdp_mutex held.
>
> and kobject_del(dev1->kobj) will again calls kobject_put(kobj->parent)
> without holding gdp_mutex.
>
> In PATH2, glue_dir is able to find "firmware" kobj, as firmware dir kobj is
> deleted from the glue_dir list at kboject_release().
The glue_dir can't be found after it is deleted from the other path, can it?
The gdp_mutex is used to avoid the race. If glue_dir isn't deleted, it will
be get with increasing the refer counter, so it won't be freed from another
path. If glue_dir is deleted from aother path, current path can't found
in get_device_parent() at all, so a new glue_dir will be created and added.
So I still don't see your race, maybe I am wrong?
> and it calls kobject_get(firmware), by the time PATH1 calls
> kobject_release(firmware).
>
> So PATH2 got reference of freed firmware kobj and added as parent to
> "fw2.bin" kobj.
>
> I think, here is possible fix for this race:
>
> ------------------------8<-------------------------------------
> Signed-off-by: Lingutla Chandrasekhar <clingutla@xxxxxxxxxxxxxx>
>
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index 5d8b713..47bc082 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -1224,7 +1224,13 @@ void device_del(struct device *dev)
> BUS_NOTIFY_REMOVED_DEVICE,
> dev);
> kobject_uevent(&dev->kobj, KOBJ_REMOVE);
> cleanup_device_parent(dev);
> - kobject_del(&dev->kobj);
> +
> + if (dev->kobj.parent->kset == &dev->class->p->glue_dirs) {
> + mutex_lock(&gdp_mutex);
> + kobject_del(&dev->kobj);
> + mutex_unlock(&gdp_mutex);
> + } else
> + kobject_del(&dev->kobj);
> put_device(parent);
> }
> EXPORT_SYMBOL_GPL(device_del);
>
> --------------------------->8-----------------------------------
>
>
>>>
>>>>
>>>> [21147.455038] ------------[ cut here ]------------
>>>> [21147.458910] WARNING: at <...>/kernel/include/linux/kref.h:47
>>>> kobject_get+0x50/0x68()
>>>> [21147.481199] Modules linked in: core_ctl(PO) qdrbg_module(O)
>>>> qcrypto_module(O)
>>>> [21147.481256] CPU: 2 PID: 23935 Comm: kworker/u16:8 Tainted: P W O
>>>> 3.10.84-g0957845-00427-g56a05c2 #1
>>>> [21147.481284] Workqueue: events_unbound async_run_entry_fn
>>>> [21147.481296] Call trace:
>>>> [21147.481315] [<ffffffc000206b60>] dump_backtrace+0x0/0x270
>>>> [21147.481331] [<ffffffc000206de0>] show_stack+0x10/0x1c
>>>> [21147.481353] [<ffffffc000d569ec>] dump_stack+0x1c/0x28
>>>> [21147.481370] [<ffffffc00021cc18>] warn_slowpath_common+0x74/0x9c
>>>> [21147.481383] [<ffffffc00021ce4c>] warn_slowpath_null+0x14/0x20
>>>> [21147.481397] [<ffffffc000456b10>] kobject_get+0x4c/0x68
>>>> [21147.481416] [<ffffffc000609f64>] get_device_parent+0xa0/0x194
>>>> [21147.481430] [<ffffffc00060a3f4>] device_add+0x100/0x600
>>>> [21147.481446] [<ffffffc00061c390>] _request_firmware+0x8b4/0xa80
>>>> [21147.481459] [<ffffffc00061c58c>] request_firmware+0x30/0x3c
>>>> [21147.481473] [<ffffffc00061c5ec>] cache_firmware+0x54/0xb0
>>>> [21147.481490] [<ffffffc00061c65c>] __async_dev_cache_fw_image+0x14/0x54
>>>> [21147.481505] [<ffffffc000243794>] async_run_entry_fn+0x6c/0x12c
>>>> [21147.481521] [<ffffffc000237e88>] process_one_work+0x264/0x3dc
>>>> [21147.481535] [<ffffffc0002392c0>] worker_thread+0x1f0/0x340
>>>> [21147.481553] [<ffffffc00023e500>] kthread+0xac/0xb8
>>>> [21147.481563] ---[ end trace dabc98ea48b8ba59 ]---
>>>>
>>>> [21147.486436] Unable to handle kernel paging request at virtual address
>>>> 0x6b6b6b6b6b6b6bcb
>>>> [21147.493816] pgd = ffffffc0a982d000
>>>> [21147.498337] [6b6b6b6b6b6b6bcb] *pgd=0000000000000000
>>>> [21147.502287] Internal error: Oops: 96000004 [#1] PREEMPT SMP
>>>> [21147.507826] Modules linked in: core_ctl(PO) qdrbg_module(O)
>>>> qcrypto_module(O)
>>>> [21147.514951] CPU: 2 PID: 23935 Comm: kworker/u16:8 Tainted: P W O
>>>> 3.10.84-g0957845-00427-g56a05c2 #1
>>>> [21147.524686] Workqueue: events_unbound async_run_entry_fn
>>>> [21147.529965] task: ffffffc008138000 ti: ffffffc007880000 task.ti:
>>>> ffffffc007880000
>>>> [21147.537438] PC is at sysfs_create_dir+0x34/0xd4
>>>> [21147.541949] LR is at kobject_add_internal+0x120/0x24c
>>>> [21147.546979] pc : [<ffffffc000361fa8>] lr : [<ffffffc000456f5c>]
>>>> pstate:
>>>> 80000145
>>>> [21147.996295] Process kworker/u16:8 (pid: 23935, stack limit =
>>>> 0xffffffc007880058)
>>>> [21148.003673] Call trace:
>>>> [21148.006117] [<ffffffc000361fa8>] sysfs_create_dir+0x34/0xd4
>>>> [21148.011663] [<ffffffc000456f58>] kobject_add_internal+0x11c/0x24c
>>>> [21148.017737] [<ffffffc0004573ec>] kobject_add+0xc8/0xe4
>>>> [21148.022863] [<ffffffc00060a410>] device_add+0x11c/0x600
>>>> [21148.028069] [<ffffffc00061c390>] _request_firmware+0x8b4/0xa80
>>>> [21148.033886] [<ffffffc00061c58c>] request_firmware+0x30/0x3c
>>>> [21148.039439] [<ffffffc00061c5ec>] cache_firmware+0x54/0xb0
>>>> [21148.044822] [<ffffffc00061c65c>] __async_dev_cache_fw_image+0x14/0x54
>>>> [21148.051249] [<ffffffc000243794>] async_run_entry_fn+0x6c/0x12c
>>>> [21148.057064] [<ffffffc000237e88>] process_one_work+0x264/0x3dc
>>>> [21148.062792] [<ffffffc0002392c0>] worker_thread+0x1f0/0x340
>>>> [21148.068263] [<ffffffc00023e500>] kthread+0xac/0xb8
>>>> [21148.073039] Code: b5000094 14000026 d000ac54 9125a294 (7940c281)
>>>> [21148.080312] ---[ end trace dabc98ea48b8ba5a ]---
>>>>
>>>> Below is the callstack where the parent is freed from the slub_debug
>>>> info.
>>>> 0xFFFFFFC000608D74 \\vmlinux\base/core\class_dir_release+0x0C
>>>> 0xFFFFFFC0002FE6BC \\vmlinux\slub\free_debug_processing\fail+0x114
>>>> 0xFFFFFFC0002FE7A0 \\vmlinux\slub\__slab_free+0x44
>>>> 0xFFFFFFC0002FEE60 \\vmlinux\slub\kfree+0x1F8
>>>> 0xFFFFFFC000608D70 \\vmlinux\base/core\class_dir_release+0x8
>>>> 0xFFFFFFC000456D70 \\vmlinux\kobject\kobject_release+0x134
>>>> 0xFFFFFFC000456E10 \\vmlinux\kobject\kobject_put+0x58
>>>> 0xFFFFFFC000456C28 \\vmlinux\kobject\kobject_del+0x64
>>>> 0xFFFFFFC000609D78 \\vmlinux\base/core\device_del+0x150
>>>> 0xFFFFFFC00061C524 \\vmlinux\firmware_class\_request_firmware\out+0x71C
>>>> 0xFFFFFFC00061C58C \\vmlinux\firmware_class\request_firmware+0x30
>>>> 0xFFFFFFC00061C5EC \\vmlinux\firmware_class\cache_firmware+0x54
>>>> 0xFFFFFFC00061C65C
>>>> \\vmlinux\firmware_class\__async_dev_cache_fw_image+0x14
>>>> 0xFFFFFFC000243794 \\vmlinux\async\async_run_entry_fn+0x6C
>>>> 0xFFFFFFC000237E88 \\vmlinux\workqueue\process_one_work+0x264
>>>> 0xFFFFFFC0002392C0 \\vmlinux\workqueue\worker_thread\recheck+0x1A0
>>>> 0xFFFFFFC00023E500 \\vmlinux\kthread\kthread+0xAC
>>>>
>>>> For similar type of issue, I see there is an existing fix: "sysfs:
>>>> driver
>>>> core: Fix glue dir race condition by gdp_mutex".
>>>>
>>>> https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/drivers/base/core.c?id=e4a60d139060975eb956717e4f63ae348d4d8cc5
>>>>
>>>> Still I am able to reproduce the issue. I have verified this race on
>>>> kernels: 3.10, 3.18 and 4.4.
>>>>
>>>> I followed below procedure to reproduce the issue:
>>>> 1. Enable "CONFIG_DEBUG_KOBJECT_RELEASE"
>>>> 2. Use test_firmware, modified the test for async calls
>>>> 3. Replace WARN_ON with BUG_ON in kref_get().
>>>> I ran below script from shell:
>>>> count=0
>>>> while [ 1 ]
>>>> do
>>>> echo 3 >
>>>> /sys/devices/virtual/misc/test_firmware/trigger_ufw
>>>> if [ $? -ne 0 ]; then
>>>> print "Exiting.. "
>>>> exit 1
>>>> fi
>>>> count=$(($count +1))
>>>> echo "count $count"
>>>> done
>>>>
>>>> With below patch, I could reproduce the issue in the 2nd iteration.
>>>
>>>
>>> I will run your test script to see if there is new findings.
>>>
>>>>
>>>>
>>>> -------------------------8<----------------------------------------------
>>>>
>>>> To reproduce the race condition, use firmware test and
>>>> CONFIG_DEBUG_KOBJECT_RELEASE=y, which delays kobject
>>>> releases. So that we can catch the bug easily.
>>>>
>>>> Enabled below flags in kernel config file:
>>>>
>>>> +CONFIG_DEBUG_KOBJECT_RELEASE=y
>>>> +CONFIG_TEST_FIRMWARE=y
>>>
>>>
>>> Looks I can't reproduce the issue with your approach.
>>>
>>>>
>>>>
>>>> diff --git a/include/linux/kref.h b/include/linux/kref.h
>>>> index 484604d..727fb24 100644
>>>> --- a/include/linux/kref.h
>>>> +++ b/include/linux/kref.h
>>>> @@ -44,7 +44,7 @@ static inline void kref_get(struct kref *kref)
>>>> * condition when this kref is freeing by some other thread right
>>>> now.
>>>> * In this case one should use kref_get_unless_zero()
>>>> */
>>>> - WARN_ON_ONCE(atomic_inc_return(&kref->refcount) < 2);
>>>> + BUG_ON(atomic_inc_return(&kref->refcount) < 2);
>>>> }
>>>>
>>>> /**
>>>> diff --git a/lib/test_firmware.c b/lib/test_firmware.c
>>>> old mode 100644
>>>> new mode 100755
>>>> index 86374c1..14c9598
>>>> --- a/lib/test_firmware.c
>>>> +++ b/lib/test_firmware.c
>>>> @@ -18,6 +18,7 @@
>>>> #include <linux/miscdevice.h>
>>>> #include <linux/slab.h>
>>>> #include <linux/uaccess.h>
>>>> +#include <linux/async.h>
>>>>
>>>> +
>>>> +static ASYNC_DOMAIN_EXCLUSIVE(ufw_domain);
>>>> +
>>>> +static void __async_ufw_work(void *name,
>>>> + async_cookie_t cookie)
>>>> +{
>>>> + const struct firmware *fw;
>>>> + const char *fw_name = name;
>>>> + int ret;
>>>> +
>>>> + pr_err("requesting %s\n", fw_name);
>>>> +
>>>> + ret = request_firmware(&fw, fw_name, NULL);
>>>> + pr_err("loaded: %zu\n", fw ? fw->size : 0);
>>>> + if (!ret)
>>>> + kfree(fw);
>>>> +
>>>> +}
>>>> +
>>>> +static char *name[] = { "fw1.bin", "fw2.bin"};
>>>> +
>>>> +static ssize_t trigger_ufw_store(struct device *dev,
>>>> + struct device_attribute *attr,
>>>> + const char *buf, size_t count)
>>>> +{
>>>> + int rc, i, c=0;
>>>> + u32 start;
>>>> + char *fw_name;
>>>> +
>>>> + rc = kstrtou32(buf, 0, &start);
>>>> + if (rc){
>>>> + pr_err("Invalid option\n");
>>>> + return rc;
>>>> + }
>>>> + pr_err(" no of iterations %d\n", start);
>>>> +
>>>> + for (i=0; i< start; i++)
>>>> + {
>>>> + if (c >= 2)
>>>> + c = 0;
>>>> + fw_name = name[c++];
>>>> + async_schedule_domain(__async_ufw_work, (void *)fw_name,
>>>> &ufw_domain);
>>>> + }
>>>> + async_synchronize_full_domain(&ufw_domain);
>>>> + return count;
>>>> +}
>>>>
>>>> static DEVICE_ATTR_WO(trigger_request);
>>>> +static DEVICE_ATTR_WO(trigger_ufw);
>>>>
>>>> static int __init test_firmware_init(void)
>>>> {
>>>> @@ -92,6 +140,12 @@ static int __init test_firmware_init(void)
>>>> goto dereg;
>>>> }
>>>>
>>>> + rc = device_create_file(test_fw_misc_device.this_device,
>>>> + &dev_attr_trigger_ufw);
>>>> + if (rc) {
>>>> + pr_err("could not create sysfs interface: %d\n", rc);
>>>> + goto dereg;
>>>> + }
>>>> pr_warn("interface ready\n");
>>>>
>>>> return 0;
>>>> @@ -106,7 +160,9 @@ static void __exit test_firmware_exit(void)
>>>> {
>>>> release_firmware(test_firmware);
>>>> device_remove_file(test_fw_misc_device.this_device,
>>>> - &dev_attr_trigger_request);
>>>> + &dev_attr_trigger_request);
>>>> + device_remove_file(test_fw_misc_device.this_device,
>>>> + &dev_attr_trigger_ufw);
>>>> misc_deregister(&test_fw_misc_device);
>>>> pr_warn("removed interface\n");
>>>> }
>>>>
>>>>
>>>> ----------------------------8<--------------------------------------------
>>>>
>>>> Thanks and regards,
>>>> Chandrasekhar L.
>>>> --
>>>> QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
>>>> member of
>>>> Code Aurora Forum,
>>>> a Linux Foundation Collaborative Project.