Re: Possible race in dev_coredumpm()-del_timer() path
From: Mukesh Ojha
Date: Wed Apr 13 2022 - 07:21:35 EST
On 4/13/2022 4:28 PM, Greg KH wrote:
On Wed, Apr 13, 2022 at 03:46:39PM +0530, Mukesh Ojha wrote:
On Wed, Apr 13, 2022 at 07:34:24AM +0200, Greg KH wrote:
On Wed, Apr 13, 2022 at 10:59:22AM +0530, Mukesh Ojha wrote:
Hi All,
We are hitting one race due to which try_to_grab_pending() is stuck .
What kernel version are you using?
5.10
5.10.0 was released a very long time ago. Please use a more modern
kernel release :)
Sorry, for the formatting mess.
In following scenario, while running (p1)dev_coredumpm() devcd device is
added to
the framework and uevent notification sent to userspace that result in the
call to (p2) devcd_data_write()
which eventually try to delete the queued timer which in the racy scenario
timer is not queued yet.
So, debug object report some warning and in the meantime timer is
initialized and queued from p1 path.
and from p2 path it gets overriden again timer->entry.pprev=NULL and
try_to_grab_pending() stuck
p1 p2(X)
dev_coredump() uevent sent to userspace
device_add() =========================> userspace process X reads the uevents
writes to devcd fd which
results into writes to
devcd_data_write()
mod_delayed_work()
try_to_grab_pending()
del_timer()
debug_assert_init()
INIT_DELAYED_WORK
schedule_delayed_work
debug_object_fixup()
Why do you have object debugging enabled? That's going to take a LONG
time, and will find bugs in your code. Perhaps like this one?
There is no issue if we disable debug object.
Here, some client module try to collect dump
via dev_coredumpm() which creates devcdX device and
expects userspace to read this data. Here, it might be
exposing a synchronization issue between dev_coredumpm()
and devcd_data_write() perhaps, a mutex ??
================o<===============================
11
12 diff --git a/drivers/base/devcoredump.c b/drivers/base/devcoredump.c
13 index 9243468..a620dcb 100644
14 --- a/drivers/base/devcoredump.c
15 +++ b/drivers/base/devcoredump.c
16 @@ -29,6 +29,7 @@ struct devcd_entry {
17 struct device devcd_dev;
18 void *data;
19 size_t datalen;
20 + struct mutex mutex;
21 struct module *owner;
22 ssize_t (*read)(char *buffer, loff_t offset, size_t count,
23 void *data, size_t datalen);
24 @@ -88,7 +89,9 @@ static ssize_t devcd_data_write(struct file
*filp, struct kobject *kobj,
25 struct device *dev = kobj_to_dev(kobj);
26 struct devcd_entry *devcd = dev_to_devcd(dev);
27
28 + mutex_lock(&devcd->mutex);
29 mod_delayed_work(system_wq, &devcd->del_wk, 0);
30 + mutex_unlock(&devcd->mutex);
31
32 return count;
33 }
34 @@ -282,13 +285,14 @@ void dev_coredumpm(struct device *dev, struct
module *owner,
35 devcd->read = read;
36 devcd->free = free;
37 devcd->failing_dev = get_device(dev);
38 -
39 + mutex_init(&devcd->mutex);
40 device_initialize(&devcd->devcd_dev);
41
42 dev_set_name(&devcd->devcd_dev, "devcd%d",
43 atomic_inc_return(&devcd_count));
44 devcd->devcd_dev.class = &devcd_class;
45
46 + mutex_lock(&devcd->mutex);
47 if (device_add(&devcd->devcd_dev))
48 goto put_device;
49
50 @@ -302,10 +306,11 @@ void dev_coredumpm(struct device *dev, struct
module *owner,
51
52 INIT_DELAYED_WORK(&devcd->del_wk, devcd_del);
53 schedule_delayed_work(&devcd->del_wk, DEVCD_TIMEOUT);
54 -
55 + mutex_unlock(&devcd->mutex);
Thanks,
-Mukesh
What type of device is this? What bus? What driver?
And if you turn object debugging off, what happens?
thanks,
greg k-h