Re: Possible race in dev_coredumpm()-del_timer() path

From: Mukesh Ojha
Date: Wed Apr 13 2022 - 10:17:36 EST




On 4/13/2022 4:28 PM, Greg KH wrote:
On Wed, Apr 13, 2022 at 03:46:39PM +0530, Mukesh Ojha wrote:
On Wed, Apr 13, 2022 at 07:34:24AM +0200, Greg KH wrote:
On Wed, Apr 13, 2022 at 10:59:22AM +0530, Mukesh Ojha wrote:
Hi All,

We are hitting a race that leaves try_to_grab_pending() stuck.

What kernel version are you using?

5.10

5.10.0 was released a very long time ago. Please use a more modern
kernel release :)


It is not feasible for us to switch to the latest kernel, and I think this issue could be present in recent kernels as well.

Sorry for the formatting mess.

In the following scenario, while (p1) dev_coredumpm() is running, the devcd
device is added to the framework and a uevent notification is sent to
userspace. That results in a call to (p2) devcd_data_write(), which
eventually tries to delete the queued timer; in the racy scenario the timer
is not queued yet. So debug objects reports a warning, and in the meantime
the timer is initialized and queued from the p1 path. From the p2 path it
then gets overwritten again with timer->entry.pprev=NULL, and
try_to_grab_pending() gets stuck.

        p1                                    p2(X)

  dev_coredump()                       uevent sent to userspace
    device_add()  ===================> userspace process X reads the uevents
                                       writes to devcd fd which
                                       results into writes to

                                       devcd_data_write()
                                         mod_delayed_work()
                                           try_to_grab_pending()
                                             del_timer()
                                               debug_assert_init()
  INIT_DELAYED_WORK
  schedule_delayed_work
                                               debug_object_fixup()
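
The interleaving above can be replayed in a small userspace model. This is
only a sketch, not kernel code: every toy_* name is hypothetical, the
structs are simplified stand-ins for timer_list and delayed_work, and it
assumes that p2 is interrupted between the debug_assert_init() check and
the fixup, with the p1 steps running in that window.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the kernel structures; NOT the real definitions. */
struct toy_timer {
	void **pprev;      /* non-NULL models "linked on a timer list" */
	bool initialized;  /* what the debug-objects tracker would know */
};

struct toy_dwork {
	bool pending;      /* models WORK_STRUCT_PENDING_BIT */
	struct toy_timer timer;
};

static void *toy_list_slot;  /* dummy slot the queued timer points at */

static void toy_timer_init(struct toy_timer *t)
{
	t->pprev = NULL;   /* (re)initialising always unlinks the entry */
	t->initialized = true;
}

static bool toy_timer_pending(const struct toy_timer *t)
{
	return t->pprev != NULL;
}

/* p1: models INIT_DELAYED_WORK */
static void toy_init_delayed_work(struct toy_dwork *dw)
{
	dw->pending = false;
	toy_timer_init(&dw->timer);
}

/* p1: models schedule_delayed_work: claim PENDING, then arm the timer */
static void toy_schedule_delayed_work(struct toy_dwork *dw)
{
	assert(dw->timer.initialized);
	dw->pending = true;
	dw->timer.pprev = &toy_list_slot;
}

/* One pass of the grab logic: 1 = stole a pending timer,
 * 0 = claimed an idle work item, -1 = caller must retry. */
static int toy_try_to_grab_pending(struct toy_dwork *dw)
{
	if (toy_timer_pending(&dw->timer)) {
		dw->timer.pprev = NULL;  /* del_timer() succeeded */
		return 1;
	}
	if (!dw->pending) {
		dw->pending = true;
		return 0;
	}
	return -1;  /* PENDING is set but there is no timer to steal */
}

/* Replays the diagram. Returns the last grab result after at most
 * max_retries passes; -1 means the retry loop never made progress. */
static int toy_replay_race(bool debug_objects, int max_retries)
{
	struct toy_dwork dw = { 0 };
	int ret = -1;

	/* p2: debug_assert_init() sees a timer that is not set up yet */
	bool need_fixup = debug_objects && !dw.timer.initialized;

	/* p1 runs in the window */
	toy_init_delayed_work(&dw);
	toy_schedule_delayed_work(&dw);

	/* p2: late fixup re-initialises the already queued timer */
	if (need_fixup)
		toy_timer_init(&dw.timer);

	/* p2: mod_delayed_work() style retry loop, bounded for the demo */
	for (int i = 0; i < max_retries && ret < 0; i++)
		ret = toy_try_to_grab_pending(&dw);
	return ret;
}
```

In this model toy_replay_race(true, 1000) never gets past the retry loop,
because the pending bit is set while the timer no longer looks queued, and
toy_replay_race(false, 1000) grabs the timer on the first pass. The real
loop in the kernel is unbounded; the retry limit exists only so the model
terminates.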

Why do you have object debugging enabled?

We have enabled object debugging to catch more issues across the kernel.

That's going to take a LONG
time, and will find bugs in your code. Perhaps like this one?
What type of device is this? What bus? What driver?

A remoteproc client device driver calls dev_coredumpm(), and the devcd device gets added as part of that call.


And if you turn object debugging off, what happens?

We have not observed the issue after disabling object debugging.

Regards,
Mukesh


thanks,

greg k-h