Re: [PATCH] devcoredump : Serialize devcd_del work

From: Mukesh Ojha
Date: Mon Apr 25 2022 - 09:17:32 EST


On Fri, Apr 22, 2022 at 03:53:35PM +0200, Johannes Berg wrote:
> On Fri, 2022-04-22 at 15:41 +0200, Johannes Berg wrote:
> > On Tue, 2022-04-19 at 15:57 +0530, Mukesh Ojha wrote:
> > > In the following scenario (see diagram), thread X running dev_coredumpm() adds the devcd
> > > device to the framework, which sends a uevent notification to userspace.
> > > Another thread Y reads this uevent and calls devcd_data_write(),
> > > which eventually tries to delete the queued timer, which is not initialized/queued yet.
> > >
> > > So, debug objects report a warning and, in the meantime, the timer is initialized
> > > and queued from the X path. From the Y path, it gets reinitialized,
> > > timer->entry.pprev becomes NULL, and try_to_grab_pending() gets stuck.
> > >
> > > To fix this, introduce a mutex to serialize the behaviour.
> > >
> > > cpu0(X)                                 cpu1(Y)
> > >
> > > dev_coredump() uevent sent to userspace
> > > device_add() =========================> userspace process Y reads the uevents
> > >                                         writes to devcd fd which
> > >                                         results into writes to
> > >
> > >                                         devcd_data_write()
> > >                                           mod_delayed_work()
> > >                                             try_to_grab_pending()
> > >                                               del_timer()
> > >                                                 debug_assert_init()
> > > INIT_DELAYED_WORK
> > > schedule_delayed_work
> > >
> >
> > Wouldn't it be easier to simply schedule this before adding it to sysfs
> > and sending the uevent?
> >
>
> Hm. I think that would solve this problem, but not all of the problems
> here ...
>
> Even with your change, I believe it's still racy wrt. disabled_store(),
> since that flushes the work but devcd_data_write() remains reachable
> (and might in fact be waiting for the mutex after your change), so I
> think we need an additional flag somewhere (in addition to the mutex) to
> serialize all of these things against each other.

Can we do something like this in v2?

https://lore.kernel.org/lkml/1650892193-12888-1-git-send-email-quic_mojha@xxxxxxxxxxx/
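
For anyone following the thread without the patch at hand, the combination
discussed above (a mutex plus an additional flag) can be modelled roughly as
below. This is only a userspace sketch with pthreads, not the code from the
linked patch: struct devcd_model, its fields, and the three functions are
hypothetical names chosen for illustration, and delay_ms merely stands in for
the delay of the kernel's delayed work.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names come from the kernel. */
struct devcd_model {
    pthread_mutex_t lock;   /* serializes the X (publish) and Y (write) paths */
    bool work_queued;       /* delayed work has been initialized and queued */
    bool delete_requested;  /* an earlier caller already asked for deletion */
    long delay_ms;          /* delay before deletion; -1 while nothing is queued */
};

static void devcd_model_init(struct devcd_model *d)
{
    assert(d != NULL);
    pthread_mutex_init(&d->lock, NULL);
    d->work_queued = false;
    d->delete_requested = false;
    d->delay_ms = -1;
}

/*
 * X path: announcing the device and queueing the delayed work happen under
 * one lock, so a concurrent writer never observes a half-initialized state.
 */
static void devcd_model_publish(struct devcd_model *d, long timeout_ms)
{
    pthread_mutex_lock(&d->lock);
    /* In the real flow the uevent would go out here; a writer now blocks. */
    d->work_queued = true;
    d->delay_ms = timeout_ms;
    pthread_mutex_unlock(&d->lock);
}

/*
 * Y path: request immediate deletion at most once.
 * Returns true only for the call that actually made the request.
 */
static bool devcd_model_request_delete(struct devcd_model *d)
{
    bool requested = false;

    pthread_mutex_lock(&d->lock);
    if (d->work_queued && !d->delete_requested) {
        d->delete_requested = true;
        d->delay_ms = 0;    /* stands in for re-arming the work with zero delay */
        requested = true;
    }
    pthread_mutex_unlock(&d->lock);
    return requested;
}
```

In this model a delete request that arrives before devcd_model_publish() is
simply refused, and a second request after the first is a no-op, which is the
kind of serialization the flag is meant to provide on top of the mutex.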

Thanks,
-Mukesh

>
> johannes