Re: [PATCH] char: misc: make misc_open() and misc_register() killable
From: Greg KH
Date: Tue Jul 05 2022 - 10:20:45 EST
On Tue, Jul 05, 2022 at 11:01:38PM +0900, Tetsuo Handa wrote:
> On 2022/07/05 14:21, Tetsuo Handa wrote:
> > Possible locations where snapshot_open() might sleep with system_transition_mutex held are
> > pm_notifier_call_chain_robust()/wait_for_device_probe()/create_basic_memory_bitmaps().
> > But I think we can exclude pm_notifier_call_chain_robust() because lockdep does not report
> > that that process is holding "struct blocking_notifier_head"->rwsem. I suspect that
> > that process is sleeping at wait_for_device_probe(), for it waits for probe operations.
> >
> > ----------------------------------------
> > void wait_for_device_probe(void)
> > {
> > 	/* wait for the deferred probe workqueue to finish */
> > 	flush_work(&deferred_probe_work);
> >
> > 	/* wait for the known devices to complete their probing */
> > 	wait_event(probe_waitqueue, atomic_read(&probe_count) == 0);
> > 	async_synchronize_full();
> > }
> > ----------------------------------------
>
> syzbot confirmed that snapshot_open() is unable to proceed due to
> atomic_read(&probe_count) == 2 for 145 seconds.
>
> ----------------------------------------
> [ 86.794300][ T4209] Held system_transition_mutex.
> [ 86.821486][ T4209] Calling wait_for_device_probe()
> [ 86.841374][ T4209] Calling flush_work(&deferred_probe_work)
> [ 86.867398][ T4209] Calling wait_event(probe_waitqueue)
> [ 87.966188][ T4209] Calling probe_count=2
> (...snipped...)
> [ 233.554473][ T4209] Calling probe_count=2
> [ 234.444800][ T28] INFO: task syz-executor.4:4146 blocked for more than 143 seconds.
> ----------------------------------------
>
> Apart from whether we should fuzz snapshot code or not,
> there seems to be a bug that causes wait_for_device_probe() to hang.

What else is going on in the system at this point in time? Are devices
still being added as part of boot init sequences? Or has boot finished
properly and these are devices being removed?

Some device is being probed at the moment, maybe we have a deadlock
somewhere here...
thanks,
greg k-h