Re: [PATCH 1/2] bdi: Do not use freezable workqueue
From: Rafael J. Wysocki
Date: Sun Oct 06 2019 - 11:08:51 EST
On Fri, Oct 4, 2019 at 3:22 PM Jens Axboe <axboe@xxxxxxxxx> wrote:
>
> On 10/4/19 4:00 AM, Mika Westerberg wrote:
> > A removable block device, such as NVMe or SSD connected over Thunderbolt
> > can be hot-removed any time including when the system is suspended. When
> > device is hot-removed during suspend and the system gets resumed, kernel
> > first resumes devices and then thaws the userspace including freezable
> > workqueues. What happens in that case is that the NVMe driver notices
> > that the device is unplugged and removes it from the system. This ends
> > up calling bdi_unregister() for the gendisk which then schedules
> > wb_workfn() to be run one more time.
> >
> > However, since the bdi_wq is still frozen flush_delayed_work() call in
> > wb_shutdown() blocks forever halting system resume process. User sees
> > this as hang as nothing is happening anymore.
> >
> > Triggering sysrq-w reveals this:
> >
> > Workqueue: nvme-wq nvme_remove_dead_ctrl_work [nvme]
> > Call Trace:
> > ? __schedule+0x2c5/0x630
> > ? wait_for_completion+0xa4/0x120
> > schedule+0x3e/0xc0
> > schedule_timeout+0x1c9/0x320
> > ? resched_curr+0x1f/0xd0
> > ? wait_for_completion+0xa4/0x120
> > wait_for_completion+0xc3/0x120
> > ? wake_up_q+0x60/0x60
> > __flush_work+0x131/0x1e0
> > ? flush_workqueue_prep_pwqs+0x130/0x130
> > bdi_unregister+0xb9/0x130
> > del_gendisk+0x2d2/0x2e0
> > nvme_ns_remove+0xed/0x110 [nvme_core]
> > nvme_remove_namespaces+0x96/0xd0 [nvme_core]
> > nvme_remove+0x5b/0x160 [nvme]
> > pci_device_remove+0x36/0x90
> > device_release_driver_internal+0xdf/0x1c0
> > nvme_remove_dead_ctrl_work+0x14/0x30 [nvme]
> > process_one_work+0x1c2/0x3f0
> > worker_thread+0x48/0x3e0
> > kthread+0x100/0x140
> > ? current_work+0x30/0x30
> > ? kthread_park+0x80/0x80
> > ret_from_fork+0x35/0x40
> >
> > This is not limited to NVMes so exactly same issue can be reproduced by
> > hot-removing SSD (over Thunderbolt) while the system is suspended.
> >
> > Prevent this from happening by removing WQ_FREEZABLE from bdi_wq.
>
> This series looks good for me, I don't think there's a reason for
> the workers to be marked freezable.
I was a bit concerned that the original idea might be to prevent
writes to the persistent storage from occurring after creating an
image during hibernation, but if that's not the case, the series is
fine from the general power management standpoint, so
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
for both patches.