Re: System hangs if NVMe/SSD is removed during suspend

From: Rafael J. Wysocki
Date: Wed Oct 09 2019 - 09:22:51 EST


On 10/7/2019 12:08 PM, Jan Kara wrote:
On Fri 04-10-19 07:32:40, Jens Axboe wrote:
On 10/4/19 5:01 AM, Mika Westerberg wrote:
On Fri, Oct 04, 2019 at 11:59:26AM +0200, Rafael J. Wysocki wrote:
On Friday, October 4, 2019 10:03:40 AM CEST Mika Westerberg wrote:
On Thu, Oct 03, 2019 at 09:50:33AM -0700, Tejun Heo wrote:
Hello, Mika.

On Wed, Oct 02, 2019 at 03:21:36PM +0300, Mika Westerberg wrote:
but from that discussion I don't see a more generic solution being
implemented.

Any ideas how we should fix this properly?
Yeah, the only fix I can think of is not using freezable wq. It's
just not a good idea and not all that difficult to avoid using.
OK, thanks.

In that case I will just make a patch that removes WQ_FREEZABLE from
bdi_wq and see what people think about it :)
I guess that depends on why WQ_FREEZABLE was added to it in the first place. :-)

The reason might be to avoid writes to persistent storage after creating an
image during hibernation, since wqs remain frozen throughout the entire
hibernation including the image saving phase.
Good point.

Arguably, making the wq freezable is kind of a sledgehammer approach to that
particular issue, but in principle it may prevent data corruption from
occurring, so be careful there.
I tried to find the commit that introduced the "freezing" and I think it
is this one:

03ba3782e8dc writeback: switch to per-bdi threads for flushing data

Unfortunately from that commit it is not clear (at least to me) why it
calls set_freezable() for the bdi task. It does not look like it has
anything to do with blocking writes to storage while entering
hibernation but I may be mistaken.
Wow, a decade ago...

Honestly, I don't recall why these were marked freezable, and as I wrote
in the other reply, I don't think there's a good reason for that to be
the case.
Well, can't it happen that the flush worker gets stuck in D state
because some subsystem is already suspended, and thus hibernation fails
(because AFAIK processes in uninterruptible sleep block hibernation)?

I was also somewhat worried that the hibernation image could be
inconsistent if flush workers do something while the hibernation image is
being taken, but that does not seem to be a valid concern as all kernel
processes get frozen before the hibernation image is taken.

To be precise, nothing is scheduled while creating a hibernation image,
but once the image has been created, threads that are not frozen can be
scheduled again and there are kernel threads which aren't frozen.

So the question is whether or not any of the kernel threads which are not
frozen can do anything potentially unsafe if the bdi wq is not freezable,
and I don't quite see what that might be.