Re: Bug in disk event polling

From: Tejun Heo
Date: Fri Feb 10 2012 - 15:46:55 EST


(cc'ing Rafael)

Hello, Alan.

On Fri, Feb 10, 2012 at 03:31:20PM -0500, Alan Stern wrote:
> Don't ask me why this hasn't shown up earlier... There's a big fat bug
> in the implementation of disk event polling.
>
> The polling is done using the system_nrt_wq work queue, which isn't
> freezable. As a result, polling continues while the system is
> preparing for suspend or hibernation.
>
> Obviously I/O to suspended devices doesn't work well. Somewhat less
> obviously, error recovery for the failed I/O attempts can interfere
> with normal system resume.

Hmmm.... I see. Yeah, that can be a problem.

> You can see this for yourself easily enough by suspending or
> hibernating while a USB flash drive is plugged in. You don't even need
> to go through the full suspend procedure; the first two stages are
> enough (echo devices >/sys/power/pm_test). Check the system log
> afterward; most likely you'll find the flash drive got errors and had
> to be unregistered and re-enumerated.

Do you happen to have a log of such a failure? A polling failure by
itself shouldn't lead to that failure mode.

> I have verified that changing all occurrences of system_nrt_wq in
> block/genhd.c to system_freezable_wq fixes the bug. However this may
> not be the way you want to solve it; you may prefer to have a freezable
> non-reentrant work queue.

Please feel free to send out a patch to fix the issue. :)

Thanks.

--
tejun