[BUG] Deadlock in block/blk-flush.c, with resolution
From: Dragan Milenkovic
Date: Wed Feb 06 2019 - 07:53:34 EST
The bug manifests as mdX_raid1 and other related tasks becoming blocked.
It is triggered by LVM RAID, but is not caused by it. I have also
triggered it with LVM + mdraid, but only once. It is more frequent with
LVM RAID.
It does not occur in the master branch, but it does in 4.20.y, 4.19.y,
and 4.18.y. Here is a Debian bug report:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=913119
I have tracked it to this commit:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=344e9ffcbd1898e1dc04085564a6e05c30ea8199
Specifically to this line:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/diff/block/blk-flush.c?id=344e9ffcbd1898e1dc04085564a6e05c30ea8199
The commit log message presents this as a refactoring change, but the
check for q->elevator was inverted.
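To illustrate why a single inverted check in a supposedly behavior-preserving
refactoring matters, here is a minimal, hypothetical C sketch. The struct and
function names (toy_queue, uses_sched_path_before/after, count_mismatches) are
invented for illustration and do NOT correspond to the actual blk-flush.c code.
It compares a pre- and post-refactor predicate on both possible states of the
elevator pointer, the kind of exhaustive equivalence check that would flag
such a change.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a request queue (not struct request_queue). */
struct toy_queue {
    void *elevator; /* non-NULL when an I/O scheduler is attached */
};

/* Hypothetical pre-refactor predicate: take the scheduler path
 * only when an elevator is attached. */
static bool uses_sched_path_before(const struct toy_queue *q)
{
    return q->elevator != NULL;
}

/* Hypothetical post-refactor predicate with the elevator check
 * accidentally inverted. */
static bool uses_sched_path_after(const struct toy_queue *q)
{
    return q->elevator == NULL;
}

/* Compare both predicates over the only two input states (elevator
 * attached or not) and return how many states they disagree on.
 * A true refactoring would return 0. */
static int count_mismatches(void)
{
    static int dummy_sched; /* any non-NULL address will do */
    const struct toy_queue cases[] = {
        { .elevator = NULL },
        { .elevator = &dummy_sched },
    };
    int mismatches = 0;

    for (size_t i = 0; i < sizeof(cases) / sizeof(cases[0]); i++) {
        if (uses_sched_path_before(&cases[i]) !=
            uses_sched_path_after(&cases[i]))
            mismatches++;
    }
    return mismatches;
}
```

Calling count_mismatches() returns 2 for this sketch: the inverted predicate
disagrees with the original in every state, so every request would take the
other path.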
The line has not been changed between that commit and the current master
branch. Since I applied this change to my distribution's kernel (4.19),
my system has been completely stable.
Let me know if you need anything else from me, but this looks like a
straightforward cherry-pick.
Dragan