Re: dm-crypt with no_read_workqueue and no_write_workqueue + btrfs scrub = BUG()

From: Maciej S. Szmigiero
Date: Thu Dec 24 2020 - 13:54:11 EST


On 24.12.2020 19:46, Ignat Korchagin wrote:
> On Wed, Dec 23, 2020 at 8:57 PM Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx> wrote:
>>
>> On Wed, Dec 23, 2020 at 04:37:34PM +0100, Maciej S. Szmigiero wrote:
>>>
>>> It looks to me like the skcipher API might not be safe to
>>> call from a softirq context, after all.
>>
>> skcipher is safe to use in a softirq. The problem is only in
>> dm-crypt where it tries to allocate memory with GFP_NOIO.
>
> Hm.. After eliminating the GFP_NOIO (as well as some other sleeping
> paths) from dm-crypt softirq code I still hit an occasional crash in
> my extreme setup (QEMU with 1 CPU and cryptd_max_cpu_qlen set to 1)
> (decoded with decode_stacktrace.sh):
> (..)
> This happens when running dm-crypt with no_read_workqueue on top of
> an emulated NVMe in QEMU (the NVMe driver "completes" IO in IRQ context).
> It seems that sending decryption requests to cryptd from softirq
> context somehow corrupts the crypto queue.

You can try compiling your test kernel with KASAN, as it often
immediately points out where the memory starts to get corrupted
(if that's the bug).

Enabling other "checking" kernel debug options might also help in
debugging the root cause of this.
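
For example, a debug .config fragment along these lines could be a
starting point. This is only a sketch: KASAN is the option suggested
above, while the choice of the remaining options is an assumption about
what may be relevant to a sleeping-in-softirq or queue-corruption bug,
not something verified against this crash.

```
# Memory error detector (use-after-free, out-of-bounds accesses)
CONFIG_KASAN=y
CONFIG_KASAN_GENERIC=y

# Assumed to be relevant: warn when code sleeps in atomic context
# (e.g. a sleeping allocation or lock taken from softirq)
CONFIG_DEBUG_ATOMIC_SLEEP=y

# Assumed to be relevant: lock dependency checking, including
# inconsistent lock usage between process and softirq context
CONFIG_PROVE_LOCKING=y

# Assumed to be relevant: sanity checks on linked-list manipulation,
# which may catch a corrupted list-based queue early
CONFIG_DEBUG_LIST=y
```

These options slow the kernel down noticeably, so timing-dependent
crashes may become harder or easier to reproduce with them enabled.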

> Regards,
> Ignat

Thanks,
Maciej