Re: dm crypt: fix sleep-in-atomic-context bug in kcryptd_crypt_tasklet

From: duoming
Date: Wed May 24 2023 - 22:34:49 EST


Hello,

On Tue, 23 May 2023 13:53:23 -0400 Mike Snitzer wrote:

> > In order to improve the IO performance of the dm-crypt
> > implementation, the commit 39d42fa96ba1 ("dm crypt:
> > add flags to optionally bypass kcryptd workqueues")
> > adds tasklet to do the crypto operations.
> >
> > The tasklet callback function kcryptd_crypt_tasklet()
> > calls kcryptd_crypt() which is an original workqueue
> > callback function that may sleep. As a result, the
> > sleep-in-atomic-context bug may happen. The process
> > is shown below.
> >
> > (atomic context)
> > kcryptd_crypt_tasklet()
> > kcryptd_crypt()
> > kcryptd_crypt_write_convert()
> > wait_for_completion() //may sleep
> >
> > The wait_for_completion() is a function that may sleep.
> > In order to mitigate the bug, this patch changes
> > wait_for_completion() to try_wait_for_completion().
> >
> > Fixes: 39d42fa96ba1 ("dm crypt: add flags to optionally bypass kcryptd workqueues")
> > Signed-off-by: Duoming Zhou <duoming@xxxxxxxxxx>
> > ---
> > drivers/md/dm-crypt.c | 3 ++-
> > 1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
> > index 8b47b913ee8..5e2b2463d87 100644
> > --- a/drivers/md/dm-crypt.c
> > +++ b/drivers/md/dm-crypt.c
> > @@ -2085,7 +2085,8 @@ static void kcryptd_crypt_write_convert(struct dm_crypt_io *io)
> > crypt_finished = atomic_dec_and_test(&ctx->cc_pending);
> > if (!crypt_finished && kcryptd_crypt_write_inline(cc, ctx)) {
> > /* Wait for completion signaled by kcryptd_async_done() */
> > - wait_for_completion(&ctx->restart);
> > + while (!try_wait_for_completion(&ctx->restart))
> > + ;
> > crypt_finished = 1;
> > }
> >
> > --
> > 2.17.1
> >
>
> Cc'ing Ignat for closer review.
>
> But wasn't this already addressed with this commit?:
> 8abec36d1274 dm crypt: do not wait for backlogged crypto request completion in softirq
>
> Mike

Thank you for your review, I think the commit 8abec36d1274 ("dm crypt:
do not wait for backlogged crypto request completion in softirq") could
not solve this problem.

When crypt_convert() returns BLK_STS_PROTECTION or BLK_STS_IOERR, meanwhile,
there is request that is queued and wait kcryptd_async_done() to process.
The workqueue callback function kcryptd_crypt_read_continue() will not be called.
The variable crypt_finished equals to zero, it will call wait_for_completion()
that may sleep. The relevant codes are shown below:

static blk_status_t crypt_convert(...)
{
...
/*
* The request is queued and processed asynchronously,
* completion function kcryptd_async_done() will be called.
*/
case -EINPROGRESS:
ctx->r.req = NULL;
ctx->cc_sector += sector_step;
tag_offset++;
continue;
...
/*
* There was a data integrity error.
*/
case -EBADMSG:
atomic_dec(&ctx->cc_pending);
return BLK_STS_PROTECTION;
/*
* There was an error while processing the request.
*/
default:
atomic_dec(&ctx->cc_pending);
return BLK_STS_IOERR;
}
...
}

static void kcryptd_crypt_write_convert(...)
{
...
r = crypt_convert(); //return BLK_STS_PROTECTION or BLK_STS_IOERR
...
if (r == BLK_STS_DEV_RESOURCE) { //this condition is not satisfied, the workqueue will not be called.
INIT_WORK(&io->work, kcryptd_crypt_write_continue);
queue_work(cc->crypt_queue, &io->work);
return;
}
...
// crypt_finished is zero, because there is request that is queued and wait kcryptd_async_done() to process.
crypt_finished = atomic_dec_and_test(&ctx->cc_pending);
if (!crypt_finished && kcryptd_crypt_write_inline(cc, ctx)) {
/* Wait for completion signaled by kcryptd_async_done() */
wait_for_completion(&ctx->restart); //may sleep!
...
}
...
}

Best regards,
Duoming Zhou