Bug 215647 - aoe: removing aoe devices with flush (implicit in rmmod aoe) leads to page fault

From: Thorsten Leemhuis
Date: Tue Mar 08 2022 - 00:56:40 EST


Hi! As part of my regression tracking work I noticed this bug report
that was filed about a week ago:

https://bugzilla.kernel.org/show_bug.cgi?id=215647

To quote the first para:

> there is a bug in the aoe driver module between v4.20-rc1 and
> v5.14-rc1 introduced in 3582dd2 (aoe: convert aoeblk to blk-mq) and
> fixed in 6560ec9 (aoe: use blk_mq_alloc_disk and blk_cleanup_disk).
> Every forcible removal of an aoe device (e.g. "rmmod aoe" with aoe
> devices available or "aoe-flush ex.x") leads to a page fault. This
> bug was successfully reproduced with kernel 5.10.92 from the debian
> repository, there were no changes to the affected code between
> v4.20-rc1 and v5.14-rc1. Version 4.19.208 (from debian buster) and
> 5.17-rc4 (from debian experimental) are confirmed not to be
> affected.

I checked the logs to see why mainline might not be affected anymore and
noticed a recent commit in the same area:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/block/aoe/aoedev.c?id=6560ec961a080944f8d5e1fef17b771bfaf189cb

> From 6560ec961a080944f8d5e1fef17b771bfaf189cb Mon Sep 17 00:00:00 2001
> From: Christoph Hellwig <hch@xxxxxx>
> Date: Wed, 2 Jun 2021 09:53:31 +0300
> Subject: aoe: use blk_mq_alloc_disk and blk_cleanup_disk
>
> Use blk_mq_alloc_disk and blk_cleanup_disk to simplify the gendisk and
> request_queue allocation.
>
> Signed-off-by: Christoph Hellwig <hch@xxxxxx>
> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@xxxxxxx>
> Link: https://lore.kernel.org/r/20210602065345.355274-17-hch@xxxxxx
> Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
> ---
> drivers/block/aoe/aoedev.c | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
> (limited to 'drivers/block/aoe/aoedev.c')
>
> diff --git a/drivers/block/aoe/aoedev.c b/drivers/block/aoe/aoedev.c
> index e2ea2356da061..c5753c6bfe804 100644
> --- a/drivers/block/aoe/aoedev.c
> +++ b/drivers/block/aoe/aoedev.c
> @@ -277,9 +277,8 @@ freedev(struct aoedev *d)
>  	if (d->gd) {
>  		aoedisk_rm_debugfs(d);
>  		del_gendisk(d->gd);
> -		put_disk(d->gd);
> +		blk_cleanup_disk(d->gd);
>  		blk_mq_free_tag_set(&d->tag_set);
> -		blk_cleanup_queue(d->blkq);
>  	}
>  	t = d->targets;
>  	e = t + d->ntargets;

Does that need backporting? Or is the patch the reporter provided in
bugzilla the easier and safer way to fix that regression in older releases?

Ciao, Thorsten