Re: panic with CPU hotplug + blk-mq + scsi-mq

From: Jens Axboe
Date: Sat Apr 18 2015 - 16:31:10 EST


On 04/17/2015 10:23 PM, Ming Lei wrote:
Hi Dongsu,

On Fri, Apr 17, 2015 at 5:41 AM, Dongsu Park
<dongsu.park@xxxxxxxxxxxxxxxx> wrote:
Hi,

there's a critical bug involving CPU hotplug, blk-mq, and scsi-mq.
Every time a CPU is offlined, some arbitrary range of kernel memory
seems to get corrupted. Then, after a while, the kernel panics at random
places when block IOs are issued. (For example, see the call traces below.)

Thanks for the report.


This bug is easily reproducible with a QEMU VM running virtio-scsi,
when its guest kernel is 3.19-rc1 or later and scsi-mq is loaded
with blk-mq enabled. And yes, the 4.0 release is still affected, as well as
Jens' for-4.1/core. How to reproduce:

# echo 0 > /sys/devices/system/cpu/cpu1/online
(and issue some block IOs, that's it.)

Bisecting between 3.18 and 3.19-rc1, it looks like this bug had been hidden
until commit ccbedf117f01 ("virtio_scsi: support multi hw queue of blk-mq"),
which started to allow virtio-scsi to map virtqueues to hardware queues of
blk-mq. Reverting that commit makes the bug go away. However, I don't think
reverting it is the correct solution.

I agree, and that patch only enables multiple hw queues.


More precisely, every time a CPU hotplug event is triggered,
the call graph looks like this:

blk_mq_queue_reinit_notify()
  -> blk_mq_queue_reinit()
    -> blk_mq_map_swqueue()
      -> blk_mq_free_rq_map()
        -> scsi_exit_request()

From that point on, as soon as any address in the request gets modified, an
arbitrary range of memory gets corrupted. My first guess was that
the exit routine might be trying to deallocate tags->rqs[] entries that hold
invalid addresses. But that does not appear to be the case,
and cmd->sense_buffer also looks valid.
It's not obvious to me exactly what could be going wrong.

Does anyone have an idea?

As far as I can see, at least two problems exist:
- a race between timeout handling and CPU hotplug
- with shared tags, the way hctx->tags is set and checked during
CPU online handling

So could you please test the attached two patches to see if they fix your issue?

I ran them in my VM, and the oops does seem to disappear.

Hard to comment on your patches directly when they are attached. Both look good to me. I'd perhaps change the ->tags check in #1 to use blk_mq_hw_queue_mapped() instead of checking directly. Might even be worth considering changing the normal iterator to skip unmapped queues, but that can be left for a later change.
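
To illustrate the suggestion above, here is a minimal user-space sketch in C. It is not the kernel code and not the attached patches: the model_* structures and functions are hypothetical stand-ins, and it assumes that blk_mq_hw_queue_mapped() amounts to "at least one software queue is mapped and the tags pointer is set". The point is only that a single helper replaces an open-coded ->tags check, and that an iterator can skip unmapped queues with the same test.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a hardware queue's tag map. */
struct model_tags {
	int unused;
};

/* Hypothetical, simplified stand-in for a blk-mq hardware queue context. */
struct model_hw_ctx {
	unsigned int nr_ctx;      /* number of software queues mapped here */
	struct model_tags *tags;  /* NULL once the tag map has been freed */
};

/*
 * Assumed semantics of blk_mq_hw_queue_mapped(): the queue is usable
 * only if it has mapped software queues AND a valid tag map.
 */
static bool model_hw_queue_mapped(const struct model_hw_ctx *hctx)
{
	return hctx->nr_ctx && hctx->tags;
}

/*
 * Model of a scan over all hardware queues (for example a timeout scan)
 * that skips unmapped queues instead of touching their tags.
 * Returns the number of queues that would actually be visited.
 */
static size_t model_scan_mapped(const struct model_hw_ctx *hctxs, size_t n)
{
	size_t visited = 0;

	for (size_t i = 0; i < n; i++) {
		if (!model_hw_queue_mapped(&hctxs[i]))
			continue;  /* nothing mapped here, nothing to check */
		visited++;
	}
	return visited;
}
```

With three queues where one has no software queues and another has lost its tag map, the scan in this model visits only the remaining one. Folding the same test into the normal iterator would make every caller skip unmapped queues by default.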

--
Jens Axboe