Re: panic with CPU hotplug + blk-mq + scsi-mq

From: Ming Lei
Date: Sun Apr 19 2015 - 10:31:49 EST


On Sat, Apr 18, 2015 at 4:30 PM, Jens Axboe <axboe@xxxxxxxxx> wrote:
> On 04/17/2015 10:23 PM, Ming Lei wrote:
>>
>> Hi Dongsu,
>>
>> On Fri, Apr 17, 2015 at 5:41 AM, Dongsu Park
>> <dongsu.park@xxxxxxxxxxxxxxxx> wrote:
>>>
>>> Hi,
>>>
>>> there's a critical bug regarding CPU hotplug, blk-mq, and scsi-mq.
>>> Every time a CPU is offlined, some arbitrary range of kernel memory
>>> seems to get corrupted. Then, after a while, the kernel panics at
>>> random places when block IOs are issued (for example, see the call
>>> traces below).
>>
>>
>> Thanks for the report.
>>
>>>
>>> This bug can be easily reproduced with a Qemu VM running with
>>> virtio-scsi, when its guest kernel is 3.19-rc1 or higher, and when
>>> scsi-mq is loaded with blk-mq enabled. And yes, the 4.0 release is
>>> still affected, as well as Jens' for-4.1/core. How to reproduce:
>>>
>>> # echo 0 > /sys/devices/system/cpu/cpu1/online
>>> (and issue some block IOs, that's it.)
>>>
>>> Bisecting between 3.18 and 3.19-rc1, it looks like this bug had been
>>> hidden until commit ccbedf117f01 ("virtio_scsi: support multi hw queue
>>> of blk-mq"), which started to allow virtio-scsi to map virtqueues to
>>> hardware queues of blk-mq. Reverting that commit makes the bug go away.
>>> However, I suppose reverting it would not be a correct solution.
>>
>>
>> I agree, and that patch only enables multiple hw queues.
>>
>>>
>>> More precisely, every time a CPU hotplug event gets triggered,
>>> a call graph is like the following:
>>>
>>> blk_mq_queue_reinit_notify()
>>> -> blk_mq_queue_reinit()
>>> -> blk_mq_map_swqueue()
>>> -> blk_mq_free_rq_map()
>>> -> scsi_exit_request()
>>>
>>> From that point, as soon as any address in the request gets modified, an
>>> arbitrary range of memory gets corrupted. My first guess was that the
>>> exit routine might try to deallocate tags->rqs[] where invalid
>>> addresses are stored. But actually that does not seem to be the case,
>>> and cmd->sense_buffer also looks valid.
>>> It's not obvious to me what exactly could go wrong.
>>>
>>> Does anyone have an idea?
>>
>>
>> As far as I can see, at least two problems exist:
>> - race between timeout and CPU hotplug
>> - in the case of shared tags, the setting and checking of hctx->tags
>> during CPU online handling
>>
>> So could you please test the attached two patches to see if they fix your
>> issue?
>>
>> I ran them in my VM, and it looks like the oops does disappear.
>
>
> Hard to comment on your patches directly when they are attached. Both look
> good to me. I'd perhaps change the ->tags check in #1 to use
> blk_mq_hw_queue_mapped() instead of checking directly. Might even be worth

That makes sense, and blk_mq_hw_queue_mapped() is easy to backport too.

I will send out v1 later with this change.

> considering changing the normal iterator to skip unmapped queues, but that
> can be left for a later change.

Yes, that should be left for later, because we want the fix to be easy to
backport to stable.
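
To make the two points above concrete, here is a rough userspace sketch, not
the actual patch. It assumes blk_mq_hw_queue_mapped() treats a hw queue as
mapped only when it has at least one software queue and a valid tag set. The
structs are stripped-down stand-ins for the kernel ones, and next_mapped() and
count_mapped() are hypothetical helpers that only illustrate what an iterator
skipping unmapped queues might look like.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for struct blk_mq_tags; only NULL vs. non-NULL matters here. */
struct blk_mq_tags { int unused; };

/* Stripped-down model of struct blk_mq_hw_ctx (not the kernel definition). */
struct blk_mq_hw_ctx {
	unsigned int nr_ctx;      /* software queues mapped to this hw queue */
	struct blk_mq_tags *tags; /* NULL after the request map is freed */
};

/* Assumed semantics: mapped means at least one ctx and a valid tag set. */
static inline bool blk_mq_hw_queue_mapped(const struct blk_mq_hw_ctx *hctx)
{
	return hctx->nr_ctx && hctx->tags;
}

/* Hypothetical helper: index of the first mapped hw queue at or after
 * 'start', or 'n' if there is none. */
static size_t next_mapped(const struct blk_mq_hw_ctx *hctxs, size_t n,
			  size_t start)
{
	size_t i;

	for (i = start; i < n; i++)
		if (blk_mq_hw_queue_mapped(&hctxs[i]))
			break;
	return i;
}

/* Hypothetical walker: visits mapped queues only and returns how many
 * it saw, standing in for an iterator that skips unmapped queues. */
static unsigned int count_mapped(const struct blk_mq_hw_ctx *hctxs, size_t n)
{
	unsigned int seen = 0;
	size_t i;

	for (i = next_mapped(hctxs, n, 0); i < n;
	     i = next_mapped(hctxs, n, i + 1))
		seen++;
	return seen;
}
```

Going through one helper, rather than testing hctx->tags directly at each
call site, keeps the definition of "mapped" in one place, which is also what
should make the change simple to carry to stable.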

>
> --
> Jens Axboe
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/