Re: General protection fault with use_blk_mq=1.

From: Zephaniah E. Loss-Cutler-Hull
Date: Wed Mar 28 2018 - 23:13:34 EST


On 03/28/2018 06:02 PM, Jens Axboe wrote:
> On 3/28/18 5:03 PM, Zephaniah E. Loss-Cutler-Hull wrote:
>> I am not subscribed to any of the lists on the To list here, please CC
>> me on any replies.
>>
>> I am encountering a fairly consistent crash anywhere from 15 minutes to
>> 12 hours after boot with scsi_mod.use_blk_mq=1 dm_mod.use_blk_mq=1
>>
>> The crash looks like:
>>

>>
>> Looking through the code, I'd guess that this is dying inside
>> blkg_rwstat_add, which calls percpu_counter_add_batch, which is what RIP
>> is pointing at.
>
> Leaving the whole thing here for Paolo - it's crashing off insertion of
> a request coming out of SG_IO. Don't think we've seen this BFQ failure
> case before.
>
> You can mitigate this by switching the scsi-mq devices to mq-deadline
> instead.
>
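
For reference, a minimal sketch of the scheduler switch suggested above. It
assumes the usual sysfs interface, /sys/block/<dev>/queue/scheduler, which
lists the available schedulers with the active one in square brackets. The
helper names (parse_schedulers, set_scheduler) and the device name in the
usage note are illustrative and do not come from this thread.

```python
from pathlib import Path


def parse_schedulers(text):
    """Parse a sysfs scheduler line such as 'mq-deadline kyber [bfq] none'.

    Returns (active, available); active is None if no entry is bracketed.
    """
    active = None
    available = []
    for name in text.split():
        if name.startswith("[") and name.endswith("]"):
            name = name[1:-1]
            active = name
        available.append(name)
    return active, available


def set_scheduler(sched_file, wanted="mq-deadline"):
    """Select `wanted` via the given scheduler file; return the previous one.

    sched_file is expected to be /sys/block/<dev>/queue/scheduler
    (assumption: the device uses blk-mq and the caller is root).
    """
    path = Path(sched_file)
    active, available = parse_schedulers(path.read_text())
    if wanted not in available:
        raise ValueError(f"{wanted!r} not offered by {sched_file}: {available}")
    if active != wanted:
        # Writing the bare scheduler name selects it.
        path.write_text(wanted + "\n")
    return active
```

Usage would look like set_scheduler("/sys/block/sda/queue/scheduler"), with
sda standing in for whichever scsi-mq device is affected. The change does not
persist across reboots.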

I'm thinking that I should also be able to mitigate it by disabling
CONFIG_DEBUG_BLK_CGROUP.

That should remove that entire chunk of code.
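
A minimal sketch for checking whether the running kernel was built with that
option. It assumes the config is exposed either at /proc/config.gz (only
present with CONFIG_IKCONFIG_PROC) or at /boot/config-<release>, as many
distributions ship it. The helper names are illustrative.

```python
import gzip
import os
import platform


def option_value(config_text, option):
    """Return the value of `option` (e.g. 'y'), or None if it is not set."""
    prefix = option + "="
    for line in config_text.splitlines():
        line = line.strip()
        # Disabled options appear as '# CONFIG_FOO is not set' and are skipped.
        if line.startswith(prefix):
            return line[len(prefix):]
    return None


def read_running_config():
    """Return the running kernel's config text, or None if it cannot be found."""
    candidates = ["/proc/config.gz", "/boot/config-" + platform.release()]
    for path in candidates:
        if os.path.exists(path):
            opener = gzip.open if path.endswith(".gz") else open
            with opener(path, "rt") as fh:
                return fh.read()
    return None


if __name__ == "__main__":
    text = read_running_config()
    if text is None:
        print("kernel config not found")
    else:
        print(option_value(text, "CONFIG_DEBUG_BLK_CGROUP"))
```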

Of course, that won't help if this is actually a symptom of a bigger
problem.

Regards,
Zephaniah E. Loss-Cutler-Hull.
