Re: blk-mq crash under KVM in multiqueue block code (with virtio-blk and ext4)

From: Christian Borntraeger
Date: Wed Sep 17 2014 - 03:59:24 EST


On 09/12/2014 10:09 PM, Christian Borntraeger wrote:
> On 09/12/2014 01:54 PM, Ming Lei wrote:
>> On Thu, Sep 11, 2014 at 6:26 PM, Christian Borntraeger
>> <borntraeger@xxxxxxxxxx> wrote:
>>> Folks,
>>>
>>> we have seen the following bug with 3.16 as a KVM guest. It suspect the blk-mq rework that happened between 3.15 and 3.16, but it can be something completely different.
>>>
>>
>> Care to share how you reproduce the issue?
>
> Host with 16GB RAM 32GB swap. 15 guest all with 2 GB RAM (and varying amount of CPUs). All do heavy file I/O.
> It did not happen with 3.15/3.15 in guest/host and does happen with 3.16/3.16. So our next step is to check
> 3.15/3.16 and 3.16/3.15 to identify if its host memory mgmt or guest block layer.

The crashed happen pretty randomly, but when they happen it seems that its the same trace as below. This makes memory corruption by host vm less likely and some thing wrong in blk-mq more likely I guess


>
> Christian
>
>>
>>> [ 65.992022] Unable to handle kernel pointer dereference in virtual kernel address space
>>> [ 65.992187] failing address: ccccccccccccd000 TEID: ccccccccccccd803
>>> [ 65.992363] Fault in home space mode while using kernel ASCE.
>>> [ 65.992365] AS:0000000000a7c007 R3:0000000000000024
>>> [ 65.993754] Oops: 0038 [#1] SMP
>>> [ 65.993923] Modules linked in: iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi virtio_balloon vhost_net vhost macvtap macvlan kvm dm_multipath virtio_net virtio_blk sunrpc
>>> [ 65.994274] CPU: 0 PID: 44 Comm: kworker/u6:2 Not tainted 3.16.0-20140814.0.c66c84c.fc18-s390xfrob #1
>>> [ 65.996043] Workqueue: writeback bdi_writeback_workfn (flush-251:32)
>>> [ 65.996222] task: 0000000002250000 ti: 0000000002258000 task.ti: 0000000002258000
>>> [ 65.996228] Krnl PSW : 0704f00180000000 00000000003ed114 (blk_mq_tag_to_rq+0x20/0x38)
>>> [ 65.997299] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:3 PM:0 EA:3
>>> Krnl GPRS: 0000000000000040 cccccccccccccccc 0000000001619000 000000000000004e
>>> [ 65.997301] 000000000000004e 0000000000000000 0000000000000001 0000000000a0de18
>>> [ 65.997302] 0000000077ffbe18 0000000077ffbd50 000000006d72d620 000000000000004f
>>> [ 65.997304] 0000000001a99400 0000000000000080 00000000003eddee 0000000077ffbc28
>>> [ 65.997864] Krnl Code: 00000000003ed106: e31020300004 lg %r1,48(%r2)
>>> 00000000003ed10c: 91082044 tm 68(%r2),8
>>> #00000000003ed110: a7840009 brc 8,3ed122
>>> >00000000003ed114: e34016880004 lg %r4,1672(%r1)
>>> 00000000003ed11a: 59304100 c %r3,256(%r4)
>>> 00000000003ed11e: a7840003 brc 8,3ed124
>>> 00000000003ed122: 07fe bcr 15,%r14
>>> 00000000003ed124: b9040024 lgr %r2,%r4
>>> [ 65.998221] Call Trace:
>>> [ 65.998224] ([<0000000000000001>] 0x1)
>>> [ 65.998227] [<00000000003f17b6>] blk_mq_tag_busy_iter+0x7a/0xc4
>>> [ 65.998228] [<00000000003edcd6>] blk_mq_rq_timer+0x96/0x13c
>>> [ 65.999226] [<000000000013ee60>] call_timer_fn+0x40/0x110
>>> [ 65.999230] [<000000000013f642>] run_timer_softirq+0x2de/0x3d0
>>> [ 65.999238] [<0000000000135b70>] __do_softirq+0x124/0x2ac
>>> [ 65.999241] [<0000000000136000>] irq_exit+0xc4/0xe4
>>> [ 65.999435] [<000000000010bc08>] do_IRQ+0x64/0x84
>>> [ 66.437533] [<000000000067ccd8>] ext_skip+0x42/0x46
>>> [ 66.437541] [<00000000003ed7b4>] __blk_mq_alloc_request+0x58/0x1e8
>>> [ 66.437544] ([<00000000003ed788>] __blk_mq_alloc_request+0x2c/0x1e8)
>>> [ 66.437547] [<00000000003eef82>] blk_mq_map_request+0xc2/0x208
>>> [ 66.437549] [<00000000003ef860>] blk_sq_make_request+0xac/0x350
>>> [ 66.437721] [<00000000003e2d6c>] generic_make_request+0xc4/0xfc
>>> [ 66.437723] [<00000000003e2e56>] submit_bio+0xb2/0x1a8
>>> [ 66.438373] [<000000000031e8aa>] ext4_io_submit+0x52/0x80
>>> [ 66.438375] [<000000000031ccfa>] ext4_writepages+0x7c6/0xd0c
>>> [ 66.438378] [<00000000002aea20>] __writeback_single_inode+0x54/0x274
>>> [ 66.438379] [<00000000002b0134>] writeback_sb_inodes+0x28c/0x4ec
>>> [ 66.438380] [<00000000002b042e>] __writeback_inodes_wb+0x9a/0xe4
>>> [ 66.438382] [<00000000002b06a2>] wb_writeback+0x22a/0x358
>>> [ 66.438383] [<00000000002b0cd0>] bdi_writeback_workfn+0x354/0x538
>>> [ 66.438618] [<000000000014e3aa>] process_one_work+0x1aa/0x418
>>> [ 66.438621] [<000000000014ef94>] worker_thread+0x48/0x524
>>> [ 66.438625] [<00000000001560ca>] kthread+0xee/0x108
>>> [ 66.438627] [<000000000067c76e>] kernel_thread_starter+0x6/0xc
>>> [ 66.438628] [<000000000067c768>] kernel_thread_starter+0x0/0xc
>>> [ 66.438629] Last Breaking-Event-Address:
>>> [ 66.438631] [<00000000003edde8>] blk_mq_timeout_check+0x6c/0xb8
>>>
>>> I looked into the dump, and the full function is (annotated by me to match the source code)
>>> r2= tags
>>> r3= tag (4e)
>>> Dump of assembler code for function blk_mq_tag_to_rq:
>>> 0x00000000003ed0f4 <+0>: lg %r1,96(%r2) # r1 has now tags->rqs
>>> 0x00000000003ed0fa <+6>: sllg %r2,%r3,3 # r2 has tag*8
>>> 0x00000000003ed100 <+12>: lg %r2,0(%r2,%r1) # r2 now has rq (=tags->rqs[tag])
>>> 0x00000000003ed106 <+18>: lg %r1,48(%r2) # r1 now has rq->q
>>> 0x00000000003ed10c <+24>: tm 68(%r2),8 # test for rq->cmd_flags & REQ_FLUSH_SEQ
>>> 0x00000000003ed110 <+28>: je 0x3ed122 <blk_mq_tag_to_rq+46> # if not goto 3ed122
>>> 0x00000000003ed114 <+32>: lg %r4,1672(%r1) # r4 = rq->q->flush_rq <-------- CRASHES as rq->q points to cccccccccccc
>>> 0x00000000003ed11a <+38>: c %r3,256(%r4) # compare tag with rq->q->flush_rq->tag
>>> 0x00000000003ed11e <+42>: je 0x3ed124 <blk_mq_tag_to_rq+48> # if equal goto ..124
>>> 0x00000000003ed122 <+46>: br %r14 # return (with return value == r2)
>>> 0x00000000003ed124 <+48>: lgr %r2,%r4 # return value = r4
>>> 0x00000000003ed128 <+52>: br %r14 # return
>>>
>>> Does anyone have an idea?
>>> The request itself is completely filled with cc
>>
>> That is very weird, the 'rq' is got from hctx->tags, and rq should be
>> valid, and rq->q shouldn't have been changed even though it was
>> double free or double allocation.
>>
>>> I am currently asking myself if blk_mq_map_request should protect against softirq here but I cant say for sure,as I have never looked into that code before.
>>
>> No, it needn't the protection.
>>
>> Thanks,
>>
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/