Re: RIP: e030:bfq_exit_icq_bfqq+0x147/0x1c0

From: Paolo Valente
Date: Thu Aug 08 2019 - 06:21:54 EST




> On 8 Aug 2019, at 12:21, Sander Eikelenboom <linux@xxxxxxxxxxxxxx> wrote:
>
> On 08/08/2019 11:10, Paolo Valente wrote:
>>
>>
>>> On 8 Aug 2019, at 11:05, Sander Eikelenboom <linux@xxxxxxxxxxxxxx> wrote:
>>>
>>> L.S.,
>>>
>>> While testing a Linux 5.3-rc3 kernel on my Xen server I came across the splat below when trying to shut down all the VMs.
>>> This is after the server has run for a few days without any problem. It seems to happen consistently.
>>>
>>> It seems to be in the same area as dbc3117d4ca9e17819ac73501e914b8422686750, but rc3 already incorporates that patch.
>>>
>>> Any ideas ?
>>>
>>
>> Could you try these fixes I proposed yesterday:
>> https://lkml.org/lkml/2019/8/7/536
>> or, on patchwork:
>> https://patchwork.kernel.org/patch/11082247/
>> https://patchwork.kernel.org/patch/11082249/
>
> Hi Paolo,
>
> The two patches above seem to fix the issue!
> So thanks for the swift reply (and the patchwork links for easily
> downloading the patches).
>
> I will test the third, unrelated patch as well, but if you don't hear
> back, it's all good.
>

Great! Thank you for offering to test the other patch as well. Tested-by tags are welcome too :)

Thanks,
Paolo

> Thanks again!
>
> --
> Sander
>
>> I posted a further fix too, which should be unrelated. But, just in case:
>> https://lkml.org/lkml/2019/8/7/715
>> or, on patchwork:
>> https://patchwork.kernel.org/patch/11082521/
>>
>> Crossing my fingers (and thank you for reporting this),
>> Paolo
>>
>>> --
>>> Sander
>>>
>>>
>>> [80915.716048] BUG: unable to handle page fault for address: 0000100000000008
>>> [80915.724188] #PF: supervisor write access in kernel mode
>>> [80915.733182] #PF: error_code(0x0002) - not-present page
>>> [80915.741455] PGD 0 P4D 0
>>> [80915.750538] Oops: 0002 [#1] SMP NOPTI
>>> [80915.758425] CPU: 4 PID: 11407 Comm: 17.hda-2 Tainted: G W 5.3.0-rc3-20190807-doflr+ #1
>>> [80915.766137] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640) , BIOS V1.8B1 09/13/2010
>>> [80915.773737] RIP: e030:bfq_exit_icq_bfqq+0x147/0x1c0
>>> [80915.781294] Code: 00 00 00 00 00 00 48 0f ba b0 20 01 00 00 0c 48 8b 88 f0 01 00 00 48 85 c9 74 29 48 8b b0 e8 01 00 00 48 89 31 48 85 f6 74 04 <48> 89 4e 08 48 c7 80 e8 01 00 00 00 00 00 00 48 c7 80 f0 01 00 00
>>> [80915.796792] RSP: e02b:ffffc9000473be28 EFLAGS: 00010006
>>> [80915.804419] RAX: ffff888070393200 RBX: ffff888076c4a800 RCX: ffff888076c4a9f8
>>> [80915.810254] device vif17.0 left promiscuous mode
>>> [80915.811906] RDX: 0000100000000000 RSI: 0000100000000000 RDI: 0000000000000000
>>> [80915.811908] RBP: ffff888077efc398 R08: 0000000000000004 R09: ffffffff81106800
>>> [80915.811909] R10: ffff88807804ca40 R11: ffffc9000473be31 R12: ffff888005256bf0
>>> [80915.811909] R13: 0000000000000000 R14: ffff888005256800 R15: ffffffff82a6a3c0
>>> [80915.811919] FS: 00007f1c30a8dbc0(0000) GS:ffff88807d500000(0000) knlGS:0000000000000000
>>> [80915.819456] xen_bridge: port 18(vif17.0) entered disabled state
>>> [80915.826569] CS: 10000e030 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [80915.826571] CR2: 0000100000000008 CR3: 000000005d9d0000 CR4: 0000000000000660
>>> [80915.826575] Call Trace:
>>> [80915.826592] bfq_exit_icq+0xe/0x20
>>> [80915.826595] put_io_context_active+0x52/0x80
>>> [80915.826599] do_exit+0x774/0xac0
>>> [80915.906037] ? xen_blkif_be_int+0x30/0x30
>>> [80915.913311] kthread+0xda/0x130
>>> [80915.920398] ? kthread_park+0x80/0x80
>>> [80915.927524] ret_from_fork+0x22/0x40
>>> [80915.934512] Modules linked in:
>>> [80915.941412] CR2: 0000100000000008
>>> [80915.948221] ---[ end trace 61315493e0f8ef40 ]---
>>> [80915.954984] RIP: e030:bfq_exit_icq_bfqq+0x147/0x1c0
>>> [80915.961850] Code: 00 00 00 00 00 00 48 0f ba b0 20 01 00 00 0c 48 8b 88 f0 01 00 00 48 85 c9 74 29 48 8b b0 e8 01 00 00 48 89 31 48 85 f6 74 04 <48> 89 4e 08 48 c7 80 e8 01 00 00 00 00 00 00 48 c7 80 f0 01 00 00
>>> [80915.976124] RSP: e02b:ffffc9000473be28 EFLAGS: 00010006
>>> [80915.983205] RAX: ffff888070393200 RBX: ffff888076c4a800 RCX: ffff888076c4a9f8
>>> [80915.990321] RDX: 0000100000000000 RSI: 0000100000000000 RDI: 0000000000000000
>>> [80915.997319] RBP: ffff888077efc398 R08: 0000000000000004 R09: ffffffff81106800
>>> [80916.004427] R10: ffff88807804ca40 R11: ffffc9000473be31 R12: ffff888005256bf0
>>> [80916.011525] R13: 0000000000000000 R14: ffff888005256800 R15: ffffffff82a6a3c0
>>> [80916.018679] FS: 00007f1c30a8dbc0(0000) GS:ffff88807d500000(0000) knlGS:0000000000000000
>>> [80916.025897] CS: 10000e030 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [80916.033116] CR2: 0000100000000008 CR3: 000000005d9d0000 CR4: 0000000000000660
>>> [80916.040348] Fixing recursive fault but reboot is needed!