Re: [BUG REPORT] Soft Lockup in smp_call_function_single+0xD8

From: Jeff Merkey
Date: Sat Jan 30 2016 - 13:05:50 EST


On 1/30/16, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
> On Sat, Jan 30, 2016 at 9:53 AM, Jeff Merkey <linux.mdb@xxxxxxxxx> wrote:
>> On 1/30/16, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
>>> On Sat, Jan 30, 2016 at 12:41 AM, Jeff Merkey <linux.mdb@xxxxxxxxx>
>>> wrote:
>>>> Here is an MDB debugger trace of the code in question. Please note
>>>> that the flags being compared don't match what's in r11 and the
>>>> comparison bits are wrong.
>>>>
>>>> (3)>
>>>>
>>>> Break at 0xFFFFFFFF81680022 due to - Proceed (single step)
>>>> RAX: 0000000000000080 RBX: 0000000000000002 RCX: 00007FC9877F2A30
>>>> RDX: 0000000000000000 RSI: FFFF8800BFD9BC00 RDI: FFFF88011FCD6C80
>>>> RSP: FFFF8800CD6C7F58 RBP: 00007FC988119000 R8: FFFF8800CD6C4000
>>>> R9: 0000017C85499D0E R10: FFFF8800C17BB8F0 R11: 0000000000000246 << WRONG!!!
>>>> R12: 00007FC987AC6400 R13: 0000000000000002 R14: 0000000000000001
>>>> R15: 0000000000000000 CS: 0010 DS: 0000 ES: 0000 FS: 0000 GS: 0000 SS: 0018
>>>> IP: FFFFFFFF81680022 FLAGS: 0000000000000146 (PF ZF TF) << real flags
>>>> 0xffffffff81680022 49F7C300010100 test r11,0x10100 << comparison bits correct, r11 is WRONG!!!
>>>> (3)>
>>>
>>> I have no idea what bug you're talking about, and I have no idea how
>>> this code could cause a soft lockup in smp_call_function_single (at
>>> worst it could potentially enter userspace with invalid state, thus
>>> alternating between user and kernel without making progress in user
>>> mode).
>>>
>>> And the HW flags register has no particular reason to match r11 or, in
>>> fact, anything saved in pt_regs at all.
>>>
>>> --Andy
>>>
>>
>> Hi Andy,
>>
>> There are two cases to handle here with the trap flag and sysret;
>> you are handling just one of them in your fix. There is the case
>> where you are going to use sysret to load the flags after the
>> instruction executes, and that's the case you coded for. The other
>> case, which is not being handled, is the one where someone is single
>> stepping through this code, the trap flag gets set, and then sysret
>> gets called.
>>
>> From what I can tell, sysret is a broken instruction which will just
>> hang if someone calls it with the trap flag set. It does not act
>> like this on ia32, just x86_64. The answer is to not use sysret and
>> use your iret return for all syscalls.
>>
>
> Just so you know, I have no intention of supporting this use case. In
> fact, I'm planning to eventually stop using IST for #DB entirely, at
> which point the kernel will crash terribly if this code is
> single-stepped (except when using a hypervisor to do this single
> stepping, which is a much more sensible way to handle it).
>
> So MDB may just need to force the slow syscall exit path
> unconditionally, and it'll have to do something else clever to handle
> SYSCALL, because that's going to crash, too.
>
> I will *not* insert a pushfq into the syscall return path. That would
> slow everything down for the sole benefit of an in-kernel debugger.
>
> --Andy
>

Yep, now you see it. I'll carry this fix locally in my patch series.

Jeff
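
P.S. For reference, the flag values in the trace above can be decoded by
hand. What follows is a minimal Python sketch, not kernel code. The bit
positions are the architectural RFLAGS bits; the names FLAG_BITS,
decode_flags and masked_bits are made up for illustration. It shows that
the mask 0x10100 in the test instruction covers TF and RF, that the r11
value 0x246 has neither bit set, and that the live flags value 0x146 has
TF set.

```python
# Architectural RFLAGS bit positions (subset relevant to the trace).
FLAG_BITS = {
    "CF": 1 << 0,
    "PF": 1 << 2,
    "AF": 1 << 4,
    "ZF": 1 << 6,
    "SF": 1 << 7,
    "TF": 1 << 8,   # trap flag, set while single stepping
    "IF": 1 << 9,
    "DF": 1 << 10,
    "OF": 1 << 11,
    "RF": 1 << 16,  # resume flag
}

# Immediate operand of the instruction in the trace: test r11,0x10100
TRACE_TEST_MASK = 0x10100


def decode_flags(value):
    """Return the names of the known flag bits set in value, lowest bit first."""
    return [name for name, bit in FLAG_BITS.items() if value & bit]


def masked_bits(value, mask=TRACE_TEST_MASK):
    """Return value & mask, the quantity the test instruction evaluates.

    The test instruction sets ZF when this result is zero.
    """
    return value & mask


if __name__ == "__main__":
    r11_from_trace = 0x246    # R11 as shown in the register dump
    live_flags = 0x146        # FLAGS as shown in the register dump

    print("mask     :", decode_flags(TRACE_TEST_MASK))
    print("r11      :", decode_flags(r11_from_trace))
    print("flags    :", decode_flags(live_flags))
    print("r11 & m  :", hex(masked_bits(r11_from_trace)))
    print("flags & m:", hex(masked_bits(live_flags)))
```

With the values from the dump, r11 & 0x10100 is zero while the live flags
masked the same way leave TF set, which is the mismatch being pointed at in
the trace.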