Re: [BUG REPORT] Soft Lockup in smp_call_function_single+0xD8

From: Andy Lutomirski
Date: Sat Jan 30 2016 - 12:58:26 EST


On Sat, Jan 30, 2016 at 9:53 AM, Jeff Merkey <linux.mdb@xxxxxxxxx> wrote:
> On 1/30/16, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
>> On Sat, Jan 30, 2016 at 12:41 AM, Jeff Merkey <linux.mdb@xxxxxxxxx> wrote:
>>> Here is an MDB debugger trace of the code in question. Please note
>>> that the flags being compared don't match what's in r11 and the
>>> comparison bits are wrong.
>>>
>>> (3)>
>>>
>>> Break at 0xFFFFFFFF81680022 due to - Proceed (single step)
>>> RAX: 0000000000000080 RBX: 0000000000000002 RCX: 00007FC9877F2A30
>>> RDX: 0000000000000000 RSI: FFFF8800BFD9BC00 RDI: FFFF88011FCD6C80
>>> RSP: FFFF8800CD6C7F58 RBP: 00007FC988119000 R8: FFFF8800CD6C4000
>>> R9: 0000017C85499D0E R10: FFFF8800C17BB8F0 R11: 0000000000000246 << WRONG!!!
>>> R12: 00007FC987AC6400 R13: 0000000000000002 R14: 0000000000000001
>>> R15: 0000000000000000 CS: 0010 DS: 0000 ES: 0000 FS: 0000 GS: 0000 SS: 0018
>>> IP: FFFFFFFF81680022 FLAGS: 0000000000000146 (PF ZF TF) << real flags
>>> 0xffffffff81680022 49F7C300010100 test r11,0x10100 < comparison bits correct r11 is WRONG!!!
>>> (3)>
>>
>> I have no idea what bug you're talking about, and I have no idea how
>> this code could cause a soft lockup in smp_call_function_single (at
>> worst it could potentially enter userspace with invalid state, thus
>> alternating between user and kernel without making progress in user
>> mode).
>>
>> And the HW flags register has no particular reason to match r11 or, in
>> fact, anything saved in pt_regs at all.
>>
>> --Andy
>>
>
> Hi Andy,
>
> There are two cases to handle here with the trap flags with sysret,
> you are handling just one of them in your fix. There is the case
> where you are going to use sysret to load the flags after the
> instruction executes and that's the case you coded for. The other
> case which is not being handled is the one where someone is single
> stepping through this code and the trap flag gets set and then sysret
> gets called.
>
> From what I can tell, sysret is a broken instruction which will just
> hang if someone calls it with the trap flag set. It does not act
> like this on ia32, just x86_64. The answer is to not use sysret and
> use your iret return for all syscalls.
>

Just so you know, I have no intention of supporting this use case. In
fact, I'm planning to eventually stop using IST for #DB entirely, at
which point the kernel will crash terribly if this code is
single-stepped (except when using a hypervisor to do this single
stepping, which is a much more sensible way to handle it).

So MDB may just need to force the slow syscall exit path
unconditionally, and it'll have to do something else clever to handle
SYSCALL, because that's going to crash, too.

I will *not* insert a pushfq into the syscall return path. That would
slow everything down for the sole benefit of an in-kernel debugger.

--Andy