Re: [BUG REPORT] Soft Lockup in smp_call_function_single+0xD8

From: Jeff Merkey
Date: Sat Jan 30 2016 - 12:53:29 EST

On 1/30/16, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
> On Sat, Jan 30, 2016 at 12:41 AM, Jeff Merkey <linux.mdb@xxxxxxxxx> wrote:
>> Here is an MDB debugger trace of the code in question. Please note
>> that the flags being compared don't match what's in r11 and the
>> comparison bits are wrong.
>> (3)>
>> Break at 0xFFFFFFFF81680022 due to - Proceed (single step)
>> RAX: 0000000000000080 RBX: 0000000000000002 RCX: 00007FC9877F2A30
>> RDX: 0000000000000000 RSI: FFFF8800BFD9BC00 RDI: FFFF88011FCD6C80
>> RSP: FFFF8800CD6C7F58 RBP: 00007FC988119000 R8: FFFF8800CD6C4000
>> R9: 0000017C85499D0E R10: FFFF8800C17BB8F0 R11: 0000000000000246 << WRONG!!!
>> R12: 00007FC987AC6400 R13: 0000000000000002 R14: 0000000000000001
>> R15: 0000000000000000 CS: 0010 DS: 0000 ES: 0000 FS: 0000 GS: 0000 SS: 0018
>> IP: FFFFFFFF81680022 FLAGS: 0000000000000146 (PF ZF TF) << real flags
>> 0xffffffff81680022 49F7C300010100 test r11,0x10100 << comparison bits correct, r11 is WRONG!!!
>> (3)>
> I have no idea what bug you're talking about, and I have no idea how
> this code could cause a soft lockup in smp_call_function_single (at
> worst it could potentially enter userspace with invalid state, thus
> alternating between user and kernel without making progress in user
> mode).
> And the HW flags register has no particular reason to match r11 or, in
> fact, anything saved in pt_regs at all.
> --Andy

Hi Andy,

There are two cases to handle here with the trap flag and sysret, and
your fix handles only one of them. The first is the case where sysret
loads the flags from r11 after the instruction executes; that's the
case you coded for. The second case, which is not being handled, is
when someone is single-stepping through this code, so the trap flag
is set when sysret gets called.

From what I can tell, sysret is a broken instruction that will simply
hang if it is called with the trap flag set. It does not behave this
way on ia32, only on x86_64. The answer is to not use sysret and to
use your iret return path for all syscalls.


TF set -> call sysret = Hang
Load previous flags -> call sysret (pops TF from the saved flags) = Hang

Two cases to handle.

The smp_call_function_single soft lockup is just a symptom that shows
up when this other hang condition occurs.