Re: [PATCH v4 1/2] x86/unwind: add ORC unwinder

From: Juergen Gross
Date: Thu Sep 28 2017 - 12:26:34 EST


On 28/09/17 15:55, Josh Poimboeuf wrote:
> On Thu, Sep 28, 2017 at 08:03:26AM +0200, Juergen Gross wrote:
>> On 27/09/17 23:08, Josh Poimboeuf wrote:
>>> On Tue, Aug 08, 2017 at 01:09:08PM -0700, Andy Lutomirski wrote:
>>>> On Tue, Aug 8, 2017 at 12:13 PM, Josh Poimboeuf <jpoimboe@xxxxxxxxxx> wrote:
>>>>> On Tue, Aug 08, 2017 at 12:03:51PM -0700, Linus Torvalds wrote:
>>>>>> On Tue, Aug 8, 2017 at 11:58 AM, Josh Poimboeuf <jpoimboe@xxxxxxxxxx> wrote:
>>>>>>>
>>>>>>> Take for example the lock_is_held_type() function. In vmlinux, it has
>>>>>>> the following instruction:
>>>>>>>
>>>>>>> callq *0xffffffff85a94880 (pv_irq_ops.save_fl)
>>>>>>>
>>>>>>> At runtime, that instruction is patched and replaced with a fast inline
>>>>>>> version of arch_local_save_flags() which eliminates the call:
>>>>>>>
>>>>>>> pushfq
>>>>>>> pop %rax
>>>>>>>
>>>>>>> The problem is when an interrupt hits after the push:
>>>>>>>
>>>>>>> pushfq
>>>>>>> --- irq ---
>>>>>>> pop %rax
>>>>>>
>>>>>> That should actually be something easily fixable, for an odd reason:
>>>>>> the instruction boundaries are different.
>>>>>>
>>>>>>> I'm not sure what the solution should be. It will probably need to be
>>>>>>> one of the following:
>>>>>>>
>>>>>>> a) either don't allow runtime "alternative" patches to mess with the
>>>>>>> stack pointer (objtool could enforce this); or
>>>>>>>
>>>>>>> b) come up with some way to register such patches with the ORC
>>>>>>> unwinder at runtime.
>>>>>>
>>>>>> c) just add ORC data for the alternative statically and _unconditionally_.
>>>>>>
>>>>>> No runtime registration. Just an unconditional entry for the
>>>>>> particular IP that comes after the "pushfq". It cannot match the
>>>>>> "callq" instruction, since it would be in the middle of that
>>>>>> instruction.
>>>>>>
>>>>>> Basically, just do a "union" of the ORC data for all the alternatives.
>>>>>>
>>>>>> Now, objtool should still verify that the instruction pointers for
>>>>>> alternatives are unique - or that they share the same ORC unwinder
>>>>>> information if they are not.
>>>>>>
>>>>>> But in cases like this, when the instruction boundaires are different,
>>>>>> things should "just work", with no need for any special cases.
>>>>>>
>>>>>> Hmm?
>>>>>
>>>>> Yeah, that might work. Objtool already knows about alternatives, so it
>>>>> might not be too hard. I'll try it.
>>>>
>>>> But this one's not an actual alternative, right? It's a pv op.
>>>>
>>>> I would advocate that we make it an alternative after all. I frickin'
>>>> hate the PV irq ops. It would like roughly like this:
>>>>
>>>> ALTERNATIVE "pushfq; popq %rax", "callq *pv_irq_ops.save_fl",
>>>> X86_FEATURE_GODDAMN_PV_IRQ_OPS
>>>>
>>>> (The obvious syntax error and the naming should probably be fixed.
>>>> Also, this needs to live in an #ifdef because it needs to build on
>>>> kernels with pv support. It should also properly register itself as a
>>>> pv patch site.)
>>>
>>> I've got a prototype of the above working, where vmlinux shows:
>>>
>>> pushfq
>>> pop %rax
>>> nop
>>> nop
>>> nop
>>> nop
>>> nop
>>>
>>> instead of:
>>>
>>> callq *0xffffffff81e3a400 (pv_irq_ops.save_fl)
>>>
>>> Which is nice because the vmlinux disassembly now matches the most
>>> common runtime cases (everything except Xen and vsmp). And it also
>>> fixes the upthread objtool issue.
>>>
>>> The only slight issue with the patches is that hypervisors need access
>>> to the pv ops much earlier than when alternatives are applied. So I had
>>> to add a new .pv_alternatives section for these pv ops alternatives, so
>>> they can be patched very early, if running in a hypervisor.
>>>
>>> Will clean up the code and post something relatively soon.
>>>
>>
>> Are you combining alternatives and pvops then? I'm asking because in an
>> up and running system under Xen the callq *... will be replaced with a
>> much faster "call xen_save_fl". This should still be the case after
>> your patch.
>
> Right, it's not combining alternatives and pv ops, it's just adding
> another step. So first the
>
> pushfq; pop %rax
>
> is replaced with
>
> callq *pv_irq_ops.save_fl
>
> and then later, after the xen ops structs are finalized, it's replaced
> with
>
> callq xen_save_fl
>

Okay, thanks for confirming.


Juergen