Re: [PATCH v3 14/51] x86/asm/head: put real return address on idle task stack
From: Josh Poimboeuf
Date: Wed Aug 17 2016 - 17:10:33 EST
On Wed, Aug 17, 2016 at 03:30:55PM -0500, Nilay Vaish wrote:
> On 12 August 2016 at 09:28, Josh Poimboeuf <jpoimboe@xxxxxxxxxx> wrote:
> > The frame at the end of each idle task stack has a zeroed return
> > address. This is inconsistent with real task stacks, which have a real
> > return address at that spot. This inconsistency can be confusing for
> > stack unwinders.
> >
> > Make it a real address by using the side effect of a call instruction to
> > push the instruction pointer on the stack.
> >
> > Signed-off-by: Josh Poimboeuf <jpoimboe@xxxxxxxxxx>
> > ---
> > arch/x86/kernel/head_64.S | 5 +++--
> > 1 file changed, 3 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
> > index 3621ad2..c90f481 100644
> > --- a/arch/x86/kernel/head_64.S
> > +++ b/arch/x86/kernel/head_64.S
> > @@ -298,8 +298,9 @@ ENTRY(start_cpu)
> > * REX.W + FF /5 JMP m16:64 Jump far, absolute indirect,
> > * address given in m16:64.
> > */
> > - movq initial_code(%rip),%rax
> > - pushq $0 # fake return address to stop unwinder
> > + call 1f # put return address on stack for unwinder
> > +1: xorq %rbp, %rbp # clear frame pointer
> > + movq initial_code(%rip), %rax
> > pushq $__KERNEL_CS # set correct cs
> > pushq %rax # target address in negative space
> > lretq
>
>
> Josh, I have a couple of questions.
>
> It seems to me that this patch and the patch 16/51 are both aiming at
> the same thing, but are for two different architectures: 32-bit and
> 64-bit versions of x86. But you have taken slightly different
> approaches in the two patches (for 64-bit, we first jump and then make
> a function call, for 32-bit we directly call the function). Is there
> any particular reason for this? May be I missed out on something.
Yes, the 64-bit code is different: it has to use a far return (lretq) in
order to jump to the 64-bit address stored in 'initial_code'.
So instead of calling the initial code directly, I had to use a more
obtuse approach there ("call 1f") which just places a start_cpu()
address at the right place on the stack before the lretq (which
"returns" to the intial code).
> Second, this is for the whole patch series. If I wanted to test this
> series, how should I go about doing so?
That's a loaded question. :-) I'd recommend doing lots of stack dumps
and traces in various situations and making sure everything looks sane,
on both 32-bit and 64-bit, and with both CONFIG_FRAME_POINTER=n and
CONFIG_FRAME_POINTER=y. Some easy ones are:
cat /proc/*/stack
echo 1 > /proc/sys/kernel/sysrq; echo l > /proc/sysrq-trigger
echo 1 > /proc/sys/kernel/sysrq; echo t > /proc/sysrq-trigger
perf record -g <cmd>; perf report
Also, if you have CONFIG_LOCKDEP enabled, it will silently save a bunch
of stacks in the background, though you won't ever see any evidence of
that unless something goes wrong (you can check if the unwinder spit out
any warnings with 'dmesg |grep WARNING').
For my own testing, I also hacked up my kernel to dump the stack in
various scenarios: interrupts, NMIs, page faults, preemption, perf
callchain record, etc.
BTW, I'll be posting v4 soon, probably Thursday morning US Central time.
--
Josh