Re: WARNING: kernel stack frame pointer at ffffffff82e03f40 in swapper:0 has bad value (null)

From: Josh Poimboeuf
Date: Mon Dec 12 2016 - 17:11:52 EST


On Mon, Dec 12, 2016 at 10:34:46PM +0100, Borislav Petkov wrote:
> On Mon, Dec 12, 2016 at 03:16:27PM -0600, Josh Poimboeuf wrote:
> > I still can't figure out what could cause this, nor can I recreate it.
>
> Want my .config?

Yes, please.

> > Andy, any idea? I'm trying to figure out why a stack trace of the
> > initial task, early in start_kernel(), would show start_cpu() on the
> > stack *twice*. The start_cpu() entry on the stack at ffffffffbce03f50
> > is right where it's supposed to be. But then there's another
> > start_cpu() entry at 0xffffffffbce03f48 which is pointed to by the frame
> > pointer chain. I can't figure out where that one came from and why the
> > stack is offset by a word, compared to all the other idle task stacks
> > I've seen.
>
> Btw, why do you have:
>
> call 1f # put return address on stack for unwinder
>
> there in start_cpu() instead of
>
> push $start_cpu
>
> or so? That CALL looks strange there. If you want to put the return
> address, just push start_cpu's address and that's it.
>
> Or am I missing something?

Yeah, it's kind of obscure.

The problem with "push $start_cpu" is that it will show up on the stack
trace as:

secondary_startup_64+0x90/0x90

instead of what you would expect:

start_cpu+0x0/0x14

That's because the printk '%pB' modifier is smart enough to know that
the beginning of a function isn't a valid function call return address.
The only way such an address could end up on the stack would be if the
previous function made a tail call. So it shows the end of the previous
function instead.
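
To make that concrete, here is a rough user-space model of the lookup.
It is only a sketch, not the kernel code: the symbol table, the
addresses, and the find_sym()/format_addr() helpers are all made up for
illustration. The one thing it tries to mirror is that, as far as I
recall, the '%pB' path resolves (address - 1) to pick the symbol but
still reports the offset relative to the original address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical symbol table entry; the addresses below are invented. */
struct sym {
	const char *name;
	unsigned long start;
	unsigned long size;
};

static const struct sym symtab[] = {
	{ "secondary_startup_64", 0x1000, 0x90 },
	{ "start_cpu",            0x1090, 0x14 },
};

/* Return the symbol whose [start, start + size) range contains addr. */
static const struct sym *find_sym(unsigned long addr)
{
	size_t i;

	for (i = 0; i < sizeof(symtab) / sizeof(symtab[0]); i++) {
		if (addr >= symtab[i].start &&
		    addr < symtab[i].start + symtab[i].size)
			return &symtab[i];
	}
	return NULL;
}

/*
 * Format addr as "name+0xoff/0xsize".
 * backtrace != 0 models '%pB': look up addr - 1, so an address at the
 * very start of a function resolves to the end of the previous one.
 * backtrace == 0 models a plain '%pS'-style lookup.
 */
static int format_addr(char *buf, size_t len, unsigned long addr,
		       int backtrace)
{
	const struct sym *s;

	assert(buf != NULL && len > 0);

	s = find_sym(backtrace ? addr - 1 : addr);
	if (!s)
		return snprintf(buf, len, "0x%lx", addr);

	/* The offset is still measured from the original address. */
	return snprintf(buf, len, "%s+0x%lx/0x%lx",
			s->name, addr - s->start, s->size);
}
```

With this toy table, format_addr(buf, sizeof(buf), 0x1090, 1) produces
"secondary_startup_64+0x90/0x90", while the same address with
backtrace == 0 produces "start_cpu+0x0/0x14".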

That said, the code could probably be made a little clearer by changing
"call 1f" to "pushq $1f" and moving the '1' label to after the lretq
instruction, like:

	pushq	$1f			# put return address on stack for unwinder
	xorq	%rbp, %rbp		# clear frame pointer
	movq	initial_code(%rip), %rax
	pushq	$__KERNEL_CS		# set correct cs
	pushq	%rax			# target address in negative space
	lretq
1:
ENDPROC(start_cpu)

With that change, the stack trace shows:

start_cpu+0x14/0x14

Which is more accurate anyway. I'll make a patch.

--
Josh