Re: BUG: unable to handle kernel paging request from pty_write [was: Linux 4.4.2]

From: Linus Torvalds
Date: Thu Feb 25 2016 - 19:38:18 EST


On Thu, Feb 25, 2016 at 1:32 PM, Jiri Slaby <jslaby@xxxxxxx> wrote:
>
> Interestingly, RBP contains address inside try_to_wake_up --
> ffffffff810a535a (dunno why) which is:
> ffffffff810a5355: e8 66 a0 ff ff callq ffffffff8109f3c0
> <ttwu_stat>
> ffffffff810a535a: e9 9d fe ff ff jmpq ffffffff810a51fc
> <try_to_wake_up+0x3c>
>
> ttwu_stat does at the beginning:
> mov $0x16e80,%r14
>
> which is what we actually still have in r14 when it crashes. The first
> ttwu_stat's "if" has to go through the true branch (otherwise r14 would
> be overwritten).
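
As a sanity check on the quoted disassembly, the two branch targets can be recomputed from the raw bytes. This is only a sketch: the helper name rel32_target is made up for illustration, and it handles just the 5-byte rel32 forms of call (e8) and jmp (e9), whose target is the address of the next instruction plus a signed 32-bit little-endian displacement.

```python
# Hypothetical helper: decode the target of a 5-byte x86-64 "call rel32" (e8)
# or "jmp rel32" (e9). The addresses and bytes below are the ones quoted above.
def rel32_target(insn_addr, insn_bytes):
    if len(insn_bytes) != 5 or insn_bytes[0] not in (0xE8, 0xE9):
        raise ValueError("only 5-byte call/jmp rel32 is handled")
    # The displacement is relative to the address of the *next* instruction.
    disp = int.from_bytes(insn_bytes[1:], "little", signed=True)
    return (insn_addr + 5 + disp) & 0xFFFFFFFFFFFFFFFF

call_target = rel32_target(0xFFFFFFFF810A5355, bytes.fromhex("e866a0ffff"))
jmp_target = rel32_target(0xFFFFFFFF810A535A, bytes.fromhex("e99dfeffff"))
print(hex(call_target))  # the ttwu_stat address from the listing
print(hex(jmp_target))   # the try_to_wake_up+0x3c address from the listing
```

Both results agree with the addresses in the listing, so ffffffff810a535a really is the return address that the call to ttwu_stat pushes.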

Hmm. That does sound very much like it might be ttwu_stat() that has
gotten the stack frame wrong, and when it exits, it does

popq %rbp
ret

but in fact the popq picked up the return address (which is why %rbp
points into try_to_wake_up), and the ret then returned to a crazy address.

Which sounds like a corrupted stack pointer (not a corrupted stack).
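
To make that concrete, here is a toy model of the call / pushq / popq / ret sequence, with a dict standing in for stack memory. It is only a sketch under stated assumptions: the 8-byte skew, the saved %rbp value and the junk word above the return address are invented for illustration, and only the return address comes from the report.

```python
# Toy model of an x86-64 call frame; a dict stands in for stack memory.
RET_ADDR = 0xFFFFFFFF810A535A    # from the report: insn after "callq ttwu_stat"
CALLER_RBP = 0xFFFF880000001F00  # hypothetical saved frame pointer
JUNK = 0xDEADBEEFDEADBEEF        # hypothetical word above the return address

def run_callee(rsp_skew):
    """Return (rbp, rip) after the callee's epilogue.

    rsp_skew is an assumed number of bytes by which %rsp is off when the
    epilogue starts; 0 models a healthy frame.
    """
    mem = {}
    rsp = 0x1000
    mem[rsp] = JUNK            # whatever the caller had on top of its stack
    rsp -= 8
    mem[rsp] = RET_ADDR        # call: push return address
    rsp -= 8
    mem[rsp] = CALLER_RBP      # pushq %rbp
    rsp += rsp_skew            # assumed corruption of the stack pointer
    rbp = mem[rsp]             # popq %rbp
    rsp += 8
    rip = mem[rsp]             # ret
    return rbp, rip

print([hex(v) for v in run_callee(0)])  # healthy: caller's rbp, sane return
print([hex(v) for v in run_callee(8)])  # skewed: rbp holds the return address
```

With a skew of 8, %rbp ends up holding the return address inside try_to_wake_up and the ret jumps to whatever word sat above it, even though the stack contents themselves were never modified.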

Can you make just the "vmlinux" file available somewhere?

In my own private configuration, ttwu_stat() doesn't actually touch
the stack at all - no stack pointer action anywhere except for the
frame pointer save and restore:

ttwu_stat:
1: call __fentry__
pushq %rbp
..
movq %rsp, %rbp #,

.....

popq %rbp
ret

but yeah, as Peter says, maybe an exception screwed up %rsp somehow..

I really don't see how it would happen here - that code doesn't look
particularly odd.

And the fentry code used by the function tracer can certainly screw
things up, but even that would be hard-pressed to screw up %rbp, since
the saving of rbp comes *after* fentry. Old pre-__fentry__ gcc
versions had a much higher likelihood of doing that (the whole mcount
thing is a disaster, but I'm assuming you have a compiler that does
__fentry__ and have CC_USING_FENTRY set?)

Linus