Re: BUG: unable to handle kernel paging request from pty_write [was: Linux 4.4.2]

From: Jiri Slaby
Date: Fri Feb 26 2016 - 03:56:25 EST

Next message: Philipp Zabel: "Re: [PATCH v10 0/5] MT8173 IOMMU SUPPORT"
Previous message: Luca Abeni: "sched/core: fix __sched_setscheduler() to properly invoke prio_changed_dl()"
In reply to: Linus Torvalds: "Re: BUG: unable to handle kernel paging request from pty_write [was: Linux 4.4.2]"
Next in thread: Jiri Slaby: "Re: BUG: unable to handle kernel paging request from pty_write [was: Linux 4.4.2]"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On 02/26/2016, 01:38 AM, Linus Torvalds wrote:
> On Thu, Feb 25, 2016 at 1:32 PM, Jiri Slaby <jslaby@xxxxxxx> wrote:
>>
>> Interestingly, RBP contains address inside try_to_wake_up --
>> ffffffff810a535a (dunno why) which is:
>> ffffffff810a5355: e8 66 a0 ff ff callq ffffffff8109f3c0 <ttwu_stat>
>> ffffffff810a535a: e9 9d fe ff ff jmpq ffffffff810a51fc <try_to_wake_up+0x3c>
>>
>> ttwu_stat does in the beginning:
>> mov $0x16e80,%r14
>>
>> which is what we actually still have in r14 when it crashes. The first
>> ttwu_stat's "if" has to go through the true branch (otherwise r14 would
>> be overwritten).
>
> Hmm. That does sound very much like it might be ttwu_stat() that has
> gotten the stack frame wrong, and when it exits, it does
>
> popq %rbp
> ret
>
> but in fact it popped the return address, and then returned to a crazy address.
>
> Which sounds like a corrupted stack pointer (not a corrupted stack).
>
> Can you make just the "vmlinux" file available somewhere?

Sure, both vmlinux and vmlinux.debug (its separated debuginfo sections)
are at:
http://labs.suse.cz/jslaby/bug-968218/

There is also core.s, which is the result of:
objdump -d vmlinux-4.4.2-3-default | grep -A 10000 '<update_rq_clock>:' > core.s

> In my own private configuration, ttwu_stat() doesn't actually touch
> the stack at all - no stack pointer action anywhere except for the
>
> ttwu_stat:
> 1: call __fentry__
> pushq %rbp
> ..
> movq %rsp, %rbp #,
>
> .....
>
> popq %rbp
> ret
>
> but yeah, as Peter says, maybe an exception screwed up %rsp somehow..

Lucky you. My ttwu_stat does a bit more stack save-restoring. But all
the pushes and pops seem to be paired:

ffffffff8109f3c0 <ttwu_stat>:
ffffffff8109f3c0: e8 fb ca 60 00 callq ffffffff816abec0 <__fentry__>
ffffffff8109f3c5: 55 push %rbp
ffffffff8109f3c6: 48 89 e5 mov %rsp,%rbp
ffffffff8109f3c9: 41 57 push %r15
ffffffff8109f3cb: 41 56 push %r14
ffffffff8109f3cd: 41 55 push %r13
ffffffff8109f3cf: 41 54 push %r12
ffffffff8109f3d1: 49 c7 c6 80 6e 01 00 mov $0x16e80,%r14
ffffffff8109f3d8: 53 push %rbx
...
ffffffff8109f48c: 5b pop %rbx
ffffffff8109f48d: 41 5c pop %r12
ffffffff8109f48f: 41 5d pop %r13
ffffffff8109f491: 41 5e pop %r14
ffffffff8109f493: 41 5f pop %r15
ffffffff8109f495: 5d pop %rbp
ffffffff8109f496: c3 retq
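
The pairing can also be checked mechanically rather than by eye. Below is
a minimal Python sketch, not part of the original analysis; the helper
names (stack_ops, is_balanced) are made up for illustration. It only
looks at the prologue and epilogue quoted above, so it says nothing
about the elided body of the function.

```python
import re

# Prologue and epilogue of ttwu_stat, copied from the dump above.
DUMP = """\
ffffffff8109f3c5: 55 push %rbp
ffffffff8109f3c6: 48 89 e5 mov %rsp,%rbp
ffffffff8109f3c9: 41 57 push %r15
ffffffff8109f3cb: 41 56 push %r14
ffffffff8109f3cd: 41 55 push %r13
ffffffff8109f3cf: 41 54 push %r12
ffffffff8109f3d1: 49 c7 c6 80 6e 01 00 mov $0x16e80,%r14
ffffffff8109f3d8: 53 push %rbx
ffffffff8109f48c: 5b pop %rbx
ffffffff8109f48d: 41 5c pop %r12
ffffffff8109f48f: 41 5d pop %r13
ffffffff8109f491: 41 5e pop %r14
ffffffff8109f493: 41 5f pop %r15
ffffffff8109f495: 5d pop %rbp
"""

# Matches "push %reg" / "pop %reg", with or without the AT&T "q" suffix.
INSN = re.compile(r"\b(push|pop)q?\s+(%\w+)")


def stack_ops(disasm):
    """Return (mnemonic, register) pairs found in objdump-style text."""
    ops = []
    for line in disasm.splitlines():
        match = INSN.search(line)
        if match:
            ops.append(match.groups())
    return ops


def is_balanced(ops):
    """True if every pop restores the most recently pushed register."""
    stack = []
    for mnemonic, reg in ops:
        if mnemonic == "push":
            stack.append(reg)
        elif not stack or stack.pop() != reg:
            return False
    return not stack


if __name__ == "__main__":
    print(is_balanced(stack_ops(DUMP)))
```

For the listing above the pops mirror the pushes in reverse order, which
is consistent with the stack pointer (and not the frame layout) being
the suspect.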

> I really don't see how it would happen here - that code doesn't look
> particularly odd.
>
> And the fentry code used by the function tracer can certainly screw
> things up, but even that would be hard-pressed to screw up %rbp, since
> the saving of rbp comes *after* fentry. Old pre-__fentry__ gcc
> versions had a much higher likelihood (the whole mcount thing is a
> disaster, but I'm assuming you have a compiler that does __fentry__
> and have CC_USING_FENTRY set?)

Yep, -mfentry is in use, as is obvious from the dump above. It is
compiled by gcc 5.3.1 rev231346.
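
As a cross-check on the addresses quoted in this thread, the rel32
operand of a 5-byte callq/jmpq can be decoded by hand: target = address
of the next instruction + sign-extended 32-bit displacement. A small
Python sketch follows; the function name rel32_target is hypothetical
and it handles only the e8/e9 encodings seen in the dumps here.

```python
def rel32_target(insn_addr, insn_bytes):
    """Decode the target of a 5-byte x86-64 'call rel32' (e8) or
    'jmp rel32' (e9) given its address and its bytes as a hex string."""
    raw = bytes.fromhex(insn_bytes)
    if len(raw) != 5 or raw[0] not in (0xE8, 0xE9):
        raise ValueError("expected a 5-byte e8/e9 instruction")
    # The displacement is a little-endian signed 32-bit value, relative
    # to the address of the instruction that follows.
    disp = int.from_bytes(raw[1:], "little", signed=True)
    return (insn_addr + len(raw) + disp) & 0xFFFFFFFFFFFFFFFF


if __name__ == "__main__":
    # First instruction of ttwu_stat: should land on __fentry__.
    print(hex(rel32_target(0xFFFFFFFF8109F3C0, "e8 fb ca 60 00")))
    # The call in try_to_wake_up: should land on ttwu_stat.
    print(hex(rel32_target(0xFFFFFFFF810A5355, "e8 66 a0 ff ff")))
```

The first call decodes to ffffffff816abec0 (__fentry__) and the second
to ffffffff8109f3c0 (ttwu_stat), matching what objdump printed. The
instruction after the second call sits at ffffffff810a535a, i.e. the
value found in RBP is the return address of that call.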

thanks,
--
js
suse labs
