Re: ppc64le reliable stack unwinder and scheduled tasks
From: Balbir Singh
Date: Fri Jan 11 2019 - 20:09:33 EST
On Thu, Jan 10, 2019 at 04:14:00PM -0500, Joe Lawrence wrote:
> Hi all,
>
> tl;dr: On ppc64le, what is top-most stack frame for scheduled tasks
> about?
>
> I am looking at a bug in which ~10% of livepatch tests on RHEL-7 and
> RHEL-8 distro kernels, the ppc64le reliable stack unwinder consistently
> reports an unreliable stack for a given task. This condition can
> eventually resolve itself, but sometimes this state remains wedged for
> hours and I can live-debug the system with crash-utility.
>
In summary, you think the reliable stack tracer is being conservative?
xmon (built in debugger) also allows for live-debugging and might
be more easier to get some of this done.
> I have tried reproducing with upstream 4.20 kernel, but haven't seen
> this exact condition yet. More on upstream in a bit.
>
> That said, I have a question about the ppc64 stack frame layout with
> regard to scheduled tasks. In each sticky "unreliable" stack trace
> instance that I've looked at, the task's stack has an interesting
> frame at the top:
>
Could you please define interesting frame on top a bit more? Usually
the topmost return address is in LR
>
> Example 1 (RHEL-7)
> ==================
>
> crash> struct task_struct.thread c00000022fd015c0 | grep ksp
> ksp = 0xc0000000288af9c0
>
> crash> rd 0xc0000000288af9c0 -e 0xc0000000288b0000
>
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>
> sp[0]:
>
> c0000000288af9c0: c0000000288afb90 0000000000dd0000 ...(............
> c0000000288af9d0: c000000000002a94 c000000001c60a00 .*..............
>
> crash> sym c000000000002a94
> c000000000002a94 (T) hardware_interrupt_common+0x114
What does bt look like in this case? what does r1 point to? I would
look at xmon and see what the "t" (stack trace) looks like, I think
it's a more familiar tool.
>
> c0000000288af9e0: c000000001c60a80 0000000000000000 ................
> c0000000288af9f0: c0000000288afbc0 0000000000dd0000 ...(............
> c0000000288afa00: c0000000014322e0 c000000001c60a00 ."C.............
> c0000000288afa10: c0000002303ae380 c0000002303ae380 ..:0......:0....
> c0000000288afa20: 7265677368657265 0000000000002200 erehsger."......
>
> Uh-oh...
>
> /* Mark stacktraces with exception frames as unreliable. */
> stack[STACK_FRAME_MARKER] == STACK_FRAME_REGS_MARKER
>
> c0000000288afa30: c0000000001240ac c0000000288afcb0 .@.........(....
> c0000000288afa40: c0000000013e4d00 0000000000000000 .M>.............
> c0000000288afa50: 0000000000000001 0000000000000001 ................
> c0000000288afa60: c0000000014322e0 0000000000000804 ."C.............
> c0000000288afa70: c0000000288ac080 c00000022fd015c0 ...(......./....
> c0000000288afa80: c0000000288afae0 00000000ffffffff ...(............
> c0000000288afa90: c0000000288afae0 c000000007b80900 ...(............
> c0000000288afaa0: c000000000e90a00 c000000000e90a00 ................
> c0000000288afab0: c0000000001240ac c000000000e90a00 .@..............
> c0000000288afac0: c000000000e90a00 c000000004790580 ..........y.....
> c0000000288afad0: 0000000000000001 c000000000e90a00 ................
> c0000000288afae0: 0000000000000000 0000000000000004 ................
> c0000000288afaf0: c00000022fd01ad0 c000000000e73be0 .../.....;......
> c0000000288afb00: c00000023900f450 c000000001488190 P..9......H.....
> c0000000288afb10: 0000000000ad0000 c00000023900ef40 ........@..9....
> c0000000288afb20: c0000000288ac000 c000000000e73be0 ...(.....;......
> c0000000288afb30: c00000000001b514 c00000022fd01628 ........(../....
> c0000000288afb40: c0000000288afbb0 c000000000008800 ...(............
> c0000000288afb50: c000000000162880 0000000000000000 .(..............
> c0000000288afb60: 00000000240f3022 0000000000000004 "0.$............
> c0000000288afb70: c000000000e90a00 c00000022fd01a20 ........ ../....
> c0000000288afb80: c0000000288afbf0 c0000000014322e0 ...(....."C.....
>
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>
> sp[1]:
>
> c0000000288afb90: c0000000288afbf0 c000000001960a00 ...(............
> c0000000288afba0: c00000000001b514 0000000000000004 ................
>
> crash> sym c00000000001b514
> c00000000001b514 (T) __switch_to+0x264
>
> c0000000288afbb0: c000000000e90a00 0000000000000000 ................
> c0000000288afbc0: c0000000288ac000 c0000000014322e0 ...(....."C.....
> c0000000288afbd0: c000000000e90a00 c000000001960a00 ................
> c0000000288afbe0: c00000022fd015c0 c00000023900ef40 .../....@..9....
>
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>
> sp[2]:
>
> c0000000288afbf0: c0000000288afcd0 c000000000003300 ...(.....3......
> c0000000288afc00: c000000000a83918 c0000000013e4d00 .9.......M>.....
>
> crash> sym c000000000a83918
> c000000000a83918 (t) __schedule+0x428
>
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>
> sp[3]:
>
> c0000000288afcd0: c0000000288afd80 0000000000003300 ...(.....3......
> c0000000288afce0: c0000000001240ac 0000000000000000 .@..............
>
> crash> sym c0000000001240ac
> c0000000001240ac (t) rescuer_thread+0x3ac
>
> [ ... and so on as we would normally expect ... ]
>
>
> Example 2 - (RHEL-7)
> ====================
>
> crash> struct task_struct.thread c00000004a660a00 | grep ksp
> ksp = 0xc0000005e85439c0,
>
> crash> rd 0xc0000005e8543b90 -e 0xc0000005e8544000
>
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>
> sp[0]:
> c0000005e85439c0: c0000005e8543b90 0000000000000000 .;T.............
> c0000005e85439d0: c000000000002a94 0000000000000000 .*..............
>
> crash> sym c000000000002a94
> c000000000002a94 (T) hardware_interrupt_common+0x114
>
> c0000005e8543a20: 7265677368657265 c0000000013e4d00 erehsger.M>.....
>
> /* Mark stacktraces with exception frames as unreliable. */
> stack[STACK_FRAME_MARKER] == STACK_FRAME_REGS_MARKER
>
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>
> sp[1]:
> c0000005e8543b90: c0000005e8543bf0 c000000001960a00 .;T.............
> c0000005e8543ba0: c00000000001b514 0000000000000004 ................
>
> crash> sym c00000000001b514
> c00000000001b514 (T) __switch_to+0x264
>
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>
> sp[2]:
> c0000005e8543bf0: c0000005e8543cd0 c000000000003300 .<T......3......
> c0000005e8543c00: c000000000a83918 c0000000013e4d00 .9.......M>.....
>
> crash> sym c000000000a83918
> c000000000a83918 (t) __schedule+0x428
>
> [ ... and so on as we would normally expect ... ]
>
>
> Example 3 (upstream 4.20)
> =========================
>
> This is not a repro of the sticky "unreliable" stack trace, but just the
> stack of a workqueue that fired and called msleep.
>
>
> crash> struct task_struct.thread c0000002257f0e00 | grep ksp
> ksp = 0xc00000022715f9b0,
>
> crash> rd 0xc00000022715f9b0 100
>
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>
> sp[0]:
>
> c00000022715f9b0: c00000022715fb90 c00000023f4e8280 ...'......N?....
> c00000022715f9c0: c00000022715f9e0 0000000000000003 ...'............
>
> crash> sym c00000022715f9e0
> c00000022715f9e0 (B) _end+0x22594f9e0
That is weird, can you compare this to the output of "bt" or "t"
from xmon?
>
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>
> sp[1]:
>
> c00000022715fb90: c00000022715fbf0 c00000000158634c ...'....LcX.....
> c00000022715fba0: c00000000001ebfc c000000001318a60 ........`.1.....
>
> crash> sym c00000000001ebfc
> c00000000001ebfc (T) __switch_to+0x2dc
>
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>
> sp[2]:
>
> c00000022715fbf0: c00000022715fcd0 c000000000008800 ...'............
> c00000022715fc00: c000000000abcf40 c0000000013abf00 @.........:.....
>
> crash> sym c000000000abcf40
> c000000000abcf40 (t) __schedule+0x3b0
>
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>
> sp[3]:
>
> c00000022715fcd0: c00000022715fd00 2c50fff000000000 ...'..........P,
> c00000022715fce0: c000000000abd628 c00000023bff57e8 (........W.;....
>
> crash> sym c000000000abd628
> c000000000abd628 (T) schedule+0x48
>
> [ ... etc ... ]
>
>
> save_stack_trace_tsk_reliable
> =============================
>
> arch/powerpc/kernel/stacktrace.c :: save_stack_trace_tsk_reliable() does
> take into account the first stackframe, but only to verify that the link
> register is indeed pointing at kernel code address.
>
> Can someone explain what __switch_to is doing with the stack and whether
> in such circumstances, the reliable stack unwinder should be skipping
> the first frame when verifying the frame contents like STACK_FRAME_MARKER,
> etc.
>
> I may be on the wrong path in debugging this, but figuring out this
> sp[0] frame state would be helpful.
>
I would compare the output of xmon across the unreliable stack frames with
the contents of what the stack unwinder has.
I suspect the patch is stuck trying to transition to enabled state, it'll
be interesting to see if we are really stuck
Balbir Singh.