Re: [4.1.x -- 4.6.x and probably HEAD] Reproducible unprivileged panic/TLB BUG on sparc via a stack-protected rt_sigaction() ka_restorer, courtesy of the glibc testsuite

From: David Miller
Date: Sun May 29 2016 - 00:24:36 EST

Next message: Gwendal Grignou: "Re: [PATCH 3/4] doc: dt: pwm: add binding for ChromeOS EC PWM"
Previous message: David S. Miller: "[PATCH 2/2] sparc64: Fix return from trap window fill crashes."
In reply to: David Miller: "Re: [4.1.x -- 4.6.x and probably HEAD] Reproducible unprivileged panic/TLB BUG on sparc via a stack-protected rt_sigaction() ka_restorer, courtesy of the glibc testsuite"
Next in thread: Sam Ravnborg: "Re: [4.1.x -- 4.6.x and probably HEAD] Reproducible unprivileged panic/TLB BUG on sparc via a stack-protected rt_sigaction() ka_restorer, courtesy of the glibc testsuite"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

From: David Miller <davem@xxxxxxxxxxxxx>
Date: Fri, 27 May 2016 15:51:37 -0700 (PDT)

> I'm trying to figure out the same thing myself.

Ok, mystery solved.

The basic problem is that we don't handle unaligned stacks on return to userspace
%100 properly. We would also no handle a stack whose access would trigger a bus
error or similar.

A little background:

The window trap handlers are slightly clever, the trap table entries for them are
composed of two pieces of code. First comes the code that actually performs
the window fill or spill trap handling, and then there are three instructions at
the end which are for exception processing.

The userland register window fill handler is:

#define FILL_1_GENERIC(ASI) \
add %sp, STACK_BIAS + 0x00, %g1; \
ldxa [%g1 + %g0] ASI, %l0; \
mov 0x08, %g2; \
mov 0x10, %g3; \
ldxa [%g1 + %g2] ASI, %l1; \
mov 0x18, %g5; \
ldxa [%g1 + %g3] ASI, %l2; \
ldxa [%g1 + %g5] ASI, %l3; \
add %g1, 0x20, %g1; \
ldxa [%g1 + %g0] ASI, %l4; \
ldxa [%g1 + %g2] ASI, %l5; \
ldxa [%g1 + %g3] ASI, %l6; \
ldxa [%g1 + %g5] ASI, %l7; \
add %g1, 0x20, %g1; \
ldxa [%g1 + %g0] ASI, %i0; \
ldxa [%g1 + %g2] ASI, %i1; \
ldxa [%g1 + %g3] ASI, %i2; \
ldxa [%g1 + %g5] ASI, %i3; \
add %g1, 0x20, %g1; \
ldxa [%g1 + %g0] ASI, %i4; \
ldxa [%g1 + %g2] ASI, %i5; \
ldxa [%g1 + %g3] ASI, %i6; \
ldxa [%g1 + %g5] ASI, %i7; \
restored; \
retry; nop; nop; nop; nop; \
b,a,pt %xcc, fill_fixup_dax; \
b,a,pt %xcc, fill_fixup_mna; \
b,a,pt %xcc, fill_fixup;

And the way this works is that if any of those loads or stores
generate an exception, the exception handler can revector to one of
those final three branch instructions depending upon which kind of
exception the memory access took.

For example, for a regular fault, the code goes:

winfix_trampoline:
rdpr %tpc, %g3 ! Prepare winfixup TNPC
or %g3, 0x7c, %g3 ! Compute branch offset
wrpr %g3, %tnpc ! Write it into TNPC
done ! Trap return

All window trap handlers are 0x80 aligned, so if we "or" 0x7c into the
trap time program counter, we'll get that final instruction in the
trap handler.

On return from trap, we have to pull the register window in but we do
this by hand instead of just executing a "restore" instruction for
several reasons. The largest being that from Niagara and onward we
simply don't have enough levels in the trap stack to fully resolve all
possible exception cases of a window fault when we are already at
trap level 1 (which we enter to get ready to return from the original
trap).

This is executed inline via the FILL_*_RTRAP handlers. rtrap_64.S's
code branches directly to these to do the window fill by hand if
necessary. Now if you look at them, we'll see at the end:

ba,a,pt %xcc, user_rtt_fill_fixup; \
ba,a,pt %xcc, user_rtt_fill_fixup; \
ba,a,pt %xcc, user_rtt_fill_fixup;

oops all three cases are handled like a fault.

This doesn't work because each of these trap types (data access
exception, memory address unaligned, and faults) store their auxiliary
info in different registers to pass on to the C handler which does the
real work.

So in the case where the stack was unaligned, the unaligned trap
handler setup the arg registers one way, and then we branched to
the fault handler.

So the FAULT_TYPE_* value was basically garbage, and randomly would
generate the backtrace that you saw.

This is the fix I am testing right now:

====================

Next message: Gwendal Grignou: "Re: [PATCH 3/4] doc: dt: pwm: add binding for ChromeOS EC PWM"
Previous message: David S. Miller: "[PATCH 2/2] sparc64: Fix return from trap window fill crashes."
In reply to: David Miller: "Re: [4.1.x -- 4.6.x and probably HEAD] Reproducible unprivileged panic/TLB BUG on sparc via a stack-protected rt_sigaction() ka_restorer, courtesy of the glibc testsuite"
Next in thread: Sam Ravnborg: "Re: [4.1.x -- 4.6.x and probably HEAD] Reproducible unprivileged panic/TLB BUG on sparc via a stack-protected rt_sigaction() ka_restorer, courtesy of the glibc testsuite"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]