Re: [REGRESSION] x86/cpu fsgsbase breaks TLS in 32 bit rr tracees on a 64 bit system
From: Andy Lutomirski
Date: Tue Aug 25 2020 - 15:32:21 EST
On Tue, Aug 25, 2020 at 11:50 AM Kyle Huey <me@xxxxxxxxxxxx> wrote:
>
> On Tue, Aug 25, 2020 at 10:31 AM Kyle Huey <me@xxxxxxxxxxxx> wrote:
> >
> > On Tue, Aug 25, 2020 at 9:46 AM Andy Lutomirski <luto@xxxxxxxxxx> wrote:
> > >
> > > On Tue, Aug 25, 2020 at 9:32 AM Kyle Huey <me@xxxxxxxxxxxx> wrote:
> > > >
> > > > On Tue, Aug 25, 2020 at 9:12 AM Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
> > > > > I don’t like this at all. Your behavior really shouldn’t depend on
> > > > > whether the new instructions are available. Also, some day I would
> > > > > like to change Linux to have the new behavior even if FSGSBASE
> > > > > instructions are not available, and this will break rr again. (The
> > > > > current !FSGSBASE behavior is an ugly optimization of dubious value.
> > > > > I would not go so far as to describe it as correct.)
> > > >
> > > > Ok.
> > > >
> > > > > I would suggest you do one of the following things:
> > > > >
> > > > > 1. Use int $0x80 directly to load 32-bit regs into a child. This
> > > > > might dramatically simplify your code and should just do the right
> > > > > thing.
> > > >
> > > > I don't know what that means.
> > >
> > > This is untested, but what I mean is:
> > >
> > > static int ptrace32(int req, pid_t pid, int addr, int data) {
> > > int ret;
> > > /* new enough kernels won't clobber r8, etc. */
> > > asm volatile ("int $0x80" : "=a" (ret) : "a" (26 /* ptrace */), "b"
> > > (req), "c" (pid), "d" (addr), "S" (data) : "flags", "r8", "r9", "r10",
> > > "r11");
> > > return ret;
> > > }
> > >
> > > with a handful of caveats:
> > >
> > > - This won't compile with -fPIC, I think. Instead you'll need to
> > > write a little bit of asm to set up and restore ebx yourself. gcc is
> > > silly like this.
> > >
> > > - Note that addr is an int. You'll need to mmap(..., MAP_32BIT, ...)
> > > to get a buffer that can be pointed to with an int.
> > >
> > > The advantage is that this should work on all kernels that support
> > > 32-bit mode at all.
> > >
> > > >
> > > > > 2. Something like your patch but make it unconditional.
> > > > >
> > > > > 3. Ask for, and receive, real kernel support for setting FS and GS in
> > > > > the way that 32-bit code expects.
> > > >
> > > > I think the easiest way forward for us would be a PTRACE_GET/SETREGSET
> > > > like operation that operates on the regsets according to the
> > > > *tracee*'s bitness (rather than the tracer, as it works currently).
> > > > Does that sound workable?
> > > >
> > >
> > > Strictly speaking, on Linux, there is no unified concept of a task's
> > > bitness, so "set all these registers according to the target's
> > > bitness" is not well defined. We could easily give you a
> > > PTRACE_SETREGS_X86_32, etc, though.
> >
> > In the process of responding to this I spent some time doing code
> > inspection and discovered a subtlety in the ptrace API that I was
> > previously unaware of. PTRACE_GET/SETREGS use the regset views
> > corresponding to the tracer but PTRACE_GET/SETREGSET use the regset
> > views corresponding to the tracee. This means it is possible for us
> > today to set FS/GS "the old way" with a 64 bit tracer/32 bit tracee
> > combo, as long as we use PTRACE_SETREGSET with NT_PRSTATUS instead of
> > PTRACE_SETREGS.
>
> Alright I reverted the previous changes and switched us to use
> PTRACE_SETREGSET with NT_PRSTATUS[0] and our 32 bit tests pass again.
> I assume this behavior will remain unchanged indefinitely even when
> the fs/gsbase manipulation instructions are not available since the 32
> bit user_regs_struct can't grow?
Correct.
I think it would be reasonable to add new, less erratic PTRACE_SETxyz
modes, but the old ones will stay as is.
Strictly speaking, the issue you had wasn't a ptrace change. It was a
context switching change. Before the change, you were programming
garbage into the tracee state, but the context switch code wasn't
capable of restoring the garbage correctly and instead happened to
restore the state you actually wanted. With the new code, the context
switch actually worked correctly (for some value of correctly), and
the tracee crashed. Not that this distinction really matters.