Re: [REGRESSION] x86/cpu fsgsbase breaks TLS in 32 bit rr tracees on a 64 bit system
From: Andy Lutomirski
Date: Tue Aug 25 2020 - 12:12:58 EST

> On Aug 24, 2020, at 5:46 PM, Kyle Huey <me@xxxxxxxxxxxx> wrote:
>
> On Mon, Aug 24, 2020 at 5:31 PM Andy Lutomirski <luto@xxxxxxxxxx> wrote:
>>
>>> On Mon, Aug 24, 2020 at 4:52 PM H. Peter Anvin <hpa@xxxxxxxxx> wrote:
>>>
>>> On 2020-08-24 14:10, Andy Lutomirski wrote:
>>>>
>>>> PTRACE_READ_SEGMENT_DESCRIPTOR to read a segment descriptor.
>>>>
>>>> PTRACE_SET_FS / PTRACE_SET_GS: Sets FS or GS and updates the base accordingly.
>>>>
>>>> PTRACE_READ_SEGMENT_BASE: pass in a segment selector, get a base out.
>>>> You would use this to populate the base fields.
>>>>
>>>> or perhaps a ptrace SETREGS variant that tries to preserve the old
>>>> base semantics and magically sets the bases to match the selectors if
>>>> the selectors are nonzero.
>>>>
>>>> Do any of these choices sound preferable to any of you?
>>>>
>>>
>>> My suggestion would be to export the GDT and LDT as a (readonly or mostly
>>> readonly) regset(s) rather than adding entirely new operations. We could allow
>>> the LDT and the per-thread GDT entries to be written, subject to the same
>>> limitations as the corresponding system calls.
>>>
>>
>> That seems useful, although we'd want to do some extensive
>> sanitization of the GDT. But maybe it's obnoxious to ask Kyle and
>> Robert to parse the GDT, LDT, and selector just to emulate the
>> demented pre-5.9 ptrace() behavior.
>>
>> --Andy
>
> We've already addressed the main issue on rr's side[0]. The only
> outstanding issue is that if you record a trace with 32 bit programs
> on a pre-5.9 64 bit kernel and then try to replay it on 5.9 it won't
> work. If you hit this case rr will print an error message telling you
> to boot your 5.9 kernel with nofsgsbase if you want to replay the
> trace. I think that's probably sufficient. 32 bit is legacy stuff we
> don't care that much about anyways, replaying traces on a different
> kernel/machine has always been a bit dicey, and if you absolutely must
> do it there is a workaround. I'm not inclined to do much work to
> support the narrow remaining case.
>
> - Kyle
>
> [0] Namely, we're tracking fs/gsbase for 32 bit tracees on 64 bit
> kernels where the fs/gsbase instructions work in new recordings now:
> https://github.com/mozilla/rr/commit/c3292c75dbd8c9ce5256496108965c0442424eef

I don’t like this at all. Your behavior really shouldn’t depend on
whether the new instructions are available. Also, some day I would
like to change Linux to have the new behavior even if FSGSBASE
instructions are not available, and this will break rr again. (The
current !FSGSBASE behavior is an ugly optimization of dubious value.
I would not go so far as to describe it as correct.)

I would suggest you do one of the following things:

1. Use int $0x80 directly to load 32-bit regs into a child. This
might dramatically simplify your code and should just do the right
thing.
2. Something like your patch but make it unconditional.
3. Ask for, and receive, real kernel support for setting FS and GS in
the way that 32-bit code expects.
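
For context on option 3: 32-bit code sets FS or GS by loading a selector, and the CPU takes the base from the GDT or LDT descriptor that the selector names. A minimal sketch of that lookup arithmetic follows, assuming the descriptor is available as a raw 64-bit value. The names decode_selector() and descriptor_base() are hypothetical helpers for illustration, not kernel or rr APIs.

```c
#include <stdint.h>

/* Fields packed into a 16-bit x86 segment selector. */
struct selector_fields {
    unsigned index; /* bits 15:3, entry number in the table */
    unsigned ti;    /* bit 2, 0 = GDT, 1 = LDT */
    unsigned rpl;   /* bits 1:0, requested privilege level */
};

/* Hypothetical helper: split a selector into its three fields. */
static struct selector_fields decode_selector(uint16_t sel)
{
    struct selector_fields f;

    f.index = (unsigned)sel >> 3;
    f.ti = ((unsigned)sel >> 2) & 1u;
    f.rpl = (unsigned)sel & 3u;
    return f;
}

/*
 * Hypothetical helper: extract the 32-bit base from a legacy 8-byte
 * code/data segment descriptor, passed as a little-endian 64-bit value.
 * Base bits 23:0 live in descriptor bits 39:16, and base bits 31:24
 * live in descriptor bits 63:56.
 */
static uint32_t descriptor_base(uint64_t desc)
{
    uint32_t base = (uint32_t)((desc >> 16) & 0xffffffu);

    base |= (uint32_t)((desc >> 56) & 0xffu) << 24;
    return base;
}
```

With these, a tracer that can read the relevant descriptor could compute the base that a given FS or GS selector implies, instead of trusting a separately saved base value.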

Also, for other x86 kernel folks playing along, WTF is
task_user_regset_view() about? It has eight callers, in two groups,
and as far as I can tell every single call returns &user_x86_64_view
because it's only called in the 64-bit and x32 syscall paths, and
those are only reachable using the 64-bit SYSCALL instruction. I
suppose the exception is if someone ptraces the ptracer and changes CS
at syscall entry. In any case, if task_user_regset_view() ever
returns anything else, the code will malfunction. I'll send a patch
to get rid of it.

--Andy