Re: [PATCH v3 1/7] x86/ldt: refresh %fs and %gs in refresh_ldt_segments()

From: Andy Lutomirski
Date: Thu Jun 28 2018 - 16:33:54 EST


On Wed, Jun 27, 2018 at 11:22 AM, <hpa@xxxxxxxxx> wrote:
> On June 27, 2018 11:19:12 AM PDT, Andy Lutomirski <luto@xxxxxxxxxx> wrote:
>>On Fri, Jun 22, 2018 at 11:47 AM, Andy Lutomirski <luto@xxxxxxxxxxxxxx>
>>wrote:
>>>
>>>
>>>
>>>> On Jun 22, 2018, at 11:29 AM, H. Peter Anvin
>><h.peter.anvin@xxxxxxxxx> wrote:
>>>>
>>>>> On 06/22/18 07:24, Andy Lutomirski wrote:
>>>>>
>>>>> That RPL3 part is false. The following program does:
>>>>>
>>>>> #include <stdio.h>
>>>>>
>>>>> int main()
>>>>> {
>>>>> unsigned short sel;
>>>>> asm volatile ("mov %%ss, %0" : "=rm" (sel));
>>>>> sel &= ~3;
>>>>> printf("Will write 0x%hx to GS\n", sel);
>>>>> asm volatile ("mov %0, %%gs" :: "rm" (sel & ~3));
>>>>> asm volatile ("mov %%gs, %0" : "=rm" (sel));
>>>>> printf("GS = 0x%hx\n", sel);
>>>>> return 0;
>>>>> }
>>>>>
>>>>> prints:
>>>>>
>>>>> Will write 0x28 to GS
>>>>> GS = 0x28
>>>>>
>>>>> The x86 architecture is *insane*.
>>>>>
>>>>> Other than that, this patch seems generally sensible. But my
>>>>> objection that it's incorrect with FSGSBASE enabled for %fs and %gs
>>>>> still applies.
>>>>>
>>>>
>>>> Ugh, you're right... I misremembered. The CPL simply overrides the
>>RPL
>>>> rather than trapping.
>>>>
>>>> We still need to give legacy applications which have zero idea about
>>the
>>>> separate bases that apply only to 64-bit mode a way to DTRT.
>>Requiring
>>>> these old crufty applications to do something new is not an option.
>>>
>>>>
>>>> As ugly as it is, I'm thinking the Right Thing is to simply make it
>>a
>>>> part of the Linux ABI that if the FS or GS selector registers point
>>into
>>>> the LDT then we will requalify them; if a 64-bit app does that then
>>they
>>>> get that behavior. This isn't something that will happen
>>>> asynchronously, and if a 64-bit process loads an LDT value into FS
>>or
>>>> GS, they are considered to have opted in to that behavior.
>>>
>>> But the old and crusty apps donât depend on requalification because
>>we never used to do it.
>>>
>>> Iâm not convinced we ever need to refresh the base. In fact, we could
>>start preserving the base of LDT-referencing FS/GS across context
>>switches even without FSGSBASE at some minor performance cost, but I
>>donât really see the point. I still think my proposed semantics are
>>easy to implement and preserve the ABI even if they have the sad
>>property that the FSGSBASE behavior and the non-FSGSBASE behavior end
>>up different.
>>>
>>
>>There's another reasonable solution: do exactly what your patch does,
>>minus the bugs. We would need to get the RPL != 3 case right (easy)
>>and the case where there's a non-running thread using the selector in
>>question. The latter is probably best handled by adding a flag to
>>thread_struct that says "fsbase needs reloading from the descriptor
>>table" and only applies if the selector is in the LDT or TLS area. Or
>>we could hijack a high bit in the selector. Then we'd need to update
>>everything that uses the fields.
>
> Obviously fix the bugs.
>
> How would you control this bit?

Sorry, I was wrong in my previous email. Let me try this again:

Notwithstanding the RPL thing, the reason I don't like your patch as
is, and the reason I didn't write a similar patch myself, is that it
will behave nondeterministically on an FSGSBASE kernel. Suppose there
are two threads, A and B, that share an mm. A has %fs == 0x7 and
FSBASE = 0. The LDT has the base for entry 0 set to 0.

Now thread B calls modify_ldt to change the base for entry 0 to 1.

The Obviously Sane (tm) behavior is for task A's FSBASE to
asynchronously change to 1. This is the only deterministic behavior
that is even possible on a 32-bit kernel, and it's the only
not-totally-nutty behavior that is possible on a 64-bit non-FSGSBASE
kernel, and it's still perfectly reasonable for FSGSBASE. The problem
is that it's not so easly to implement.

With your patch, on an FSGSBASE kernel, we get the desired behavior if
thread A is running while thread B calls modify_ldt(). But we get
different behavior if thread A is stopped -- thread A's FSBASE will
remain set to 0.

With that in mind, my email was otherwise garbage, and the magic "bit"
idea was total crap.

I can see three vaguely credible ways to implement this.

1. thread B walks all threads on the system, notices that thread A has
the same mm, and asynchronously fixes it up. The locking is a bit
tricky, and the performance isn't exactly great. Maybe that's okay.

2. We finally add an efficient way to find all threads that share an
mm and do (1) but faster.

3. We add enough bookkeeping to thread_struct so that, the next time
thread A runs or has ptrace try to read its FSBASE, we notice that
FSBASE is stale and fix it up.

(3) will perform the best, but the implementation is probably nasty.
If we want modify_ldt() to only reset the base for the modified
records, we probably need a version number for each of the 8192
possible LDT entries stored in ldt_struct (which will double its size,
but so what?). Then we need thread_struct to store the version number
of the LDT entries that fsindex and gsindex refer to. Now we make
sure that every code path that reads fsbase or gsbase first calls some
revalidate_fs_and_gs() function that will reset the bases and maybe
even the selectors if needed. Getting the locking right on that last
bit is possibly a bit tricky, since we may need the LDT lock to be
held across the revalidation *and* whatever subsequent code actually
reads the values.

I think that (3) is the nicest solution, but it would need to be implemeted.

What do you think?