Re: FSGSBASE ABI considerations
From: Andy Lutomirski
Date: Mon Aug 07 2017 - 12:20:38 EST
On Mon, Aug 7, 2017 at 1:06 AM, Stas Sergeev <stsp@xxxxxxx> wrote:
> Hello.
>
> 31.07.2017 06:05, Andy Lutomirski wrote:
>>
>> - User code can use the new RD/WR FS/GS BASE instructions.
>> Apparently some users really want this for, umm, userspace threading.
>> Think Java.
>
> I wonder how Java gets userspace threading despite the lack
> of user-space continuation support (swapcontext() calls
> into the kernel for sigprocmask()).
>
>> The major disadvantage is that user code can use the new instructions.
>> Now userspace is going to do totally stupid shite like writing some
>> nonzero value to GS and then doing WRGSBASE or like linking some
>> idiotic library that uses WRGSBASE into a perfectly innocent program
>> like dosemu2 and resulting in utterly nonsensical descriptor state.
>
> I don't think this is a problem, at least not
> for dosemu1/2. dosemu2 does the full context switch via
> a sighandler; dosemu1 uses iret after manually changing
> all registers before jumping to compatibility mode. I don't
> think any state changes done in long mode can affect the
> state after the jump to compatibility mode.
Hmm, right. DOSEMU could get tripped up on the way back to long mode,
and we've discussed this a little bit before, but this is certainly
manageable.
>
>> ----- interaction with modify_ldt() -----
>>
>> The first sticking point we'll hit is modify_ldt() and, in particular,
>> what happens if you call modify_ldt() to change the base of a segment
>> that is loaded into gs by another thread in the same mm.
>>
>> Our current behavior here is nonsensical: on 32-bit kernels, FS would
>> be fully refreshed on other threads and GS might be, depending on
>> compiler options. On 64-bit kernels, neither FS nor GS is immediately
>> refreshed. Historically, we didn't refresh anything reliably. On the
>> bright side, this means that existing modify_ldt() users are (AFAIK)
>> tolerant of somewhat crazy behavior.
>>
>> On an FSGSBASE-enabled system, I think we need to provide
>> deterministic, documented, tested behavior. I can think of three
>> plausible choices:
>>
>> 1a. modify_ldt() immediately updates FSBASE and GSBASE on all threads
>> that reference the modified selector.
>>
>> 1b. modify_ldt() immediately updates FSBASE and GSBASE on all threads
>> that reference the LDT.
>
> Does 1b mean that any call to modify_ldt(), even the
> read call, will reset all bases to the ones of LDT?
Nah, just writes. Doing it this way makes the tracking easier, since
we don't need to keep track of which selectors have been changed.
Note that 1a and 1b are indistinguishable to any user program that
doesn't use WRFSBASE or WRGSBASE, though.
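To make the difference between 1a and 1b concrete, here is a toy model in C. It is a sketch under stated assumptions, not kernel code: the LDT is reduced to an array of base addresses, each thread tracks only GS, and every name (toy_ldt_thread, toy_write_ldt_1a, toy_write_ldt_1b) is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_LDT_ENTRIES 4

/* Toy model only; none of these names exist in the kernel. */
struct toy_ldt_thread {
    int      gs_entry;  /* LDT entry that GS selects, or -1 if none */
    uint64_t gs_base;   /* live GSBASE, possibly overridden by WRGSBASE */
};

/* Policy 1a: refresh only threads whose GS selects the written entry. */
static void toy_write_ldt_1a(uint64_t ldt[TOY_LDT_ENTRIES],
                             struct toy_ldt_thread *threads, size_t n,
                             int entry, uint64_t new_base)
{
    assert(entry >= 0 && entry < TOY_LDT_ENTRIES);
    ldt[entry] = new_base;
    for (size_t i = 0; i < n; i++) {
        if (threads[i].gs_entry == entry)
            threads[i].gs_base = ldt[entry];
    }
}

/* Policy 1b: refresh every thread whose GS selects any LDT entry. */
static void toy_write_ldt_1b(uint64_t ldt[TOY_LDT_ENTRIES],
                             struct toy_ldt_thread *threads, size_t n,
                             int entry, uint64_t new_base)
{
    assert(entry >= 0 && entry < TOY_LDT_ENTRIES);
    ldt[entry] = new_base;
    for (size_t i = 0; i < n; i++) {
        int e = threads[i].gs_entry;
        if (e >= 0 && e < TOY_LDT_ENTRIES)
            threads[i].gs_base = ldt[e];
    }
}
```

In this model, as long as every thread's gs_base already matches its LDT entry, both functions leave identical state. They diverge only when a thread has overridden gs_base (the WRGSBASE case) on an entry other than the one being written: 1a keeps the override, 1b discards it.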
> I think
> this is a half-step. It clearly shows that you don't want
> such state to ever exist, but why not go a step further
> and have the bases reset not only by any
> unrelated modify_ldt() call, but always on schedule?
> You can state that using wrgsbase on non-zero selector
> is invalid, reset it to LDT state and maybe send a signal
> to the program so that it knows it did something wrong.
> This may sound too rough, but I really don't see how it
> differs from resetting all LDT bases on some unrelated
> modify_ldt() that was done for read, not write.
> Or you may want to reset selector to 0 rather than
> base to LDT.
Windows does something sort of like this (I think), but I don't like
this solution. I fully expect that someone will write a program that
does:
    old = rdgsbase();
    wrgsbase(new);
    call_very_fast_function();
    wrgsbase(old);
This will work if GS == 0, which is fine. The problem is that it will
*also* work if GS != 0 with very high probability, especially if this
code sequence is right after some operation that sleeps. And then
we'll get random crashes with very low probability, depending on where
the scheduler hits.
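The failure mode can be shown with a toy model in C. It is only a sketch and it assumes the "reset on schedule" policy proposed above: a context switch reloads GSBASE from the descriptor whenever GS is nonzero. All names here (toy_thread, toy_context_switch, toy_fast_path) are hypothetical, and a boolean stands in for where the scheduler happens to hit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model only; none of these names exist in the kernel. */
struct toy_thread {
    uint16_t gs_sel;     /* GS selector; 0 means no descriptor */
    uint64_t gs_base;    /* live GSBASE */
    uint64_t desc_base;  /* base stored in the descriptor GS names */
};

/*
 * Assumed policy: on a context switch, a nonzero selector gets its
 * base reloaded from the descriptor, discarding any WRGSBASE value.
 */
static void toy_context_switch(struct toy_thread *t)
{
    if (t->gs_sel != 0)
        t->gs_base = t->desc_base;
}

/*
 * The save/modify/restore sequence. preempt_inside says whether the
 * scheduler runs between wrgsbase(new) and the fast function.
 * Returns true if the fast function saw new_base in GSBASE.
 */
static bool toy_fast_path(struct toy_thread *t, uint64_t new_base,
                          bool preempt_inside)
{
    uint64_t old = t->gs_base;                /* old = rdgsbase(); */
    t->gs_base = new_base;                    /* wrgsbase(new); */
    if (preempt_inside)
        toy_context_switch(t);
    bool saw_new = (t->gs_base == new_base);  /* call_very_fast_function(); */
    t->gs_base = old;                         /* wrgsbase(old); */
    return saw_new;
}
```

With gs_sel == 0 the function returns true whether or not the switch happens. With gs_sel != 0 it returns true when there is no preemption and false when there is one, which is the rare, timing-dependent breakage described above.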
>
>> 2. modify_ldt() leaves FSBASE and GSBASE alone on all threads.
>>
>> (2) is trivial to implement, whereas (1a) and (1b) are a bit nasty to
>> implement when FSGSBASE is on.
>>
>> The tricky bit is that 32-bit kernels can't do (2), so, if we want
>
> But do we have fsgsbase on 32bit kernels at all?
No, and we don't have MSR_FS_BASE, etc., either, so the scheduler
basically can't preserve the base across a context switch.
> I think it works only in long mode, no?
> I really tried to google some extensive description
> on this feature, but failed.
>
>> modify_ldt() to behave the same on 32-bit and 64-bit kernels, we're
>> stuck with (1).
>
> If you mean 1a, then to me it looks like a lot of effort
> for something no one ever needs.
>
>> Thoughts?
>
> I am far from kernel development, so my thoughts
> may be naive, but IMHO you should just disallow this
> by some means (like by doing a fixup on schedule() and
> sending a signal). No one will suffer, people will just
> write 0 to segreg first. Note that such a problem can
> be provoked by the fact that the sighandler does not
> reset the segregs to their default values, and someone
> may simply forget to reset it to 0. You need to remind
> him to do so rather than invent tricky code to
> do something theoretically correct.
I would *love* to disallow it. The problem is that I don't believe it
to be possible in a way that doesn't cause more problems than it
solves.
--Andy