Re: [PATCH 3.2 055/152] x86_64, switch_to(): Load TLS descriptors before switching DS and ES

From: Brian Gerst
Date: Thu Feb 26 2015 - 11:28:37 EST


On Thu, Feb 26, 2015 at 10:32 AM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
> On Tue, Feb 24, 2015 at 7:23 PM, Brian Gerst <brgerst@xxxxxxxxx> wrote:
>> On Tue, Feb 24, 2015 at 3:08 PM, Denys Vlasenko
>> <vda.linux@xxxxxxxxxxxxxx> wrote:
>>> On Tue, Feb 24, 2015 at 9:02 PM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
>>>>> This currently fails in 32-bit kernels (at least in qemu):
>>>>>
>>>>> / # ./es_test
>>>>> Allocated GDT index 7
>>>>> [FAIL] ES changed from 0x3b to 0x7b
>>>>> [FAIL] ES was corrupted 1000/1000 times
>>>>> / # uname -a
>>>>> Linux (none) 4.0.0-rc1 #1 SMP Tue Feb 24 16:41:58 CET 2015 i686 GNU/Linux
>>>>
>>>> Want to send a patch? I'll get it in a few days if no one beats me.
>>>
>>> I have no patch, sorry (in fact, I failed to find where is the relevant
>>> 32-bit counterpart).
>>>
>>> It's just security people asked me to backport this and I wondered
>>> maybe I should wait a bit on this one, since fix for 32-bit ought
>>> to appear as well.
>>
>> For 32-bit kernel, userspace DS and ES are saved at syscall/interrupt
>> entry time and reloaded on exit, unlike in 64-bit where they are saved
>> and loaded at context switch time. Therefore 32-bit is not affected
>> by the issue this patch addresses.
>>
>> It looks to me though, that the ES test program doesn't actually test
>> what the patch fixes - the segment attributes, like the base address.
>> It tests just the selector, which shouldn't change across a kernel
>> entry (with a few exceptions, like signals). If the test is failing,
>> then it is a different issue from what this patch addresses.
>
> It tests it indirectly. The 64-bit code sets the selector to zero if
> it fails to reload it. Testing the ES base is awkward because it
> can't be done in 64-bit code at all.

I figured out why Denys got the failure. usleep() makes a syscall via
sysenter. The sysenter path saves ES and DS on entry, but does not
restore them before sysexit the way the int80/iret path does. That
leaves them set to USER_DS, which the kernel loaded for its own use.
I believe this was an intentional optimization, on the assumption that
the vdso would only be called from programs conforming to the ELF ABI.
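
To make the numbers in the test output concrete, here is a rough sketch
in plain C. The first part decodes a selector using the standard x86
layout (index in bits 15..3, table indicator in bit 2, RPL in bits 1..0),
so 0x3b is GDT index 7 at RPL 3, matching "Allocated GDT index 7", and
0x7b is GDT index 15 at RPL 3. The second part is only a toy model of
the two exit paths as described above, not kernel code: the names
decode_selector(), es_after_syscall(), MODEL_USER_DS and enum exit_path
are made up for illustration, and MODEL_USER_DS is simply the value
taken from the failing test output.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of an x86 segment selector. */
struct selector {
    unsigned index;   /* bits 15..3: descriptor table index */
    unsigned is_ldt;  /* bit 2: 0 = GDT, 1 = LDT */
    unsigned rpl;     /* bits 1..0: requested privilege level */
};

/* Split a 16-bit selector into its three architectural fields. */
static struct selector decode_selector(uint16_t sel)
{
    struct selector s = { sel >> 3, (sel >> 2) & 1u, sel & 3u };
    return s;
}

/* Hypothetical constant: the ES value reported by the failing test. */
#define MODEL_USER_DS 0x7b

/* Hypothetical labels for the two return paths discussed above. */
enum exit_path { EXIT_IRET, EXIT_SYSEXIT };

/*
 * Toy model of what userspace sees in ES after a syscall on a 32-bit
 * kernel. Both paths save the user's ES on entry and the kernel then
 * loads its own data selector. Only the iret path is modeled as
 * restoring the saved value before returning to userspace.
 */
static uint16_t es_after_syscall(uint16_t user_es, enum exit_path path)
{
    uint16_t saved_es = user_es;   /* saved at entry */
    uint16_t es = MODEL_USER_DS;   /* kernel's own selector */

    if (path == EXIT_IRET)
        es = saved_es;             /* restored on exit */
    return es;                     /* sysexit: left as loaded by the kernel */
}

/* Sanity checks against the values in the test output. */
static void check_reported_values(void)
{
    assert(decode_selector(0x3b).index == 7);
    assert(decode_selector(0x7b).index == 15);
    assert(es_after_syscall(0x3b, EXIT_SYSEXIT) == 0x7b);
}
```

In this model es_after_syscall(0x3b, EXIT_SYSEXIT) gives 0x7b, which is
the "ES changed from 0x3b to 0x7b" line, while EXIT_IRET gives back 0x3b.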

--
Brian Gerst