Re: [BUG/PATCH] kernel RNG and its secrets

From: Stephan Mueller
Date: Fri Apr 10 2015 - 09:35:05 EST


On Wednesday, 18 March 2015 at 12:09:45, Stephan Mueller wrote:

Hi,

>On Wednesday, 18 March 2015 at 11:56:43, Daniel Borkmann wrote:
>
>Hi Daniel,
>
>>On 03/18/2015 11:50 AM, Hannes Frederic Sowa wrote:
>>> On Wed, Mar 18, 2015, at 10:53, mancha wrote:
>>>> Hi.
>>>>
>>>> The kernel RNG introduced memzero_explicit in d4c5efdb9777 to protect
>>>> memory cleansing against things like dead store optimization:
>>>>
>>>> void memzero_explicit(void *s, size_t count)
>>>> {
>>>>         memset(s, 0, count);
>>>>         OPTIMIZER_HIDE_VAR(s);
>>>> }
>>>>
>>>> OPTIMIZER_HIDE_VAR, introduced in fe8c8a126806 to protect crypto_memneq
>>>> against timing analysis, is defined when using gcc as:
>>>>
>>>> #define OPTIMIZER_HIDE_VAR(var) __asm__ ("" : "=r" (var) : "0" (var))
>>>>
>>>> My tests with gcc 4.8.2 on x86 find it insufficient to prevent gcc
>>>> from optimizing out memset (i.e. secrets remain in memory).
>>>>
>>>> Two things that do work:
>>>> __asm__ __volatile__ ("" : "=r" (var) : "0" (var))
>>>
>>> You are correct, the volatile qualifier should be added to
>>> OPTIMIZER_HIDE_VAR. Because we use an output operand ("=r"), gcc is
>>> allowed to check whether the output is needed and may remove the asm
>>> statement. Another option would be to use var only as an input
>>> operand - asm blocks without output operands are always considered
>>> volatile by gcc.
>>>
>>> Can you send a patch?
>>>
>>> I don't think it is security critical: as Daniel pointed out, the
>>> call will happen because the function is an external call to the
>>> crypto functions, so the compiler has to flush memory on return.
>>
>>Just had a look.
>>
>>$ gdb vmlinux
>>(gdb) disassemble memzero_explicit
>>
>>Dump of assembler code for function memzero_explicit:
>> 0xffffffff813a18b0 <+0>: push %rbp
>> 0xffffffff813a18b1 <+1>: mov %rsi,%rdx
>> 0xffffffff813a18b4 <+4>: xor %esi,%esi
>> 0xffffffff813a18b6 <+6>: mov %rsp,%rbp
>> 0xffffffff813a18b9 <+9>: callq 0xffffffff813a7120 <memset>
>> 0xffffffff813a18be <+14>: pop %rbp
>> 0xffffffff813a18bf <+15>: retq
>>
>>End of assembler dump.
>>
>>(gdb) disassemble extract_entropy
>>[...]
>>
>> 0xffffffff814a5000 <+304>: sub %r15,%rbx
>> 0xffffffff814a5003 <+307>: jne 0xffffffff814a4f80 <extract_entropy+176>
>> 0xffffffff814a5009 <+313>: mov %r12,%rdi
>> 0xffffffff814a500c <+316>: mov $0xa,%esi
>> 0xffffffff814a5011 <+321>: callq 0xffffffff813a18b0 <memzero_explicit>
>> 0xffffffff814a5016 <+326>: mov -0x48(%rbp),%rax
>>[...]
>>
>>I would be fine with __volatile__.
>
>Are we sure that simply adding __volatile__ works in all cases? I just
>did a test with a simple user space app:
>
>static inline void memset_secure(void *s, int c, size_t n)
>{
> memset(s, c, n);
> //__asm__ __volatile__("": : :"memory");
> __asm__ __volatile__("" : "=r" (s) : "0" (s));
>}
>
>int main(int argc, char *argv[])
>{
>#define BUFLEN 20
> char buf[BUFLEN];
>
> snprintf(buf, (BUFLEN - 1), "teststring\n");
> printf("%s", buf);
>
> memset_secure(buf, 0, BUFLEN);
>}
>
>When using the discussed code __asm__ __volatile__("" : "=r" (s) :
>"0" (s)); I do not find the code implementing memset(0) in the objdump
>output. Only when I enable the memory barrier do I see the following
>(when compiling with -O2):
>
>objdump -d memset_secure:
>...
>0000000000400440 <main>:
>...
> 400469: 48 c7 04 24 00 00 00 movq $0x0,(%rsp)
> 400470: 00
> 400471: 48 c7 44 24 08 00 00 movq $0x0,0x8(%rsp)
> 400478: 00 00
> 40047a: c7 44 24 10 00 00 00 movl $0x0,0x10(%rsp)
> 400481: 00
>...

I would like to bring this topic up again, as I have done some more analysis:

For testing I used the following code:

static inline void memset_secure(void *s, int c, size_t n)
{
        memset(s, c, n);
        BARRIER
}

where BARRIER is defined as:

(1) __asm__ __volatile__("" : "=r" (s) : "0" (s));

(2) __asm__ __volatile__("": : :"memory");

(3) __asm__ __volatile__("" : "=r" (s) : "0" (s) : "memory");

I tested the code with gcc and clang, considering that there is effort
underway to compile the kernel with clang too.

The following table marks with an X where the aforementioned movq/movl code
(or an invocation of memset@plt) is present in the object code (i.e. the code
we want). Conversely, it marks with a - where that code is not present (i.e.
the code we do not want):

          |     BARRIER
Compiler  | (1) | (2) | (3)
==========+=====+=====+=====
gcc -O0   |  X  |  X  |  X
gcc -O2   |  -  |  X  |  X
gcc -O3   |  -  |  X  |  X
clang -O0 |  X  |  X  |  X
clang -O2 |  X  |  -  |  X
clang -O3 |  -  |  -  |  X

As the kernel is compiled with -O2, clang users would still be left uncovered
by the current solution (i.e. BARRIER option (2)).

Thus, may I propose updating the patch to use option (3) instead, as (i) it
costs nothing extra on gcc and (ii) it covers clang too?

Ciao
Stephan
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/