Re: [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11
From: Kees Cook
Date: Sat Nov 11 2017 - 11:14:17 EST
On Fri, Nov 10, 2017 at 6:36 PM, Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> [ Bringing in the gcc plugin people and the kernel hardening list,
> since it now is no longer even remotely looking like a nfsd, vfs or
> filesystem issue any more ]
>
> Kees, Emese,
> the whole thread is on lkml, but there's clearly something horribly
> wrong with RANDSTRUCT, and it's not new even though it looked that way
> for a while.

It wouldn't be the first issue we've seen; it's (obviously) a pretty
aggressive change to the resulting build.

> Patrick seems to trigger it with nfsd, so it might be specific to that.
>
> Alternatively, it might just be that very few people run
> RANDSTRUCT-built kernels, or just have been lucky with the seeding.

Given its potential cache-line abuse, I'm not surprised that its usage
is more limited than other features.

> Sorry for top-posting, but there's not really anything in the email
> itself to reply to, other than saying thanks to Patrick for narrowing
> it down like this.

Agreed; thanks Patrick! :) Given that the issue is non-deterministic,
I wonder if the bug is related to some kind of missing RCU or barrier
that goes unnoticed in normal struct layouts.

> It would have been very interesting if it had actually bisected to
> something, but it seems that the real issue is just the choice of
> seeding for RANDSTRUCT.

That's where we've seen bugs in the past: some pathological ordering
of a struct uncovers a corner case. In the past it's been much more
deterministic: doesn't build, or immediately crashes on boot, etc.

I'll take a closer look at this and see if I can provide something to
narrow it down.
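
To make the "pathological ordering uncovers a corner case" point
concrete, here is a minimal userspace sketch in Python using ctypes.
It is only an illustration, not a diagnosis of the nfsd problem and not
the plugin's real algorithm: the field names, the seeded shuffle, and
the helpers make_layout(), read_by_name() and read_by_offset() are all
hypothetical. It shows that one seed always yields the same layout, and
that code which silently assumes the declaration-order offset of a
field only works when the shuffle happens to leave that field in place.

```python
import ctypes
import random

# Hypothetical fields, in "declaration order". Not a real kernel struct.
FIELDS = [
    ("uid", ctypes.c_uint32),
    ("gid", ctypes.c_uint32),
    ("mode", ctypes.c_uint32),
    ("size", ctypes.c_uint32),
]

# Offset of `mode` in declaration order: two 4-byte fields precede it.
ASSUMED_MODE_OFFSET = 8


def make_layout(seed):
    """Build a struct type whose field order is shuffled by `seed`.

    Assumption: a plain seeded shuffle stands in for the real plugin,
    which is more involved. The point is only that the layout is a
    deterministic function of the seed.
    """
    order = list(FIELDS)
    random.Random(seed).shuffle(order)

    class Layout(ctypes.Structure):
        _fields_ = order

    return Layout


def read_by_name(obj):
    """Layout-independent access: always correct."""
    return obj.mode


def read_by_offset(obj):
    """Layout-dependent access: bakes in the declaration-order offset."""
    raw = bytes(obj)
    return ctypes.c_uint32.from_buffer_copy(raw, ASSUMED_MODE_OFFSET).value


if __name__ == "__main__":
    for seed in range(5):
        layout = make_layout(seed)
        obj = layout(uid=1000, gid=100, mode=0o644, size=4096)
        names = [name for name, _ in layout._fields_]
        print(seed, names, oct(read_by_name(obj)), oct(read_by_offset(obj)))
```

Running it prints, per seed, the shuffled field order and the value
each accessor returns. Whenever the two values differ, the hardcoded
offset has landed on a different field, which is the general shape of
a bug that stays hidden under the normal layout.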
-Kees
>
> Linus
>
> On Fri, Nov 10, 2017 at 4:27 PM, Patrick McLean <chutzpah@xxxxxxxxxx> wrote:
>> On 2017-11-10 03:26 PM, Patrick McLean wrote:
>>> On 2017-11-10 10:42 AM, Linus Torvalds wrote:
>>>>
>>>> I really don't see anything that looks even half-way suspicious in
>>>> that 4.13.8..11 range. But as mentioned, compiler interactions can be
>>>> _really_ subtle.
>>>>
>>>> And hey, it can be a real kernel bug too, that just happens to be
>>>> exposed by RANDSTRUCT, so a bisect really would be very nice.
>>>
>>> I am working on bisecting the issue now, but I think I have some more
>>> evidence pointing to a compiler issue related to RANDSTRUCT. There are
>>> actually 3 issues that we have seen. Sometimes we get the null pointer
>>> deref in the initial message, sometimes we get the GPF, and sometimes we
>>> see an issue where the NFS clients see all files as root-owned
>>> directories. Any given kernel will always see the same issue, but after
>>> a "make mrproper" and recompile (with the same .config), the issue will
>>> often change. I suspect that all 3 of these problems are actually the
>>> same issue manifesting itself in different ways depending on what seed
>>> the RANDSTRUCT gcc plugin is using.
>>
>> Further update on this: using the same seed for RANDSTRUCT, I have
>> reproduced this issue on v4.13.0, so it does not seem to be recently
>> introduced. The older kernel apparently only worked for us because we
>> were lucky. Generally we always compile new kernels from a fresh tree,
>> so they are never using the same seed.
>>
>> In case someone wants to play with this, here are some interesting seeds
>> (in include/generated/randomize_layout_hash.h):
>>
>> Produces a NULL pointer dereference (though I am not sure what the client
>> does to produce this):
>> 5970d6494d0f4236ec57147a46e700f4f501536236d96f6f68ea223e06a258bc
>>
>> All files for nfsd4 clients appear as directories owned by root, no
>> matter the real owner (this happens for all clients we have tested):
>> 3f158cd1014800ce5eb6c1f532ac64f2357fdb9a684096557d2fbb1d281f325e
>>
>> This is the seed that was breaking motherboards (make sure you have a
>> way to flash the BIOS with this one):
>> 3e32f2d1b4a3dde9f2fd95151386cd1d5bd6167597a0b868f6273aabfc5712dd
>>
>> Finally, here is a seed that produces a kernel that does not exhibit any
>> problems we are aware of:
>> e8698c12137fcd1dcbff6d1ed97e5d766128447a27ce9f9d61e0cb8c05ad4d3b
>>
>>>>
>>>> Because in the end, compiler bugs are very rare. They are particularly
>>>> annoying when they do happen, though, so they loom big in the mind of
>>>> people who have had to chase them down.
>>>>
--
Kees Cook
Pixel Security