Re: [BUG] Boot crash on v6.7-rc2

From: Kees Cook
Date: Sat Nov 25 2023 - 13:38:08 EST




On November 25, 2023 9:54:28 AM PST, "Gustavo A. R. Silva" <gustavo@xxxxxxxxxxxxxx> wrote:
>
>
>On 11/24/23 09:28, Gustavo A. R. Silva wrote:
>>
>>
>> On 11/24/23 04:24, Joey Gouly wrote:
>>> Hi all,
>>>
>>> I just hit a boot crash on v6.7-rc2 (arm64, FVP model):
>>
>> [..]
>>
>>> Checking `struct neighbour`:
>>>
>>>     struct neighbour {
>>>         struct neighbour __rcu    *next;
>>>         struct neigh_table    *tbl;
>>>     .. fields ..
>>>         u8            primary_key[0];
>>>     } __randomize_layout;
>>>
>>> Due to the `__randomize_layout`, `primary_key` field is being placed before `tbl` (actually it's the same address since it's a 0 length array). That means the memcpy() corrupts the tbl pointer.
>>>
>>> I think I just got unlucky with my CONFIG_RANDSTRUCT seed (I can provide it if needed), it doesn't look as if it's a new issue.
>>
>> It seems the issue is caused by this change that was recently added to -rc2:
>>
>> commit 1ee60356c2dc ("gcc-plugins: randstruct: Only warn about true flexible arrays")
>>
>> Previously, one-element and zero-length arrays were treated as true flexible arrays
>> (however, they are "fake" flex arrays), and __randomize_layout would leave them
>> untouched at the end of the struct; the same for proper C99 flex-array members. But
>> after the commit above, that's no longer the case: Only C99 flex-array members will
>> behave correctly (remaining untouched at end of the struct), and the other two types
>> of arrays will be randomized.
>
>mmh... it seems that commit 1ee60356c2dc only prevents one-element arrays from being
>treated as flex arrays, while the code should still keep zero-length arrays untouched:
>
> if (typesize == NULL_TREE && TYPE_DOMAIN(fieldtype) != NULL_TREE &&
> TYPE_MAX_VALUE(TYPE_DOMAIN(fieldtype)) == NULL_TREE)
> return true;
>
>- if (typesize != NULL_TREE &&
>- (TREE_CONSTANT(typesize) && (!tree_to_uhwi(typesize) ||
>- tree_to_uhwi(typesize) == tree_to_uhwi(elemsize))))
>- return true;
>-

This should be both the 0 and 1 checks. I think the original fix is correct: switch to a true flex array.

>
>Sorry about the confusion.
>
>>
>>>
>>> I couldn't reproduce directly on v6.6 (the offsets for `tbl` and `primary_key` didn't overlap).
>>> However I tried changing the zero-length-array to a flexible one:
>>>
>>>     +    DECLARE_FLEX_ARRAY(u8, primary_key);
>>>     +    u8        primary_key[0];

Was this line supposed to be "-"?

>>>
>>> Then the field offsets ended up overlapping, and I also got the same crash on v6.6.
>>
>> The right approach is to transform the zero-length array into a C99 flex-array member,
>> like this:
>>
>> diff --git a/include/net/neighbour.h b/include/net/neighbour.h
>> index 07022bb0d44d..0d28172193fa 100644
>> --- a/include/net/neighbour.h
>> +++ b/include/net/neighbour.h
>> @@ -162,7 +162,7 @@ struct neighbour {
>>         struct rcu_head         rcu;
>>         struct net_device       *dev;
>>         netdevice_tracker       dev_tracker;
>> -       u8                      primary_key[0];
>> +       u8                      primary_key[];
>>  } __randomize_layout;
>>
>>  struct neigh_ops {
>
>In any case, I think we still should convert [0] to [ ].

I would expect the above to fix the problem. If it doesn't I'll need to take a closer look at the plugin...

-Kees

--
Kees Cook