Re: [PATCH v2] docs: deprecated.rst: Add zero-length and one-element arrays

From: Gustavo A. R. Silva
Date: Fri Jun 05 2020 - 17:31:48 EST




On 6/5/20 14:30, Kees Cook wrote:
> On Fri, Jun 05, 2020 at 11:21:42AM -0500, Gustavo A. R. Silva wrote:
>> Add zero-length and one-element arrays to the list.
>>
>> While I continue replacing zero-length and one-element arrays with
>> flexible-array members, I need a reference to point people to, so
>> they don't introduce more instances of such arrays. And while here,
>> add a note to the "open-coded arithmetic in allocator arguments"
>> section, on the use of struct_size() and the arrays-to-deprecate
>> mentioned here.
>>
>> Signed-off-by: Gustavo A. R. Silva <gustavoars@xxxxxxxxxx>
>> ---
>> Changes in v2:
>> - Adjust some markup links for readability.
>>
>> Documentation/process/deprecated.rst | 83 ++++++++++++++++++++++++++++
>> 1 file changed, 83 insertions(+)
>>
>> diff --git a/Documentation/process/deprecated.rst b/Documentation/process/deprecated.rst
>> index 652e2aa02a66c..042c21c968e19 100644
>> --- a/Documentation/process/deprecated.rst
>> +++ b/Documentation/process/deprecated.rst
>> @@ -85,6 +85,11 @@ Instead, use the helper::
>>
>> header = kzalloc(struct_size(header, item, count), GFP_KERNEL);
>>
>> +NOTE: If you are using struct_size() on a structure containing a zero-length
>
> Please use:
>
> .. note::
>

OK.

> for this kind of "NOTE:"
>
>> +or a one-element array as a trailing array member, stop using such arrays
>
> And I think it was likely my language suggestion to say "stop using", but
> probably this should be more friendly. How about "please refactor such
> arrays ..."
>
>> +and switch to `flexible arrays <#zero-length-and-one-element-arrays>`_
>
> ... to a `flexible array member <#...
>
>> +instead.
>> +
>
>> See array_size(), array3_size(), and struct_size(),
>> for more details as well as the related check_add_overflow() and
>> check_mul_overflow() family of functions.
>> @@ -200,3 +205,81 @@ All switch/case blocks must end in one of:
>> * continue;
>> * goto <label>;
>> * return [expression];
>> +
>> +Zero-length and one-element arrays
>> +----------------------------------
>> +Old code in the kernel uses the zero-length and one-element array extensions
>> +to the C90 standard, but the `preferred mechanism to declare variable-length
>
> I'd like to reword this to make an immediate statement about what _should_
> be done, and then move into the details on an as accurate as possible
> review of the history of these work-arounds. How about this, which I
> mixed some of your earlier paragraphs into:
>
>
>
> There is a regular need in the kernel to provide a way to declare having
> a dynamically sized set of trailing elements in a structure. Kernel code
> should always use `"flexible array members" <https://en.wikipedia.org/wiki/Flexible_array_member>`_
> for these cases. The older style of one-element or zero-length arrays should
> no longer be used.
>
> In older C code, dynamically sized trailing elements were done by specifying
> a one-element array at the end of a structure::
>
> struct something {
> int count;
> struct items[1];
> };
>
> This led to fragile size calculations via sizeof() (which would need to
> remove the size of the single trailing element to get a correct size of
> the "header"). A `GNU C extension <https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html>`_
> was introduced to allow for zero-length arrays, to avoid these kinds of
> size problems::
>
> struct something {
> int count;
> struct items[0];
> };
>
> But this led to other problems, and didn't solve some problems shared by
> both styles, like not being to able to detect when such an array is
> accidentally being used _not_ at the end of a structure (which could happen
> directly, or when such a struct was in unions, structs of structs, etc).
>
> C99 introduced "flexible array members", which lacks a numeric size for
> the array declaration entirely::
>
> struct something {
> int count;
> struct items[];
> };
>
> This is the way the kernel expects dynamically sized trailing elements
> to be declared. It allows the compiler to generate errors when the
> flexible array does not occur last in the structure, which helps to prevent
> some kind of `undefined behavior
> <https://git.kernel.org/linus/76497732932f15e7323dc805e8ea8dc11bb587cf>`_
> bugs from being inadvertently introduced to the codebase. It also allows
> the compiler to correctly analyze array sizes (via sizeof(),
> `CONFIG_FORTIFY_SOURCE`, and `CONFIG_UBSAN_BOUNDS`). For instance,
> there is no mechanism that warns us that the following application of the
> sizeof() operator to a zero-length array always results in zero::
>
> struct something {
> int count;
> struct items[0];
> };
>
> struct something *instance;
>
> instance = kmalloc(struct_size(instance, items, size), GFP_KERNEL);
> instance->length = size;
> memcpy(instance->items, source, size);
> ...
> size = sizeof(instance->items);
>
>
> [and then continue with the rest of the details you wrote below...]
>
>> +
>> +At the last line of code above, ``size`` turns out to be zero --when one might have
>> +thought differently. Here are a couple examples of this issue: `link 1
>> +<https://git.kernel.org/linus/f2cd32a443da694ac4e28fbf4ac6f9d5cc63a539>`_,
>> +`link 2
>> +<https://git.kernel.org/linus/ab91c2a89f86be2898cee208d492816ec238b2cf>`_.
>> +On the other hand, `flexible array members have incomplete type, and so the sizeof()
>> +operator may not be applied <https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html>`_,
>> +so any misuse of such operator will be immediately noticed at build time.
>> +
>> +With respect to one-element arrays, one has to be acutely aware that `such arrays
>> +occupy at least as much space as a single object of the type
>> +<https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html>`_,
>> +hence they contribute to the size of the enclosing structure. This is prone
>> +to error every time people want to calculate the total size of dynamic memory
>> +to allocate for a structure containing an array of this kind as a member::
>> +
>> + struct something {
>> + int length;
>> + char data[1];
>> + };
>
> And for all of these examples, I'd like to avoid using "char" as the type
> for the flex array member, since it doesn't drive home the idea of
> having a multiplier (i.e. "length" matches the size of "data") And
> similarly, "length" should, I think, be called "count". That way the
> bytes is separate from count, but is the result of sizeof(*items) *
> count.
>
>> +
>> + struct something *instance;
>> +
>> + instance = kmalloc(struct_size(instance, data, size - 1), GFP_KERNEL);
>> + instance->length = size;
>> + memcpy(instance->data, source, size);
>> +
>> +In the example above, we had to remember to calculate ``size - 1`` when using
>
> I don't want to get people confused between "size" (in bytes) vs "count"
> (of trailing elements).
>

Yep. I'll change that to avoid confusion.

>> +the struct_size() helper, otherwise we would have --unintentionally-- allocated
>> +memory for one too many ``data`` objects. The cleanest and least error-prone way
>> +to implement this is through the use of a `flexible array member`, which is
>> +exemplified at the `beginning <#zero-length-and-one-element-arrays>`_ of this
>> +section.
>> --
>> 2.27.0
>>
>
> How does that look?
>

Got it. Working on the changes right now...

Thanks for the feedback!
--
Gustavo