Re: [PATCH] fs/select: avoid clang stack usage warning

From: Arnd Bergmann
Date: Fri Oct 07 2022 - 04:28:45 EST


On Fri, Oct 7, 2022, at 12:21 AM, Nick Desaulniers wrote:
> On Thu, Mar 07, 2019 at 10:01:36AM +0100, Arnd Bergmann wrote:
>> The select() implementation is carefully tuned to put a sensible amount
>> of data on the stack for holding a copy of the user space fd_set,
>> but not too large to risk overflowing the kernel stack.
>>
>> When building a 32-bit kernel with clang, we need a little more space
>> than with gcc, which often triggers a warning:
>>
>> fs/select.c:619:5: error: stack frame size of 1048 bytes in function 'core_sys_select' [-Werror,-Wframe-larger-than=]
>> int core_sys_select(int n, fd_set __user *inp, fd_set __user *outp,
>>
>> I experimentally found that for 32-bit ARM, reducing the maximum
>> stack usage by 64 bytes keeps us reliably under the warning limit
>> again.
>>
>> Signed-off-by: Arnd Bergmann <arnd@xxxxxxxx>
>> ---
>> include/linux/poll.h | 4 ++++
>> 1 file changed, 4 insertions(+)
>>
>> diff --git a/include/linux/poll.h b/include/linux/poll.h
>> index 7e0fdcf905d2..1cdc32b1f1b0 100644
>> --- a/include/linux/poll.h
>> +++ b/include/linux/poll.h
>> @@ -16,7 +16,11 @@
>> extern struct ctl_table epoll_table[]; /* for sysctl */
>> /* ~832 bytes of stack space used max in sys_select/sys_poll before allocating
>> additional memory. */
>> +#ifdef __clang__
>> +#define MAX_STACK_ALLOC 768
>
> Hi Arnd,
> Upon a toolchain upgrade for Android, our 32b x86 image used for
> first-party developer VMs started tripping -Wframe-larger-than= again
> (thanks -Werror) which is blocking our ability to upgrade our toolchain.
>
> I've attached the zstd compressed .config file that reproduces with ToT
> LLVM:
>
> $ cd linux
> $ zstd -d path/to/config.zst -o .config
> $ make ARCH=i386 LLVM=1 -j128 fs/select.o
> fs/select.c:625:5: error: stack frame size (1028) exceeds limit (1024)
> in 'core_sys_select' [-Werror,-Wframe-larger-than]
> int core_sys_select(int n, fd_set __user *inp, fd_set __user *outp,
> ^
>
> As you can see, we're just barely tipping over the limit. Should I send
> a patch to reduce this again? If so, any thoughts by how much?
> Decrementing the current value by 4 builds the config in question, but
> seems brittle.
>
> Do we need to only do this if !CONFIG_64BIT?
> commit ad312f95d41c ("fs/select: avoid clang stack usage warning")
> seems to allude to this being more problematic on 32b targets?

I think we should keep the limit consistent between 32-bit and 64-bit
kernels. Lowering the allocation a bit more would of course have a
performance impact for users that are just below the current limit,
so I think it would be best to first look at what might be going
wrong in the compiler.
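
To illustrate the trade-off, here is a rough user-space sketch of the
general pattern: a fixed on-stack buffer that falls back to a heap
allocation once the request no longer fits. This is not the code in
fs/select.c. The names (sketch_select, SKETCH_STACK_BYTES) and the
256-byte budget are made up for illustration, and malloc() only stands
in for the kernel allocator.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical on-stack budget in bytes; an assumption for this sketch. */
#define SKETCH_STACK_BYTES 256

/* Bytes for one bitmap covering nfds descriptors, rounded up to whole longs. */
static size_t sketch_bitmap_bytes(size_t nfds)
{
	size_t bits_per_long = 8 * sizeof(long);

	return ((nfds + bits_per_long - 1) / bits_per_long) * sizeof(long);
}

/* select() needs six bitmaps: in/out/ex as requested plus in/out/ex results. */
static bool sketch_fits_on_stack(size_t nfds)
{
	return 6 * sketch_bitmap_bytes(nfds) <= SKETCH_STACK_BYTES;
}

/* Returns 0 if the stack buffer was used, 1 for the heap, -1 on failure. */
static int sketch_select(size_t nfds)
{
	long stack_buf[SKETCH_STACK_BYTES / sizeof(long)];
	size_t need = 6 * sketch_bitmap_bytes(nfds);
	void *bits = stack_buf;
	int used_heap = 0;

	if (!sketch_fits_on_stack(nfds)) {
		/* Slow path: callers just above the budget pay for an allocation. */
		bits = malloc(need);
		if (!bits)
			return -1;
		used_heap = 1;
	}

	memset(bits, 0, need);
	/* ... the actual polling work would go here ... */

	if (used_heap)
		free(bits);
	return used_heap;
}
```

With this budget, sketch_select(64) stays on the stack while
sketch_select(1024) takes the allocation path. Shrinking the budget
moves more callers onto the slow path, which is the cost I would like
to avoid.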

I managed to reproduce the issue and had a look at what happens
here. A few random observations:

- The kernel is built with -fsanitize=local-bounds. Dropping this
option reduces the stack allocation for this function by around
100 bytes, which would be the easiest way for you to build those
kernels again without any source changes. It may also be possible
to change the core_sys_select function in a way that avoids the
insertion of runtime bounds checks.

- If I mark 'do_select' as noinline_for_stack, the reported frame
size is decreased a lot and is suddenly independent of
-fsanitize=local-bounds:

  fs/select.c:625:5: error: stack frame size (336) exceeds limit (100) in 'core_sys_select' [-Werror,-Wframe-larger-than]
  int core_sys_select(int n, fd_set __user *inp, fd_set __user *outp,
  fs/select.c:479:21: error: stack frame size (684) exceeds limit (100) in 'do_select' [-Werror,-Wframe-larger-than]
  static noinline int do_select(int n, fd_set_bits *fds, struct timespec64 *end_time)

However, I don't even see how this makes sense at all, given that
the actual frame size should be at least SELECT_STACK_ALLOC!

- The behavior of -ftrivial-auto-var-init= is a bit odd here: with =zero or
=pattern, the stack usage is just below the limit (1020), while without
the option it is up to 1044. It looks like your .config picks =zero, which
was dropped in the latest clang version, so it falls back to not
initializing. Setting it to =pattern should give you the old
behavior, but I don't understand why clang uses more stack without
the initialization rather than less, as skipping it would likely cause
fewer spills.
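
For anyone who wants to play with the noinline_for_stack experiment
from the second item outside the kernel, a minimal user-space sketch
follows. The function names and buffer sizes are hypothetical stand-ins
and not the real core_sys_select()/do_select(). The macro is spelled out
because, as far as I know, the kernel defines noinline_for_stack as plain
noinline. Building with -Wframe-larger-than=<n> under gcc or clang should
report the two frames separately, though the exact numbers depend on the
compiler and options.

```c
#include <stddef.h>
#include <string.h>

/* Spelled out here so the sketch builds in user space with gcc or clang. */
#define noinline_for_stack __attribute__((noinline))

/* Hypothetical stand-in for do_select(): owns its own large local buffer. */
static noinline_for_stack int sketch_inner(const long *bits, size_t n)
{
	char table[512];
	int set = 0;

	memset(table, 0, sizeof(table));
	for (size_t i = 0; i < n; i++) {
		/* Record and count the non-zero words. */
		table[i % sizeof(table)] = (char)(bits[i] != 0);
		set += table[i % sizeof(table)];
	}
	return set;
}

/* Hypothetical stand-in for core_sys_select(): owns the bitmap copy. */
static int sketch_outer(size_t n)
{
	long bits[64];

	if (n > 64)
		n = 64;
	for (size_t i = 0; i < n; i++)
		bits[i] = (long)(i & 1);

	/*
	 * Because sketch_inner() may not be inlined, its 512-byte buffer
	 * is accounted to its own frame instead of being merged into this one.
	 */
	return sketch_inner(bits, n);
}
```

Without the attribute the compiler is free to inline the helper, and
the frame size warning is then reported against the caller alone.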

Arnd