Re: [PATCH 0/3] IRQ stack support for ARM

From: Florian Fainelli
Date: Thu Oct 15 2020 - 17:16:37 EST


On 10/15/20 1:59 PM, Nick Desaulniers wrote:
> On Thu, Oct 8, 2020 at 1:30 AM Russell King - ARM Linux admin
> <linux@xxxxxxxxxxxxxxx> wrote:
>>
>> On Thu, Oct 08, 2020 at 12:45:30PM +0530, Maninder Singh wrote:
>>> Observed stack overflow on the 8KB kernel stack on ARM, especially
>>> in the case of network interrupts, which results in nondeterministic
>>> behaviour. So there is a need for a per-CPU dedicated IRQ stack for ARM.
>>>
>>> As ARM does not have an extra co-processor register
>>> to save the thread info pointer, the IRQ stack will come at some
>>> performance cost, so the code is under CONFIG_IRQ_STACK.
>>>
>>> We don't have much knowledge of, or a setup for, CLANG
>>> and ARM_UNWIND, so dependencies are added for both cases.
>>>
>>> Tested the patch set with QEMU on the latest kernel
>>> and on a 4.1 kernel for an ARM target.
>>
>> You need to investigate and show where and why this is happening. My
>> guess is you have a network driver that uses a lot of kernel stack
>> space, which itself would be a bug.
>>
>> Note that there are compiler versions out there that mis-optimise and
>> eat stack space - the kernel build should be warning if a function
>> uses a large amount of stack.
>
> For tracking down those not-super-helpful compiler warnings, I wrote a
> tool: if you rebuild with debug info and give it the object file and
> the name of the function the compiler warned about, it will parse the
> DWARF to tell you the size of each local variable and whether it came
> from an inline frame. Generally, it's possible to stack allocate
> something that's way too big; such objects should be allocated on the
> heap instead.
> https://github.com/ClangBuiltLinux/frame-larger-than
> (I haven't had time to sit down and use it to resolve all outstanding
> issues, but it has worked well for me in the past)
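
To make the stack-versus-heap point concrete, here is a minimal
userspace sketch. It is only an illustration under stated assumptions:
the function names and the 4096-byte scratch size are hypothetical, and
malloc()/free() stand in for what would be kmalloc()/kfree() in kernel
code.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical scratch size: half of an 8KB kernel stack. */
#define SCRATCH_BYTES 4096

/* Anti-pattern: the whole scratch buffer lives in this function's frame. */
size_t copy_len_stack(const char *src)
{
	char scratch[SCRATCH_BYTES];

	strncpy(scratch, src, sizeof(scratch) - 1);
	scratch[sizeof(scratch) - 1] = '\0';
	return strlen(scratch);
}

/* Preferred: the frame holds only a pointer; the buffer is on the heap. */
size_t copy_len_heap(const char *src)
{
	char *scratch = malloc(SCRATCH_BYTES);
	size_t len;

	if (!scratch)
		return 0; /* allocation failed; a real caller would want an error code */

	strncpy(scratch, src, SCRATCH_BYTES - 1);
	scratch[SCRATCH_BYTES - 1] = '\0';
	len = strlen(scratch);
	free(scratch);
	return len;
}
```

Both helpers return the same result; the difference is that the first
one is the kind of function a frame-size warning would flag, while the
second one keeps its frame small at the cost of an allocation that can
fail.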

Things get a bit more difficult with the network stack, where you can
easily recurse into functions and blow up the stack usage. This is
especially true if you have some complex network tunneling or filtering
going on.

For one, in the 4.1 kernel that appears to have been used as a basis for
this work, if you have CONFIG_BPF enabled but not
CONFIG_BPF_JIT_ALWAYS_ON, __bpf_prog_run will require about 724 bytes of
stack, last I measured. That is close to 9% of the 8KB stack that goes
away just like that.
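
As a quick sanity check of that figure, here is a minimal userspace
sketch of the arithmetic, assuming the 8KB stack from the original
report. The helper and macro names are made up for illustration and are
not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 8KB kernel stack, as in the original report. */
#define KSTACK_BYTES 8192u

/* Frame size quoted above for __bpf_prog_run. */
#define BPF_RUN_FRAME_BYTES 724u

/*
 * Share of the stack taken by one frame, in tenths of a percent.
 * Integer division, so the result is rounded down.
 */
unsigned int frame_share_permille(size_t frame_bytes, size_t stack_bytes)
{
	assert(stack_bytes > 0);
	return (unsigned int)(frame_bytes * 1000u / stack_bytes);
}

/* How many frames of this size fit if nothing else uses the stack. */
unsigned int frames_that_fit(size_t frame_bytes, size_t stack_bytes)
{
	assert(frame_bytes > 0);
	return (unsigned int)(stack_bytes / frame_bytes);
}
```

With these inputs, frame_share_permille(BPF_RUN_FRAME_BYTES,
KSTACK_BYTES) gives 88, i.e. about 8.8%, and frames_that_fit() gives 11,
which is an upper bound since the rest of the call chain needs stack
too.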
--
Florian