Re: [PATCH 0/3] IRQ stack support for ARM

From: Arnd Bergmann
Date: Wed Oct 21 2020 - 07:58:41 EST


(replying to my own mail, apparently my normal outgoing email server is
blacklisted, so resending from @kernel.org)

On Fri, Oct 16, 2020 at 12:09 PM Arnd Bergmann <arnd@xxxxxxxx> wrote:
>
> On Thu, Oct 8, 2020 at 10:32 AM Russell King - ARM Linux admin
> <linux@xxxxxxxxxxxxxxx> wrote:
> > On Thu, Oct 08, 2020 at 12:45:30PM +0530, Maninder Singh wrote:
> > > We observed stack overflows on the 8KB kernel stack on ARM, especially
> > > in the case of network interrupts, which result in nondeterministic
> > > behaviour. So there is a need for a per-CPU dedicated IRQ stack for ARM.
> > >
> > > As ARM does not have an extra co-processor register
> > > to save the thread info pointer, the IRQ stack will come at some
> > > performance cost, so the code is under CONFIG_IRQ_STACK.
> > >
> > > We don't have much knowledge of, or a setup for, CLANG
> > > and ARM_UNWIND, so a dependency was added for both cases.
> > >
> > > Tested the patch set with QEMU on the latest kernel,
> > > and on a 4.1 kernel for an ARM target with the same patch set.
> >
> > You need to investigate and show where and why this is happening. My
> > guess is you have a network driver that uses a lot of kernel stack
> > space, which itself would be a bug.
>
> Agreed.
>
> > Note that there are compiler versions out there that mis-optimise and
> > eat stack space - the kernel build should be warning if a function
> > uses a large amount of stack.
>
> Some more ideas for figuring it out:
>
> CONFIG_DEBUG_STACK_USAGE may also be helpful in identifying
> code paths that are deeply nested with multiple functions taking a
> lot of stack space, but each one staying under the limit.
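For illustration, here is a rough user-space model of the high-water-mark idea behind that kind of accounting: the stack starts out zeroed, and the number of still-zero words at the low end tells you the minimum free space ever seen. As far as I recall, the kernel's stack_not_used() does a similar scan, but the function names below (stack_model_*) are made up for this sketch and none of this is kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helpers: a user-space model of a downward-growing stack. */

/* Zero the whole region so that untouched words can be recognised later. */
void stack_model_init(unsigned long *stack, size_t words)
{
    memset(stack, 0, words * sizeof(*stack));
}

/*
 * Simulate a call chain that consumes `used` words: a downward-growing
 * stack dirties the highest addresses first, so mark the top `used` words
 * with a non-zero value.
 */
void stack_model_use(unsigned long *stack, size_t words, size_t used)
{
    assert(used <= words);
    for (size_t i = words - used; i < words; i++)
        stack[i] = 0xdeadbeefUL;
}

/*
 * Count the words at the low end that are still zero. Because nothing ever
 * re-zeroes the region, this is the minimum free space observed so far,
 * not the current free space.
 */
size_t stack_model_unused_words(const unsigned long *stack, size_t words)
{
    size_t n = 0;

    while (n < words && stack[n] == 0)
        n++;
    return n;
}
```

A deep call chain followed by a shallow one leaves the reported value unchanged, which is what makes this useful for spotting paths where many functions each stay under the per-frame warning limit but add up to most of the stack.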
>
> CONFIG_DEBUG_STACKOVERFLOW would also help here but
> is not supported on Arm at the moment. There was a patch[1] from
> Uwe Kleine-König to add this, and I suppose we should still add
> that, in particular if it helps debug this problem.
>
> CONFIG_VMAP_STACK is probably the best way to debug
> random runtime stack overflows because using a guard page
> turns random memory corruption into an immediate oops,
> but I don't think there is an implementation for Arm yet and
> using a lot of vmalloc space means we might not be able to
> default to this.
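To make the guard page point concrete, here is a minimal user-space analogy, assuming a POSIX system with mmap() and mprotect(): a region is mapped with one inaccessible page below it, so a write that runs off the low end faults immediately instead of silently corrupting whatever sits there. The helper names (guarded_region_alloc, write_faults) are hypothetical, and this only mimics the effect; it says nothing about how a VMAP_STACK implementation for Arm would look.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS on glibc in strict modes */
#endif
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf fault_env;

/* Signal handler: jump back to the sigsetjmp() call in write_faults(). */
static void on_fault(int sig)
{
    (void)sig;
    siglongjmp(fault_env, 1);
}

/*
 * Map `pages` usable pages with one PROT_NONE guard page below them.
 * Returns the address of the guard page (the lowest address of the
 * mapping) and stores the page size, or returns NULL on failure.
 */
unsigned char *guarded_region_alloc(size_t pages, size_t *page_size)
{
    long ps = sysconf(_SC_PAGESIZE);
    unsigned char *base;

    if (ps <= 0)
        return NULL;
    base = mmap(NULL, (pages + 1) * (size_t)ps, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return NULL;
    if (mprotect(base, (size_t)ps, PROT_NONE) != 0) {
        munmap(base, (pages + 1) * (size_t)ps);
        return NULL;
    }
    *page_size = (size_t)ps;
    return base;
}

/* Return 1 if writing one byte to `addr` faults, 0 if the write succeeds. */
int write_faults(volatile unsigned char *addr)
{
    struct sigaction sa, old_segv, old_bus;
    volatile int faulted;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_fault;
    sigemptyset(&sa.sa_mask);
    /* Some systems report protection faults as SIGBUS, so catch both. */
    sigaction(SIGSEGV, &sa, &old_segv);
    sigaction(SIGBUS, &sa, &old_bus);

    if (sigsetjmp(fault_env, 1) == 0) {
        *addr = 0x5a;
        faulted = 0;
    } else {
        faulted = 1;
    }

    /* Restore whatever handlers were installed before. */
    sigaction(SIGSEGV, &old_segv, NULL);
    sigaction(SIGBUS, &old_bus, NULL);
    return faulted;
}
```

With base = guarded_region_alloc(2, &ps), a write to base + ps (the lowest usable byte) succeeds, while a write to base + ps - 1 (the last byte of the guard page) faults on the spot. The cost is visible here too: every region needs an extra page of address space, which mirrors the vmalloc space concern above.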
>
> Regardless of identifying and fixing the bug Maninder found, I
> also think that supporting separate async stacks on Arm is useful
> for determinism. Most of the popular architectures use irqstack
> for this reason, and I was actually surprised that we don't do it
> on arch/arm/.
>
> Arnd
>
> [1] https://lore.kernel.org/linux-arm-kernel/20200108082913.29710-1-u.kleine-koenig@xxxxxxxxxxxxxx/