Re: gcc-10: kernel stack is corrupted and fails to boot

From: Arvind Sankar
Date: Wed May 13 2020 - 11:48:52 EST


On Wed, May 13, 2020 at 09:50:03AM +0300, Kalle Valo wrote:
> (trimming CC, changing title)
>
> Kalle Valo <kvalo@xxxxxxxxxxxxxx> writes:
>
> > Kalle Valo <kvalo@xxxxxxxxxxxxxx> writes:
> >
> >> Arnd Bergmann <arnd@xxxxxxxx> writes:
> >>
> >>> gcc-10 correctly points out a bug with a zero-length array in
> >>> struct ath10k_pci:
> >>>
> >>> drivers/net/wireless/ath/ath10k/ahb.c: In function 'ath10k_ahb_remove':
> >>> drivers/net/wireless/ath/ath10k/ahb.c:30:9: error: array subscript 0
> >>> is outside the bounds of an interior zero-length array 'struct
> >>> ath10k_ahb[0]' [-Werror=zero-length-bounds]
> >>> 30 | return &((struct ath10k_pci *)ar->drv_priv)->ahb[0];
> >>> | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >>> In file included from drivers/net/wireless/ath/ath10k/ahb.c:13:
> >>> drivers/net/wireless/ath/ath10k/pci.h:185:20: note: while referencing 'ahb'
> >>> 185 | struct ath10k_ahb ahb[0];
> >>> | ^~~
> >>>
> >>> The last addition to the struct ignored the comments and added
> >>> new members behind the array that must remain last.
> >>>
> >>> Change it to a flexible-array member and move it last again to
> >>> make it work correctly, prevent the same thing from happening
> >>> again (all compilers warn about flexible-array members in the
> >>> middle of a struct) and get it to build without warnings.
> >>
> >> Very good find, thanks! This bug would cause all sorts of strange memory
> >> corruption issues.
> >
> > This motivated me to switch to using GCC 10.x and I noticed that you had
> > already upgraded crosstool so it was a trivial thing to do, awesome :)
> >
> > https://mirrors.edge.kernel.org/pub/tools/crosstool/
>
> And now I have a problem :) I first noticed that my x86 testbox is not
> booting when I compile the kernel with GCC 10.1.0 from crosstool. I
> didn't get any error messages so I just downgraded the compiler and the
> kernel was booting fine again. Next I decided to try GCC 10.1 with my
> x86 laptop and it also failed to boot, but this time I got kernel logs
> and saw this:
>
> Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: start_secondary+0x178/0x180
>

See https://lore.kernel.org/lkml/20200423161126.GD26021@xxxxxxx/
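
For readers following the quoted ath10k discussion, here is a minimal standalone sketch of the zero-length-array problem Arnd describes. The names `bus_priv`, `dev_priv`, `bus_offset` and `dev_alloc` are hypothetical stand-ins and are not the real ath10k structures. The idea is that a flexible-array member (`bus[]`) has to be the last member, so a compiler rejects a field added after it, whereas GCC accepts the older `bus[0]` form anywhere in the struct as an extension and later members then overlap the trailing data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-bus private data; stands in for the trailing struct. */
struct bus_priv {
	int irq;
};

/* Hypothetical device struct; stands in for the struct that embeds it. */
struct dev_priv {
	size_t mem_len;
	int flags;
	/*
	 * Flexible-array member: it must stay the last member. Adding a
	 * field below this line is diagnosed by the compiler, which is
	 * the protection a zero-length array (bus[0]) does not give.
	 */
	struct bus_priv bus[];
};

/* The trailing data starts at or before the end of the fixed part. */
static_assert(offsetof(struct dev_priv, bus) <= sizeof(struct dev_priv),
	      "trailing array must not start past the fixed part");

/* Offset at which the trailing bus data begins. */
size_t bus_offset(void)
{
	return offsetof(struct dev_priv, bus);
}

/* Allocate the fixed part plus room for n zeroed trailing elements. */
struct dev_priv *dev_alloc(size_t n)
{
	struct dev_priv *p;

	p = calloc(1, sizeof(*p) + n * sizeof(p->bus[0]));
	return p;
}
```

With this layout, writing to `p->bus[i]` for `i < n` stays inside the allocation and cannot clobber `mem_len` or `flags`, because no named member sits behind the array.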