Re: [PATCH v3 1/1] mtd: cfi_cmdset_0001: Factor out do_write_buffer_locked() to reduce stack frame

From: David Laight

Date: Thu Apr 09 2026 - 07:28:59 EST


On Thu, 9 Apr 2026 09:58:28 +0200
Lukas Wunner <lukas@xxxxxxxxx> wrote:

> On Thu, Apr 09, 2026 at 08:26:11AM +0100, David Laight wrote:
> > On Wed, 8 Apr 2026 23:11:48 +0200 Andy Shevchenko <andriy.shevchenko@xxxxxxxxxxxxxxx> wrote:
> > > Compiler is not happy about used stack frame:
> > >
> > > drivers/mtd/chips/cfi_cmdset_0001.c: In function 'do_write_buffer':
> > > drivers/mtd/chips/cfi_cmdset_0001.c:1887:1: error: the frame size of 1296 bytes is larger than 1280 bytes [-Werror=frame-larger-than=]
> > >
> > > Fix this by factoring out do_write_buffer_locked().
> >
> > Does this just split the large stack frame between two nested functions?
> > I'd also expect the compiler to inline do_write_buffer_locked() so it
> > makes little difference.
> > OTOH I can't immediately see where the large stack frame comes from.
>
> The error occurs for an allmodconfig build on arm, which implies
> CONFIG_KASAN_STACK=y and thus increases stack usage vis-à-vis a
> "regular" build.
>
> Stack usage is high here because of the three "map_word" types,
> which can each be up to 256 unsigned longs (32 * 8), see the
> definitions of MAX_MAP_LONGS, MAX_MAP_BANKWIDTH, map_word in
> include/linux/mtd/map.h.

Ugg - that code is horrid.
Returning structures by value isn't really a good idea.

>
> Possible solutions:
>
> - Disable KASAN entirely for this file:
> https://lore.kernel.org/all/adX3SHYgazijahbG@xxxxxxxxx/
>
> Not always a good option, particularly for stuff like lib/maple_tree.c
> where the same issue exists in mas_wr_spanning_store() and KASAN would
> certainly be good to have for that one.

I've peeked at that at least once.
Some big functions get inlined; IIRC one small function is basically:
if (expr) a(args) else b(args);
and marking both a and b noinline would help a lot.

>
> - Use heap instead of stack.
>
> - Split function in smaller chunks and mark them "noinline".

That might make the code easier to read as well.

But looking at it, I think that a small amount of refactoring
(mostly moving the initial 'status' check before the command
is written) would mean that only one 'map_word' would be valid
at any one time.

I didn't look at what was really happening though.
I suspect it is similar to some code I've written for accessing serial
EEPROM where the control data is written one bit at a time, but the
data itself is read/written in 4 bit chunks (although the low-level hardware
did multiple 'nibble' accesses for wider transfers).
In any case it surely can't be necessary to have a 256+ byte structure
to hold the 8-bit command/status values.
(In my case the 8 bits got 'spread' across a 32bit word and written
(to the fgpa - helped because I was writing that end as well) as a single word.)

David

>
> Thanks,
>
> Lukas
>