Re: [PATCH v2 00/19] stackdepot: allow evicting stack traces

From: Marco Elver
Date: Mon Oct 09 2023 - 08:35:47 EST


On Thu, 5 Oct 2023 at 22:36, Andrey Konovalov <andreyknvl@xxxxxxxxx> wrote:
>
> On Wed, Sep 13, 2023 at 7:14 PM <andrey.konovalov@xxxxxxxxx> wrote:
> >
> > From: Andrey Konovalov <andreyknvl@xxxxxxxxxx>
> >
> > Currently, the stack depot grows indefinitely until it reaches its
> > capacity. Once that happens, the stack depot stops saving new stack
> > traces.
> >
> > This creates a problem for using the stack depot for in-field testing
> > and in production.
> >
> > For such uses, an ideal stack trace storage should:
> >
> > 1. Allow saving fresh stack traces on systems with long uptimes while
> > limiting the amount of memory used to store the traces;
> > 2. Have a low performance impact.
> >
> > Implementing #1 in the stack depot is impossible with the current
> > keep-forever approach. This series aims to address that. Issue #2 is
> > left to be addressed in a future series.
> >
> > This series changes the stack depot implementation to allow evicting
> > unneeded stack traces from the stack depot. The users of the stack depot
> > can do that via new stack_depot_save_flags(STACK_DEPOT_FLAG_GET) and
> > stack_depot_put() APIs.
> >
> > Internal changes to the stack depot code include:
> >
> > 1. Storing stack traces in fixed-frame-sized slots; the slot size is
> > controlled via CONFIG_STACKDEPOT_MAX_FRAMES (vs precisely-sized
> > slots in the current implementation);
> > 2. Keeping available slots in a freelist (vs keeping an offset to the next
> > free slot);
> > 3. Using a read/write lock for synchronization (vs a lock-free approach
> > combined with a spinlock).
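
To make the slot/freelist/refcount idea above concrete, here is a
minimal userspace sketch. It is not the code from this series. All
names (depot_init, depot_save_get, depot_put, MAX_FRAMES, NUM_SLOTS)
are hypothetical; MAX_FRAMES merely stands in for
CONFIG_STACKDEPOT_MAX_FRAMES. Deduplication is a linear scan rather
than a hash table, truncating over-long traces is an assumption of the
sketch, and the read/write lock from item 3 is left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_FRAMES 4	/* hypothetical; stands in for CONFIG_STACKDEPOT_MAX_FRAMES */
#define NUM_SLOTS  2	/* hypothetical pool capacity, tiny on purpose */

/* Every slot has room for MAX_FRAMES frames, however short the trace is. */
struct slot {
	struct slot *next_free;		/* freelist link, meaningful only while free */
	unsigned int refcount;		/* 0 means the slot is on the freelist */
	unsigned int nr_frames;
	unsigned long frames[MAX_FRAMES];
};

static struct slot pool[NUM_SLOTS];
static struct slot *freelist;

/* Put every slot on the freelist, lowest index first. */
static void depot_init(void)
{
	freelist = NULL;
	for (int i = NUM_SLOTS - 1; i >= 0; i--) {
		pool[i].refcount = 0;
		pool[i].next_free = freelist;
		freelist = &pool[i];
	}
}

/*
 * Save a trace and take a reference on it.
 * Returns a non-zero handle, or 0 if no free slot is left.
 */
static int depot_save_get(const unsigned long *frames, unsigned int n)
{
	struct slot *s;

	if (n > MAX_FRAMES)
		n = MAX_FRAMES;	/* assumption of this sketch: truncate */

	/* Reuse an identical live trace (linear scan, for brevity only). */
	for (int i = 0; i < NUM_SLOTS; i++) {
		s = &pool[i];
		if (s->refcount && s->nr_frames == n &&
		    !memcmp(s->frames, frames, n * sizeof(*frames))) {
			s->refcount++;
			return i + 1;
		}
	}

	if (!freelist)
		return 0;	/* depot is full */

	/* Pop a slot off the freelist and fill it in. */
	s = freelist;
	freelist = s->next_free;
	s->refcount = 1;
	s->nr_frames = n;
	memcpy(s->frames, frames, n * sizeof(*frames));
	return (int)(s - pool) + 1;
}

/* Drop a reference; the last put evicts the trace and frees the slot. */
static void depot_put(int handle)
{
	struct slot *s;

	assert(handle >= 1 && handle <= NUM_SLOTS);
	s = &pool[handle - 1];
	assert(s->refcount > 0);
	if (--s->refcount == 0) {
		s->next_free = freelist;
		freelist = s;
	}
}
```

In this model a full depot starts accepting new traces again as soon as
the last reference to an old one is dropped, which is the property the
keep-forever scheme cannot provide.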
> >
> > This series also integrates the eviction functionality in the tag-based
> > KASAN modes.
> >
> > Despite wasting some space on rounding up the size of each stack record,
> > with CONFIG_STACKDEPOT_MAX_FRAMES=32, the tag-based KASAN modes end up
> > consuming ~5% less memory in the stack depot during boot (with the default
> > stack ring size of 32k entries). The reason for this is the eviction of
> > irrelevant stack traces from the stack depot, which frees up space for
> > other stack traces.
> >
> > For other tools that heavily rely on the stack depot, like Generic KASAN
> > and KMSAN, this change leads to the stack depot capacity being reached
> > sooner than before. However, as these tools are mainly used in fuzzing
> > scenarios where the kernel is frequently rebooted, this outcome should
> > be acceptable.
> >
> > There is no measurable boot-time performance impact of these changes for
> > KASAN on x86-64. I haven't done any tests for arm64 modes (the stack
> > depot without performance optimizations is not suitable for the intended
> > use of those anyway), but I expect a similar result. Obtaining and copying
> > stack trace frames when saving them into the stack depot is what takes
> > the most time.
> >
> > This series does not yet provide a way to configure the maximum size of
> > the stack depot externally (e.g. via a command-line parameter). This will
> > be added in a separate series, possibly together with the performance
> > improvement changes.
>
> Hi Marco and Alex,
>
> Could you PTAL at the not-yet-reviewed patches in this series when you
> get a chance?

There'll be a v3 with a few smaller still-pending fixes, right? I
think I looked at the series a while back, and the patches I didn't
comment on looked fine to me, so I was just waiting for v3.

I'll try to have another look today/tomorrow in case I missed
something. If there are no further comments by then, feel free to send
v3 later this week.