Re: [PATCH] kasan:report: filter out kasan related stack entries

From: Andrey Konovalov
Date: Thu Oct 17 2024 - 20:44:41 EST


On Thu, Oct 17, 2024 at 11:46 PM Nihar Chaithanya
<niharchaithanya@xxxxxxxxx> wrote:
>
> The reports of KASAN include KASAN related stack frames which are not
> the point of interest in the stack-trace. KCSAN report filters out such
> internal frames providing relevant stack trace. Currently, KASAN reports
> are generated by dump_stack_lvl() which prints the entire stack.
>
> Add functionality to KASAN reports to save the stack entries and filter
> out the kasan related stack frames in place of dump_stack_lvl().
>
> Within this new functionality:
> - A function save_stack_lvl_kasan() in place of dump_stack_lvl() is
> created which contains functionality for saving, filtering and printing
> the stack-trace.
> - The stack-trace is saved to an array using stack_trace_save() similar to
> KCSAN reporting which is useful for filtering the stack-trace,
> - The sanitize_stack_entries() function is included to get the number of
> entries to be skipped for filtering similar to KCSAN reporting,
> - The dump_stack_print_info() which prints generic debug info is included
> from __dump_stack(),
> - And the function print_stack_trace() to print the stack-trace using the
> array containing stack entries as well as the number of entries to be
> skipped or filtered out is included.
>
> Signed-off-by: Nihar Chaithanya <niharchaithanya@xxxxxxxxx>
> Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=215756

Great start!

One part that is missing is also filtering out KASAN frames in stack
traces printed from print_track(). Right now it call
stack_depot_print() to print the stack trace. I think the way to
approach this would be to use stack_depot_fetch(), memcpy the frames
to a local buffer, and then reuse the stack trace printing code you
added.

I've also left some comments below.

Please address these points first and send v2. Then, I'll test the
patch and see if there's more things to be done.

On a related note, I wonder if losing the additional annotations about
which part of the stack trace belongs with context (task, irq, etc)
printed by dump_stack() would be a problem. But worst case, we can
hide stack frame filtering under a CONFIG option.

> ---
> mm/kasan/report.c | 92 +++++++++++++++++++++++++++++++++++++++++++++--
> 1 file changed, 90 insertions(+), 2 deletions(-)
>
> diff --git a/mm/kasan/report.c b/mm/kasan/report.c
> index b48c768acc84..c180cd8b32ae 100644
> --- a/mm/kasan/report.c
> +++ b/mm/kasan/report.c
> @@ -39,6 +39,7 @@ static unsigned long kasan_flags;
>
> #define KASAN_BIT_REPORTED 0
> #define KASAN_BIT_MULTI_SHOT 1
> +#define NUM_STACK_ENTRIES 64

If we keep this as 64, we can reuse KASAN_STACK_DEPTH.

However, I wonder if 64 frames is enough. Marco, Alexander, Dmitry,
IIRC you did some measurements on the length of stack traces in the
kernel: would 64 frames be good enough for KASAN reports? Was this
ever a problem for KCSAN?

>
> enum kasan_arg_fault {
> KASAN_ARG_FAULT_DEFAULT,
> @@ -369,12 +370,99 @@ static inline bool init_task_stack_addr(const void *addr)
> sizeof(init_thread_union.stack));
> }
>
> +/* Helper to skip KASAN-related functions in stack-trace. */
> +static int get_stack_skipnr(const unsigned long stack_entries[], int num_entries)
> +{
> + char buf[64];
> + int len, skip;
> +
> + for (skip = 0; skip < num_entries; ++skip) {
> + len = scnprintf(buf, sizeof(buf), "%ps", (void *)stack_entries[skip]);
> +
> + /* Never show kasan_* functions. */
> + if (strnstr(buf, "kasan_", len) == buf)
> + continue;
> + /*
> + * No match for runtime functions -- @skip entries to skip to
> + * get to first frame of interest.
> + */
> + break;
> + }
> +
> + return skip;
> +}
> +

Please also copy the comment for this function, it's useful for
understanding what's going on.

> +static int
> +replace_stack_entry(unsigned long stack_entries[], int num_entries, unsigned long ip,
> + unsigned long *replaced)
> +{
> + unsigned long symbolsize, offset;
> + unsigned long target_func;
> + int skip;
> +
> + if (kallsyms_lookup_size_offset(ip, &symbolsize, &offset))
> + target_func = ip - offset;
> + else
> + goto fallback;
> +
> + for (skip = 0; skip < num_entries; ++skip) {
> + unsigned long func = stack_entries[skip];
> +
> + if (!kallsyms_lookup_size_offset(func, &symbolsize, &offset))
> + goto fallback;
> + func -= offset;
> +
> + if (func == target_func) {
> + *replaced = stack_entries[skip];
> + stack_entries[skip] = ip;
> + return skip;
> + }
> + }
> +
> +fallback:
> + /* Should not happen; the resulting stack trace is likely misleading. */
> + WARN_ONCE(1, "Cannot find frame for %pS in stack trace", (void *)ip);
> + return get_stack_skipnr(stack_entries, num_entries);
> +}

Hm, There's some code duplication here between KCSAN and KASAN.
Although, the function above is the only part dully duplicated, so I
don't know whether it makes sense to try to factor it out into a
common file.

Marco, WDYT?

> +
> +static void
> +print_stack_trace(unsigned long stack_entries[], int num_entries, unsigned long reordered_to)
> +{
> + stack_trace_print(stack_entries, num_entries, 0);
> + if (reordered_to)
> + pr_err(" |\n +-> reordered to: %pS\n", (void *)reordered_to);

This reordered_to is a KCSAN-specific part, KASAN doesn't need it.
Thus, this helper function is excessive, let's remove it.

> +}
> +
> +static int
> +sanitize_stack_entries(unsigned long stack_entries[], int num_entries, unsigned long ip,
> + unsigned long *replaced)
> +{
> + return ip ? replace_stack_entry(stack_entries, num_entries, ip, replaced) :
> + get_stack_skipnr(stack_entries, num_entries);
> +}
> +
> +static void save_stack_lvl_kasan(const char *log_lvl, struct kasan_report_info *info)

And this one we can then call print_stack_trace().

> +{
> + unsigned long reordered_to = 0;
> + unsigned long stack_entries[NUM_STACK_ENTRIES] = {0};
> + int num_stack_entries = stack_trace_save(stack_entries, NUM_STACK_ENTRIES, 1);
> + int skipnr = sanitize_stack_entries(stack_entries,
> + num_stack_entries, info->ip, &reordered_to);
> +
> + dump_stack_print_info(log_lvl);

No need to pass the log level down the call chain, just pass KERN_ERR
directly to dump_stack_print_info().

> + pr_err("\n");

dump_stack() doesn't add a new line here, let's also drop it to keep
the report style as before.

> +
> + print_stack_trace(stack_entries + skipnr, num_stack_entries - skipnr,
> + reordered_to);
> + pr_err("\n");
> +}
> +
> static void print_address_description(void *addr, u8 tag,
> struct kasan_report_info *info)
> {
> struct page *page = addr_to_page(addr);
>
> - dump_stack_lvl(KERN_ERR);
> + save_stack_lvl_kasan(KERN_ERR, info);
> pr_err("\n");
>
> if (info->cache && info->object) {
> @@ -488,7 +576,7 @@ static void print_report(struct kasan_report_info *info)
> print_address_description(addr, tag, info);
> print_memory_metadata(info->first_bad_addr);
> } else {
> - dump_stack_lvl(KERN_ERR);
> + save_stack_lvl_kasan(KERN_ERR, info);

Ah, looks like we have duplicated dump_stack_lvl() call. Let's move it
(or rather print_stack_trace() in this patch) before the if
(addr_has_metadata(addr)) check and the drop it from
print_address_description().

> }
> }

>
> --
> 2.34.1
>