Re: [PATCH v3] lib/stackdepot: Double DEPOT_POOLS_CAP if KASAN is enabled

From: Waiman Long
Date: Thu Aug 08 2024 - 14:32:49 EST

On 8/8/24 12:12, Andrey Konovalov wrote:
> On Thu, Aug 8, 2024 at 2:54 PM Waiman Long <longman@xxxxxxxxxx> wrote:
>> As said in commit 02754e0a484a ("lib/stackdepot.c: bump stackdepot
>> capacity from 16MB to 128MB"), KASAN uses stackdepot to memorize stacks
>> for all kmalloc/kfree calls. So stackdepot capacity was increased 8
>> times to accommodate KASAN usage, even though it was claimed at the
>> time that 4X should be enough.

>> With commit fc60e0caa94d ("lib/stackdepot: use fixed-sized slots
>> for stack records"), all stackdepot records use a fixed size with
>> CONFIG_STACKDEPOT_MAX_FRAMES (default=64) entries. This was merged to
>> support evictable KASAN stack records. Commit 31639fd6cebd ("stackdepot:
>> use variable size records for non-evictable entries") re-enabled
>> the use of variable-size records for non-KASAN use cases, but KASAN
>> (generic mode) still uses the large fixed-size stack records.
> No, since commit 711d349174fd ("kasan: revert eviction of stack traces
> in generic mode") Generic KASAN doesn't use fixed-sized slots.

>> With the default CONFIG_STACKDEPOT_MAX_FRAMES of 64, KASAN's use of
>> stackdepot space has been more than double what it was before. Assuming
>> an average stack frame size of 16, a KASAN stack record is almost 4X
>> the size of a non-KASAN one.
> So this is not correct.

Yes, you are right. I missed the KASAN commit 711d349174fd in my
analysis. I will update the commit log in the next version.

>> When a wide variety of workloads are run on a debug kernel with KASAN
>> enabled, the following warning may sometimes be printed.

>> [ 6818.650674] Stack depot reached limit capacity
>> [ 6818.650730] WARNING: CPU: 1 PID: 272741 at lib/stackdepot.c:252 depot_alloc_stack+0x39e/0x3d0
>> :
>> [ 6818.650907] Call Trace:
>> [ 6818.650909] [<00047dd453d84b92>] depot_alloc_stack+0x3a2/0x3d0
>> [ 6818.650916] [<00047dd453d85254>] stack_depot_save_flags+0x4f4/0x5c0
>> [ 6818.650920] [<00047dd4535872c6>] kasan_save_stack+0x56/0x70
>> [ 6818.650924] [<00047dd453587328>] kasan_save_track+0x28/0x40
>> [ 6818.650927] [<00047dd45358a27a>] kasan_save_free_info+0x4a/0x70
>> [ 6818.650930] [<00047dd45358766a>] __kasan_slab_free+0x12a/0x1d0
>> [ 6818.650933] [<00047dd45350deb4>] kmem_cache_free+0x1b4/0x580
>> [ 6818.650938] [<00047dd452c520da>] __put_task_struct+0x24a/0x320
>> [ 6818.650945] [<00047dd452c6aee4>] delayed_put_task_struct+0x294/0x350
>> [ 6818.650949] [<00047dd452e9066a>] rcu_do_batch+0x6ea/0x2090
>> [ 6818.650953] [<00047dd452ea60f4>] rcu_core+0x474/0xa90
>> [ 6818.650956] [<00047dd452c780c0>] handle_softirqs+0x3c0/0xf90
>> [ 6818.650960] [<00047dd452c76fbe>] __irq_exit_rcu+0x35e/0x460
>> [ 6818.650963] [<00047dd452c79992>] irq_exit_rcu+0x22/0xb0
>> [ 6818.650966] [<00047dd454bd8128>] do_ext_irq+0xd8/0x120
>> [ 6818.650972] [<00047dd454c0ddd0>] ext_int_handler+0xb8/0xe8
>> [ 6818.650979] [<00047dd453589cf6>] kasan_check_range+0x236/0x2f0
>> [ 6818.650982] [<00047dd453378cf0>] filemap_get_pages+0x190/0xaa0
>> [ 6818.650986] [<00047dd453379940>] filemap_read+0x340/0xa70
>> [ 6818.650989] [<00047dd3d325d226>] xfs_file_buffered_read+0x2c6/0x400 [xfs]
>> [ 6818.651431] [<00047dd3d325dfe2>] xfs_file_read_iter+0x2c2/0x550 [xfs]
>> [ 6818.651663] [<00047dd45364710c>] vfs_read+0x64c/0x8c0
>> [ 6818.651669] [<00047dd453648ed8>] ksys_read+0x118/0x200
>> [ 6818.651672] [<00047dd452b6cf5a>] do_syscall+0x27a/0x380
>> [ 6818.651676] [<00047dd454bd7e74>] __do_syscall+0xf4/0x1a0
>> [ 6818.651680] [<00047dd454c0db58>] system_call+0x70/0x98

>> With all the recent changes in stackdepot to support new KASAN features,
>> it is clear that the current DEPOT_POOLS_CAP of 8192 may not be
>> enough when KASAN is enabled. Fix this stackdepot capacity issue
>> by doubling DEPOT_POOLS_CAP if KASAN is enabled. With 4k pages, the
>> maximum stackdepot capacity is doubled to 256 MB with KASAN enabled.
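
For reference, the proposed change amounts to something like the
following in lib/stackdepot.c (a sketch of the idea rather than the
literal v3 patch; the DEPOT_MAX_POOLS definition is shown only for
context and its exact form may differ between kernel versions):

  /*
   * Sketch: double the pool cap when KASAN is enabled, since KASAN
   * saves a stack trace for every slab allocation and free.
   */
  #ifdef CONFIG_KASAN
  #define DEPOT_POOLS_CAP		(2 * 8192)
  #else
  #define DEPOT_POOLS_CAP		8192
  #endif

  /* The hard upper limit on the number of pools scales with the cap. */
  #define DEPOT_MAX_POOLS \
  	(((1LL << (DEPOT_POOL_INDEX_BITS)) < DEPOT_POOLS_CAP) ? \
  	 (1LL << (DEPOT_POOL_INDEX_BITS)) : DEPOT_POOLS_CAP)
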
> It is possible that the stack depot runs out of space due to a truly
> large number of unique stack traces, but I would first make sure that
> is indeed the case. One thing to check would be to dump all the
> stack traces from the stack depot when it overflows and check whether
> they make sense. There have been cases in the past when, e.g., the task
> context part of a stack trace from an interrupt didn't get stripped
> properly, and thus almost every stack trace from an interrupt was
> considered unique by the stack depot. Perhaps something similar
> started happening again.

It could be.

Anyway, I got a bug report from our QE team and they have seen it once so far. So it is not an easily reproducible problem. I need to do further investigation first. Thanks for your quick response.
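
As a starting point for that, one throwaway debug hack (just a sketch;
stack_table, stack_hash_mask and struct stack_record are internals of
lib/stackdepot.c whose names and layout may differ between kernel
versions, and no locking is shown) would be to dump every stored record
when the capacity warning fires, so the saved traces can be checked for
spurious uniqueness:

  /*
   * Debug-only sketch (not for merging): walk the stack depot hash
   * table and print every stored stack trace, e.g. right after the
   * "Stack depot reached limit capacity" warning triggers. Useful to
   * spot bogus "unique" traces such as interrupt stacks that still
   * carry task-context frames.
   */
  static void depot_dump_all_records(void)
  {
  	struct stack_record *rec;
  	unsigned long i;

  	for (i = 0; i <= stack_hash_mask; i++) {
  		list_for_each_entry(rec, &stack_table[i], hash_list) {
  			pr_info("stackdepot record: hash=0x%x frames=%u\n",
  				rec->hash, rec->size);
  			/* Print the saved frames with one space of indent. */
  			stack_trace_print(rec->entries, rec->size, 1);
  		}
  	}
  }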

Cheers,
Longman