Re: [PATCH v3] lib/stackdepot: Double DEPOT_POOLS_CAP if KASAN is enabled

From: Waiman Long
Date: Thu Aug 08 2024 - 18:30:58 EST



On 8/8/24 12:12, Andrey Konovalov wrote:
When a wide variety of workloads are run on a debug kernel with KASAN
enabled, the following warning may sometimes be printed.

[ 6818.650674] Stack depot reached limit capacity
[ 6818.650730] WARNING: CPU: 1 PID: 272741 at lib/stackdepot.c:252 depot_alloc_stack+0x39e/0x3d0
:
[ 6818.650907] Call Trace:
[ 6818.650909] [<00047dd453d84b92>] depot_alloc_stack+0x3a2/0x3d0
[ 6818.650916] [<00047dd453d85254>] stack_depot_save_flags+0x4f4/0x5c0
[ 6818.650920] [<00047dd4535872c6>] kasan_save_stack+0x56/0x70
[ 6818.650924] [<00047dd453587328>] kasan_save_track+0x28/0x40
[ 6818.650927] [<00047dd45358a27a>] kasan_save_free_info+0x4a/0x70
[ 6818.650930] [<00047dd45358766a>] __kasan_slab_free+0x12a/0x1d0
[ 6818.650933] [<00047dd45350deb4>] kmem_cache_free+0x1b4/0x580
[ 6818.650938] [<00047dd452c520da>] __put_task_struct+0x24a/0x320
[ 6818.650945] [<00047dd452c6aee4>] delayed_put_task_struct+0x294/0x350
[ 6818.650949] [<00047dd452e9066a>] rcu_do_batch+0x6ea/0x2090
[ 6818.650953] [<00047dd452ea60f4>] rcu_core+0x474/0xa90
[ 6818.650956] [<00047dd452c780c0>] handle_softirqs+0x3c0/0xf90
[ 6818.650960] [<00047dd452c76fbe>] __irq_exit_rcu+0x35e/0x460
[ 6818.650963] [<00047dd452c79992>] irq_exit_rcu+0x22/0xb0
[ 6818.650966] [<00047dd454bd8128>] do_ext_irq+0xd8/0x120
[ 6818.650972] [<00047dd454c0ddd0>] ext_int_handler+0xb8/0xe8
[ 6818.650979] [<00047dd453589cf6>] kasan_check_range+0x236/0x2f0
[ 6818.650982] [<00047dd453378cf0>] filemap_get_pages+0x190/0xaa0
[ 6818.650986] [<00047dd453379940>] filemap_read+0x340/0xa70
[ 6818.650989] [<00047dd3d325d226>] xfs_file_buffered_read+0x2c6/0x400 [xfs]
[ 6818.651431] [<00047dd3d325dfe2>] xfs_file_read_iter+0x2c2/0x550 [xfs]
[ 6818.651663] [<00047dd45364710c>] vfs_read+0x64c/0x8c0
[ 6818.651669] [<00047dd453648ed8>] ksys_read+0x118/0x200
[ 6818.651672] [<00047dd452b6cf5a>] do_syscall+0x27a/0x380
[ 6818.651676] [<00047dd454bd7e74>] __do_syscall+0xf4/0x1a0
[ 6818.651680] [<00047dd454c0db58>] system_call+0x70/0x98

With all the recent changes in stackdepot to support new KASAN features,
it is obvious that the current DEPOT_POOLS_CAP of 8192 may not be
enough when KASAN is enabled. Fix this stackdepot capability issue
by doubling DEPOT_POOLS_CAP if KASAN is enabled. With 4k pages, the
maximum stackdepot capacity is doubled to 256 MB with KASAN enabled.
It is possible that the stack depot runs out of space due to a truly
large number of unique stack traces, but I would first make sure that
is indeed the case. The one thing to check would be to dump all the
stack traces from the stack depot when it overflows, and check whether
they make sense. There have been cases in the past, when e.g. the task
context part of a stack trace from an interrupt didn't get stripped
properly, and thus almost each stack trace from an interrupt was
considered unique by the stack depot. Perhaps, something similar
started happening again.

You are right. Indeed this problem is caused by mixing the task and interrupt stack traces and treating them as unique traces causing a lot more partially duplicated stack traces to be stored and thus overflowing the current stackdepot capacity.

This problem happens in a s390 machine and it looks like __irqentry_text_start and __irqentry_text_end aren't properly set up causing filter_irq_stacks() to fail in identifying the proper interrupt boundary.

It may be a defect in the s390 architectural code or something is wrong in the way the s390 kernel is being built. I am not proficient in the s390 arch. So I will ask our s390 experts to take a further look at this.

Thanks a lot for the helping me to figure this out. I am going to drop this patch.

Cheers,
Longman