On 2023/4/12 14:50, Vlastimil Babka wrote:
On 4/12/23 08:44, Zhang, Qiang1 wrote:
On 2023/4/11 22:19, Vlastimil Babka wrote:
On 4/11/23 16:08, Qi Zheng wrote:
On 2023/4/11 21:40, Vlastimil Babka wrote:
On 4/11/23 15:08, Qi Zheng wrote:
The list_lock can be held in the critical section of
raw_spinlock, and then lockdep will complain about it
like below:
=============================
[ BUG: Invalid wait context ]
6.3.0-rc6-next-20230411 #7 Not tainted
-----------------------------
swapper/0/1 is trying to lock:
ffff888100055418 (&n->list_lock){....}-{3:3}, at: ___slab_alloc+0x73d/0x1330
other info that might help us debug this:
context-{5:5}
2 locks held by swapper/0/1:
#0: ffffffff824e8160 (rcu_tasks.cbs_gbl_lock){....}-{2:2}, at: cblist_init_generic+0x22/0x2d0
#1: ffff888136bede50 (&ACCESS_PRIVATE(rtpcp, lock)){....}-{2:2}, at: cblist_init_generic+0x232/0x2d0
stack backtrace:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.3.0-rc6-next-20230411 #7
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x77/0xc0
__lock_acquire+0xa65/0x2950
? arch_stack_walk+0x65/0xf0
? arch_stack_walk+0x65/0xf0
? unwind_next_frame+0x602/0x8d0
lock_acquire+0xe0/0x300
? ___slab_alloc+0x73d/0x1330
? find_usage_forwards+0x39/0x50
? check_irq_usage+0x162/0xa70
? __bfs+0x10c/0x2c0
_raw_spin_lock_irqsave+0x4f/0x90
? ___slab_alloc+0x73d/0x1330
___slab_alloc+0x73d/0x1330
? fill_pool+0x16b/0x2a0
? look_up_lock_class+0x5d/0x160
? register_lock_class+0x48/0x500
? __lock_acquire+0xabc/0x2950
? fill_pool+0x16b/0x2a0
kmem_cache_alloc+0x358/0x3b0
? __lock_acquire+0xabc/0x2950
fill_pool+0x16b/0x2a0
? __debug_object_init+0x292/0x560
? lock_acquire+0xe0/0x300
? cblist_init_generic+0x232/0x2d0
__debug_object_init+0x2c/0x560
This "__debug_object_init" is because INIT_WORK() is called in
cblist_init_generic(), so..
cblist_init_generic+0x147/0x2d0
rcu_init_tasks_generic+0x15/0x190
kernel_init_freeable+0x6e/0x3e0
? rest_init+0x1e0/0x1e0
kernel_init+0x1b/0x1d0
? rest_init+0x1e0/0x1e0
ret_from_fork+0x1f/0x30
</TASK>
The fill_pool() can only be called in the !PREEMPT_RT kernel
or in the preemptible context of the PREEMPT_RT kernel, so
the above warning is not a real issue, but it's better to
annotate kmem_cache_node->list_lock as raw_spinlock to get
rid of such issue.
+ CC some RT and RCU people
Thanks.
AFAIK raw_spinlock is not just an annotation, but on RT it changes the
implementation from preemptible mutex to actual spin lock, so it would be
Yeah.
rather unfortunate to do that for a spurious warning. Can it be somehow
fixed in a better way?
... probably a better fix is to drop locks and call INIT_WORK(), or make
the cblist_init_generic() lockless (or part lockless), given it's just
initializing the cblist, it's probably doable. But I haven't taken a
careful look yet.
This is just one of the paths that triggers an invalid wait, the following paths can also trigger:
[ 129.914547] [ BUG: Invalid wait context ]
[ 129.914775] 6.3.0-rc1-yocto-standard+ #2 Not tainted
[ 129.915044] -----------------------------
[ 129.915272] kworker/2:0/28 is trying to lock:
[ 129.915516] ffff88815660f570 (&c->lock){-.-.}-{3:3}, at: ___slab_alloc+0x68/0x12e0
[ 129.915967] other info that might help us debug this:
[ 129.916241] context-{5:5}
[ 129.916392] 3 locks held by kworker/2:0/28:
[ 129.916642] #0: ffff888100084d48 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x515/0xba0
[ 129.917145] #1: ffff888100c17dd0 ((work_completion)(&(&krcp->monitor_work)->work)){+.+.}-{0:0}, at: process_on0
[ 129.917758] #2: ffff8881565f8508 (krc.lock){....}-{2:2}, at: kfree_rcu_monitor+0x29f/0x810
[ 129.918207] stack backtrace:
[ 129.918374] CPU: 2 PID: 28 Comm: kworker/2:0 Not tainted 6.3.0-rc1-yocto-standard+ #2
[ 129.918784] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.1-0-g3208b098f51a-prebuilt.qemu.o4
[ 129.919397] Workqueue: events kfree_rcu_monitor
[ 129.919662] Call Trace:
[ 129.919812] <TASK>
[ 129.919941] dump_stack_lvl+0x64/0xb0
[ 129.920171] dump_stack+0x10/0x20
[ 129.920372] __lock_acquire+0xeb8/0x3a80
[ 129.920603] ? ret_from_fork+0x2c/0x50
[ 129.920824] ? __pfx___lock_acquire+0x10/0x10
[ 129.921068] ? unwind_next_frame.part.0+0x1ba/0x3c0
[ 129.921343] ? ret_from_fork+0x2c/0x50
[ 129.921573] ? __this_cpu_preempt_check+0x13/0x20
[ 129.921847] lock_acquire+0x194/0x480
[ 129.922060] ? ___slab_alloc+0x68/0x12e0
[ 129.922293] ? __pfx_lock_acquire+0x10/0x10
[ 129.922529] ? __pfx_mark_lock.part.0+0x10/0x10
[ 129.922778] ? __kasan_check_read+0x11/0x20
[ 129.922998] ___slab_alloc+0x9a/0x12e0
[ 129.923222] ? ___slab_alloc+0x68/0x12e0
[ 129.923452] ? __pfx_mark_lock.part.0+0x10/0x10
[ 129.923706] ? __kasan_check_read+0x11/0x20
[ 129.923937] ? fill_pool+0x22a/0x370
[ 129.924161] ? __lock_acquire+0xf5b/0x3a80
[ 129.924387] ? fill_pool+0x22a/0x370
[ 129.924590] __slab_alloc.constprop.0+0x5b/0x90
[ 129.924832] kmem_cache_alloc+0x296/0x3d0
[ 129.925073] ? fill_pool+0x22a/0x370
[ 129.925291] fill_pool+0x22a/0x370
[ 129.925495] ? __pfx_fill_pool+0x10/0x10
[ 129.925718] ? __pfx___lock_acquire+0x10/0x10
[ 129.926034] ? __kasan_check_read+0x11/0x20
[ 129.926269] ? check_chain_key+0x200/0x2b0
[ 129.926503] __debug_object_init+0x82/0x8c0
[ 129.926734] ? __pfx_lock_release+0x10/0x10
[ 129.926984] ? __pfx___debug_object_init+0x10/0x10
[ 129.927249] ? __kasan_check_read+0x11/0x20
[ 129.927498] ? do_raw_spin_unlock+0x9c/0x100
[ 129.927758] debug_object_activate+0x2d1/0x2f0
[ 129.928022] ? __pfx_debug_object_activate+0x10/0x10
[ 129.928300] ? __this_cpu_preempt_check+0x13/0x20
[ 129.928583] __call_rcu_common.constprop.0+0x94/0xeb0
[ 129.928897] ? __this_cpu_preempt_check+0x13/0x20
[ 129.929186] ? __pfx_rcu_work_rcufn+0x10/0x10
[ 129.929459] ? __pfx___call_rcu_common.constprop.0+0x10/0x10
[ 129.929803] ? __pfx_lock_acquired+0x10/0x10
[ 129.930067] ? __pfx_do_raw_spin_trylock+0x10/0x10
[ 129.930363] ? kfree_rcu_monitor+0x29f/0x810
[ 129.930627] call_rcu+0xe/0x20
[ 129.930821] queue_rcu_work+0x4f/0x60
[ 129.931050] kfree_rcu_monitor+0x5d3/0x810
[ 129.931302] ? __pfx_kfree_rcu_monitor+0x10/0x10
[ 129.931587] ? __this_cpu_preempt_check+0x13/0x20
[ 129.931878] process_one_work+0x607/0xba0
[ 129.932129] ? __pfx_process_one_work+0x10/0x10
[ 129.932408] ? worker_thread+0xd6/0x710
[ 129.932653] worker_thread+0x2d4/0x710
[ 129.932888] ? __pfx_worker_thread+0x10/0x10
[ 129.933154] kthread+0x18b/0x1c0
[ 129.933363] ? __pfx_kthread+0x10/0x10
[ 129.933598] ret_from_fork+0x2c/0x50
[ 129.933825] </TASK>
Maybe no need to convert ->list_lock to raw_spinlock.
--- a/lib/debugobjects.c
+++ b/lib/debugobjects.c
@@ -562,10 +562,10 @@ __debug_object_init(void *addr, const struct debug_obj_descr *descr, int onstack
unsigned long flags;
/*
- * On RT enabled kernels the pool refill must happen in preemptible
+ * The pool refill must happen in preemptible
* context:
*/
- if (!IS_ENABLED(CONFIG_PREEMPT_RT) || preemptible())
+ if (preemptible())
fill_pool();
+CC Peterz
Aha so this is in fact another case where the code is written with
actual differences between PREEMPT_RT and !PREEMPT_RT in mind, but
CONFIG_PROVE_RAW_LOCK_NESTING always assumes PREEMPT_RT?
Maybe we should make CONFIG_PROVE_RAW_LOCK_NESTING depend on
CONFIG_PREEMPT_RT:
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index f0d5b82e478d..257b170aacb6 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1262,6 +1262,7 @@ config PROVE_LOCKING
config PROVE_RAW_LOCK_NESTING
bool "Enable raw_spinlock - spinlock nesting checks"
depends on PROVE_LOCKING
+ depends on PREEMPT_RT
default n
help
Enable the raw_spinlock vs. spinlock nesting checks which ensure
db = get_bucket((unsigned long) addr);
Thanks
Zqiang
Regards,
Boqun
It's indeed unfortunate for the warning in the commit message. But
functions like kmem_cache_alloc(GFP_ATOMIC) may indeed be called
in the critical section of raw_spinlock or in the hardirq context, which
Hmm, I thought they may not, actually.
will cause problem in the PREEMPT_RT kernel. So I still think it is
reasonable to convert kmem_cache_node->list_lock to raw_spinlock type.
It wouldn't be the complete solution anyway. Once we allow even a GFP_ATOMIC
slab allocation for such context, it means also page allocation can happen
to refill the slabs, so lockdep will eventually complain about zone->lock,
and who knows what else.
Oh, indeed. :(
In addition, there are many fix patches for this kind of warning in the
git log, so I also think there should be a general and better solution. :)
Maybe, but given above, I doubt it's this one.
--
Thanks,
Qi