Re: [PATCH 1/1] hung_task: show the blocker task if the task is hung on semaphore
From: Masami Hiramatsu (Google)
Date: Mon Mar 03 2025 - 01:29:20 EST
On Sat, 1 Mar 2025 13:51:02 +0800
Lance Yang <ioworker0@xxxxxxxxx> wrote:
> Inspired by mutex blocker tracking[1], this patch makes a trade-off to
> balance the overhead and utility of the hung task detector.
Thanks for adding a new one!
>
> Unlike mutexes, semaphores lack explicit ownership tracking, making it
> challenging to identify the root cause of hangs. To address this, we
> introduce a last_holder field to the semaphore structure, which is
> updated when a task successfully calls down() and cleared during up().
>
> The assumption is that if a task is blocked on a semaphore, the holders
> must not have released it. While this does not guarantee that the last
> holder is one of the current blockers, it likely provides a practical hint
> for diagnosing semaphore-related stalls.
Yeah, if we had something like an owner_list, we could find the
longest-holding blocker (owner), but that would add more memory and
performance overhead.
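For the record, here is a rough (untested) sketch of what I mean --
every name in it is made up. Each successful down() would link a node
under sem->lock, and up() would unlink one, which is exactly where the
extra memory/performance cost comes from:

	/* one node per current holder, linked on a sem->owner_list */
	struct sem_holder {
		struct list_head node;
		struct task_struct *task;
		unsigned long acquired;	/* jiffies when down() succeeded */
	};

	/* called under sem->lock after a successful acquire */
	static void sem_record_holder(struct semaphore *sem,
				      struct sem_holder *h)
	{
		h->task = current;
		h->acquired = jiffies;
		list_add_tail(&h->node, &sem->owner_list);
	}

The head of owner_list is then the longest holder. But the node has to
live somewhere across down()/up(), so it needs an allocation or a
per-task slot.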
>
> With this change, the hung task detector can now show blocker task's info
> like below:
>
> [Sat Mar 1 02:39:52 2025] INFO: task cat:1437 blocked for more than 122 seconds.
> [Sat Mar 1 02:39:52 2025] Tainted: G OE 6.14.0-rc3+ #9
> [Sat Mar 1 02:39:52 2025] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [Sat Mar 1 02:39:52 2025] task:cat state:D stack:0 pid:1437 tgid:1437 ppid:1007 task_flags:0x400000 flags:0x00000004
> [Sat Mar 1 02:39:52 2025] Call trace:
> [Sat Mar 1 02:39:52 2025] __switch_to+0x1ec/0x380 (T)
> [Sat Mar 1 02:39:52 2025] __schedule+0xc30/0x44f8
> [Sat Mar 1 02:39:52 2025] schedule+0xb8/0x3b0
> [Sat Mar 1 02:39:52 2025] schedule_timeout+0x1d0/0x208
> [Sat Mar 1 02:39:52 2025] __down_common+0x27c/0x600
> [Sat Mar 1 02:39:52 2025] __down+0x24/0x50
> [Sat Mar 1 02:39:52 2025] down+0xe0/0x140
> [Sat Mar 1 02:39:52 2025] read_dummy+0x3c/0xa0 [hung_task_sem]
> [Sat Mar 1 02:39:52 2025] full_proxy_read+0xfc/0x1d0
> [Sat Mar 1 02:39:52 2025] vfs_read+0x1a0/0x858
> [Sat Mar 1 02:39:52 2025] ksys_read+0x100/0x220
> [Sat Mar 1 02:39:52 2025] __arm64_sys_read+0x78/0xc8
> [Sat Mar 1 02:39:52 2025] invoke_syscall+0xd8/0x278
> [Sat Mar 1 02:39:52 2025] el0_svc_common.constprop.0+0xb8/0x298
> [Sat Mar 1 02:39:52 2025] do_el0_svc+0x4c/0x88
> [Sat Mar 1 02:39:52 2025] el0_svc+0x44/0x108
> [Sat Mar 1 02:39:52 2025] el0t_64_sync_handler+0x134/0x160
> [Sat Mar 1 02:39:52 2025] el0t_64_sync+0x1b8/0x1c0
> [Sat Mar 1 02:39:52 2025] INFO: task cat:1437 blocked on a semaphore likely last held by task cat:1436
> [Sat Mar 1 02:39:52 2025] task:cat state:S stack:0 pid:1436 tgid:1436 ppid:1007 task_flags:0x400000 flags:0x00000004
> [Sat Mar 1 02:39:52 2025] Call trace:
> [Sat Mar 1 02:39:52 2025] __switch_to+0x1ec/0x380 (T)
> [Sat Mar 1 02:39:52 2025] __schedule+0xc30/0x44f8
> [Sat Mar 1 02:39:52 2025] schedule+0xb8/0x3b0
> [Sat Mar 1 02:39:52 2025] schedule_timeout+0xf4/0x208
> [Sat Mar 1 02:39:52 2025] msleep_interruptible+0x70/0x130
> [Sat Mar 1 02:39:52 2025] read_dummy+0x48/0xa0 [hung_task_sem]
> [Sat Mar 1 02:39:52 2025] full_proxy_read+0xfc/0x1d0
> [Sat Mar 1 02:39:52 2025] vfs_read+0x1a0/0x858
> [Sat Mar 1 02:39:52 2025] ksys_read+0x100/0x220
> [Sat Mar 1 02:39:52 2025] __arm64_sys_read+0x78/0xc8
> [Sat Mar 1 02:39:52 2025] invoke_syscall+0xd8/0x278
> [Sat Mar 1 02:39:52 2025] el0_svc_common.constprop.0+0xb8/0x298
> [Sat Mar 1 02:39:52 2025] do_el0_svc+0x4c/0x88
> [Sat Mar 1 02:39:52 2025] el0_svc+0x44/0x108
> [Sat Mar 1 02:39:52 2025] el0t_64_sync_handler+0x134/0x160
> [Sat Mar 1 02:39:52 2025] el0t_64_sync+0x1b8/0x1c0
>
> [1] https://lore.kernel.org/all/174046694331.2194069.15472952050240807469.stgit@xxxxxxxxxxxxxxxxxxxxxxxxxxxx
>
> Signed-off-by: Mingzhe Yang <mingzhe.yang@xxxxxx>
> Signed-off-by: Lance Yang <ioworker0@xxxxxxxxx>
> ---
>  include/linux/sched.h      |  1 +
>  include/linux/semaphore.h  | 15 ++++++++++-
>  kernel/hung_task.c         | 52 ++++++++++++++++++++++++++-----------
>  kernel/locking/semaphore.c | 53 ++++++++++++++++++++++++++++++++++----
>  4 files changed, 100 insertions(+), 21 deletions(-)
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 0cebdd736d44..5dfdca879ac4 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1219,6 +1219,7 @@ struct task_struct {
>  
>  #ifdef CONFIG_DETECT_HUNG_TASK_BLOCKER
>  	struct mutex *blocker_mutex;
> +	struct semaphore *blocker_sem;
Can we make this a union, since only one blocker is active at a time?
Waiman has suggested updating it that way;
https://lore.kernel.org/all/9f9150b4-1cf5-4380-b431-419f70775a7d@xxxxxxxxxx/
If we can use the MSB or LSB of those pointers to encode the lock type,
that gives the smallest memory footprint.
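Something like this (untested, all names made up) is what I am
thinking of for the LSB encoding:

	/* in struct task_struct, one word replaces both blocker pointers */
	unsigned long blocker;

	#define BLOCKER_TYPE_MUTEX	0x0UL
	#define BLOCKER_TYPE_SEM	0x1UL
	#define BLOCKER_TYPE_MASK	0x1UL

	static inline void hung_task_set_blocker(void *lock,
						 unsigned long type)
	{
		/* lock objects are at least word-aligned, so the LSB is free */
		WRITE_ONCE(current->blocker, (unsigned long)lock | type);
	}

	static inline void hung_task_clear_blocker(void)
	{
		WRITE_ONCE(current->blocker, 0UL);
	}

debug_show_blocker() can then mask with ~BLOCKER_TYPE_MASK to recover
the pointer and switch on the type bit.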
>  #endif
>  
>  #ifdef CONFIG_DEBUG_ATOMIC_SLEEP
> diff --git a/include/linux/semaphore.h b/include/linux/semaphore.h
> index 04655faadc2d..ca8240a5dbfc 100644
> --- a/include/linux/semaphore.h
> +++ b/include/linux/semaphore.h
> @@ -16,13 +16,25 @@ struct semaphore {
>  	raw_spinlock_t lock;
>  	unsigned int count;
>  	struct list_head wait_list;
> +
> +#ifdef CONFIG_DETECT_HUNG_TASK_BLOCKER
> +	atomic_long_t last_holder;
> +#endif
Do we need to make this atomic? The mutex code needs to use its field
directly as a pointer, but debug_show_blocker() only searches for the
matching task, so an unsigned long with WRITE_ONCE()/READ_ONCE() should
be enough here.
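IOW, something like this should be enough (untested):

	/* in struct semaphore */
	unsigned long last_holder;	/* a task_struct *, never dereferenced */

	/* writer side: always called with sem->lock held */
	static inline void __sem_acquire(struct semaphore *sem)
	{
		sem->count--;
		WRITE_ONCE(sem->last_holder, (unsigned long)current);
	}

	/* reader side: lockless, pairs with the WRITE_ONCE() above */
	unsigned long sem_last_holder(struct semaphore *sem)
	{
		return READ_ONCE(sem->last_holder);
	}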
>  };
>  
> +#ifdef CONFIG_DETECT_HUNG_TASK_BLOCKER
> +#define __LAST_HOLDER_SEMAPHORE_INITIALIZER \
> +	, .last_holder = ATOMIC_LONG_INIT(0)
> +#else
> +#define __LAST_HOLDER_SEMAPHORE_INITIALIZER
> +#endif
> +
>  #define __SEMAPHORE_INITIALIZER(name, n) \
>  { \
>  	.lock = __RAW_SPIN_LOCK_UNLOCKED((name).lock), \
>  	.count = n, \
> -	.wait_list = LIST_HEAD_INIT((name).wait_list), \
> +	.wait_list = LIST_HEAD_INIT((name).wait_list) \
> +	__LAST_HOLDER_SEMAPHORE_INITIALIZER \
>  }
>  
>  /*
> @@ -47,5 +59,6 @@ extern int __must_check down_killable(struct semaphore *sem);
>  extern int __must_check down_trylock(struct semaphore *sem);
>  extern int __must_check down_timeout(struct semaphore *sem, long jiffies);
>  extern void up(struct semaphore *sem);
> +extern unsigned long sem_last_holder(struct semaphore *sem);
>  
>  #endif /* __LINUX_SEMAPHORE_H */
> diff --git a/kernel/hung_task.c b/kernel/hung_task.c
> index ccd7217fcec1..332f555a97a9 100644
> --- a/kernel/hung_task.c
> +++ b/kernel/hung_task.c
> @@ -98,30 +98,52 @@ static struct notifier_block panic_block = {
>  static void debug_show_blocker(struct task_struct *task)
>  {
>  	struct task_struct *g, *t;
> -	unsigned long owner;
> -	struct mutex *lock;
> +	unsigned long owner, holder;
> +	struct semaphore *sem_lock;
> +	struct mutex *mutex_lock;
>  
>  	RCU_LOCKDEP_WARN(!rcu_read_lock_held(), "No rcu lock held");
>  
> -	lock = READ_ONCE(task->blocker_mutex);
> -	if (!lock)
> -		return;
> +	mutex_lock = READ_ONCE(task->blocker_mutex);
> +	if (mutex_lock) {
> +		owner = mutex_get_owner(mutex_lock);
> +		if (unlikely(!owner)) {
> +			pr_err("INFO: task %s:%d is blocked on a mutex, but the owner is not found.\n",
> +				task->comm, task->pid);
> +			goto blocker_sem;
> +		}
>  
> -	owner = mutex_get_owner(lock);
> -	if (unlikely(!owner)) {
> -		pr_err("INFO: task %s:%d is blocked on a mutex, but the owner is not found.\n",
> -			task->comm, task->pid);
> +		/* Ensure the owner information is correct. */
> +		for_each_process_thread(g, t) {
> +			if ((unsigned long)t == owner) {
> +				pr_err("INFO: task %s:%d is blocked on a mutex likely owned by task %s:%d.\n",
> +					task->comm, task->pid, t->comm, t->pid);
> +				sched_show_task(t);
> +				return;
> +			}
> +		}
>  		return;
>  	}
>  
> -	/* Ensure the owner information is correct. */
> -	for_each_process_thread(g, t) {
> -		if ((unsigned long)t == owner) {
> -			pr_err("INFO: task %s:%d is blocked on a mutex likely owned by task %s:%d.\n",
> -				task->comm, task->pid, t->comm, t->pid);
> -			sched_show_task(t);
> +blocker_sem:
> +	sem_lock = READ_ONCE(task->blocker_sem);
> +	if (sem_lock) {
> +		holder = sem_last_holder(sem_lock);
> +		if (unlikely(!holder)) {
> +			pr_err("INFO: task %s:%d is blocked on a semaphore, but the last holder is not found.\n",
> +				task->comm, task->pid);
>  			return;
>  		}
> +
> +		for_each_process_thread(g, t) {
> +			if ((unsigned long)t == holder) {
> +				pr_err("INFO: task %s:%d blocked on a semaphore likely last held by task %s:%d\n",
> +					task->comm, task->pid, t->comm, t->pid);
> +				sched_show_task(t);
> +				return;
> +			}
> +		}
> +		return;
>  	}
>  }
>  #else
> diff --git a/kernel/locking/semaphore.c b/kernel/locking/semaphore.c
> index 34bfae72f295..5a684c0a3087 100644
> --- a/kernel/locking/semaphore.c
> +++ b/kernel/locking/semaphore.c
> @@ -39,6 +39,7 @@ static noinline int __down_interruptible(struct semaphore *sem);
>  static noinline int __down_killable(struct semaphore *sem);
>  static noinline int __down_timeout(struct semaphore *sem, long timeout);
>  static noinline void __up(struct semaphore *sem);
> +static inline void __sem_acquire(struct semaphore *sem);
>  
>  /**
>   * down - acquire the semaphore
> @@ -58,7 +59,7 @@ void __sched down(struct semaphore *sem)
>  	might_sleep();
>  	raw_spin_lock_irqsave(&sem->lock, flags);
>  	if (likely(sem->count > 0))
> -		sem->count--;
> +		__sem_acquire(sem);
>  	else
>  		__down(sem);
>  	raw_spin_unlock_irqrestore(&sem->lock, flags);
> @@ -82,7 +83,7 @@ int __sched down_interruptible(struct semaphore *sem)
>  	might_sleep();
>  	raw_spin_lock_irqsave(&sem->lock, flags);
>  	if (likely(sem->count > 0))
> -		sem->count--;
> +		__sem_acquire(sem);
>  	else
>  		result = __down_interruptible(sem);
>  	raw_spin_unlock_irqrestore(&sem->lock, flags);
> @@ -109,7 +110,7 @@ int __sched down_killable(struct semaphore *sem)
>  	might_sleep();
>  	raw_spin_lock_irqsave(&sem->lock, flags);
>  	if (likely(sem->count > 0))
> -		sem->count--;
> +		__sem_acquire(sem);
>  	else
>  		result = __down_killable(sem);
>  	raw_spin_unlock_irqrestore(&sem->lock, flags);
> @@ -139,7 +140,7 @@ int __sched down_trylock(struct semaphore *sem)
>  	raw_spin_lock_irqsave(&sem->lock, flags);
>  	count = sem->count - 1;
>  	if (likely(count >= 0))
> -		sem->count = count;
> +		__sem_acquire(sem);
>  	raw_spin_unlock_irqrestore(&sem->lock, flags);
>  
>  	return (count < 0);
> @@ -164,7 +165,7 @@ int __sched down_timeout(struct semaphore *sem, long timeout)
>  	might_sleep();
>  	raw_spin_lock_irqsave(&sem->lock, flags);
>  	if (likely(sem->count > 0))
> -		sem->count--;
> +		__sem_acquire(sem);
>  	else
>  		result = __down_timeout(sem, timeout);
>  	raw_spin_unlock_irqrestore(&sem->lock, flags);
> @@ -242,10 +243,18 @@ static inline int __sched __down_common(struct semaphore *sem, long state,
>  {
>  	int ret;
>  
> +#ifdef CONFIG_DETECT_HUNG_TASK_BLOCKER
> +	WRITE_ONCE(current->blocker_sem, sem);
> +#endif
> +
>  	trace_contention_begin(sem, 0);
>  	ret = ___down_common(sem, state, timeout);
>  	trace_contention_end(sem, ret);
>  
> +#ifdef CONFIG_DETECT_HUNG_TASK_BLOCKER
> +	WRITE_ONCE(current->blocker_sem, NULL);
> +#endif
> +
>  	return ret;
>  }
>  
> @@ -274,6 +283,40 @@ static noinline void __sched __up(struct semaphore *sem)
>  	struct semaphore_waiter *waiter = list_first_entry(&sem->wait_list,
>  						struct semaphore_waiter, list);
>  	list_del(&waiter->list);
> +
> +#ifdef CONFIG_DETECT_HUNG_TASK_BLOCKER
> +	atomic_long_cmpxchg_release(&sem->last_holder, (unsigned long)current,
> +				    0UL);
Ah, you don't need an atomic cmpxchg here, because __up() is called
with sem->lock held. Also, if wait_list is empty, up() only does
sem->count++ and never reaches __up(), so in that case the last holder
is not cleared.
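E.g. (untested, and assuming the unsigned long + READ_ONCE()/WRITE_ONCE()
version suggested above) the clearing could be moved into up() itself,
which covers both paths because up() always runs under sem->lock:

	void __sched up(struct semaphore *sem)
	{
		unsigned long flags;

		raw_spin_lock_irqsave(&sem->lock, flags);
	#ifdef CONFIG_DETECT_HUNG_TASK_BLOCKER
		/* sem->lock is held, so no cmpxchg is needed */
		if (READ_ONCE(sem->last_holder) == (unsigned long)current)
			WRITE_ONCE(sem->last_holder, 0UL);
	#endif
		if (likely(list_empty(&sem->wait_list)))
			sem->count++;
		else
			__up(sem);
		raw_spin_unlock_irqrestore(&sem->lock, flags);
	}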
Thank you,
> +#endif
> +
>  	waiter->up = true;
>  	wake_up_process(waiter->task);
>  }
> +
> +static inline struct task_struct *__holder_task(unsigned long holder)
> +{
> +	return (struct task_struct *)holder;
> +}
> +
> +#ifdef CONFIG_DETECT_HUNG_TASK_BLOCKER
> +/* Do not use the return value as a pointer directly. */
> +unsigned long sem_last_holder(struct semaphore *sem)
> +{
> +	unsigned long holder = atomic_long_read(&sem->last_holder);
> +
> +	return (unsigned long)__holder_task(holder);
> +}
> +#else
> +unsigned long sem_last_holder(struct semaphore *sem)
> +{
> +	return 0;
> +}
> +#endif
> +
> +static inline void __sem_acquire(struct semaphore *sem)
> +{
> +	sem->count--;
> +#ifdef CONFIG_DETECT_HUNG_TASK_BLOCKER
> +	atomic_long_set(&sem->last_holder, (unsigned long)current);
> +#endif
> +}
> --
> 2.45.2
>
>
--
Masami Hiramatsu (Google) <mhiramat@xxxxxxxxxx>