Re: [PATCH v3] hung_task: deduplicate identical hang reports

From: Aaron Tomlin

Date: Sat Jun 27 2026 - 16:32:26 EST

On Mon, Jun 22, 2026 at 05:58:54PM +0200, Petr Mladek wrote:
> On Sun 2026-06-21 17:37:56, Aaron Tomlin wrote:
> > Currently, during severe lock contention, multiple tasks can hang while
> > waiting on the exact same resource. The khungtaskd kthread
> > indiscriminately reports every single instance with a stack trace.
> > This can roll the kernel ring buffer and prematurely exhaust the
> > kernel.hung_task_warnings budget. Consequently, the kernel is left
> > entirely blind to subsequent, unrelated deadlocks.
> >
> > To preserve the warning budget and ring buffer without sacrificing
> > observability, introduce a Wait Channel (wchan) and task-state based
> > deduplicator:
> >
> > 1. Implement a lightweight, stack-allocated 64-slot Wait Channel
> > (wchan) hash map. Tasks blocked on the exact same wchan during a
> > single scan are recognised as sharing the same bottleneck,
> > successfully deduplicating contentions even when the callers
> > possess entirely disparate call stacks.
>
> I am sorry but I do not like this. It would show one random task blocked
> using a locking/wait API (mutex, semaphore, wait). But it will not be
> able to distinguish whether they are waiting for the same lock or
> event.
>
> It might easily skip the lock/event which is the root of the problem.
>
> By other words, the motivation for this patch is to avoid duplicated
> backtraces because the global limit of shown backtraces is too low
> and it hides too much. But this would hide even more backtraces.
> As a result administrators and developers will be even more blind.
>
> Honestly, the previous version looked more acceptable to me. The
> problem with not-exactly same backtraces might be solved by
> comparing (hashing) only the top N backtrace levels, e.g. 10th.
> Anyway, we should compare the callers of the locking/waiter API.
>
> IMHO, we should always print backtraces of all hung tasks when
> a hung_task is detected for the 1st time. Because we do not
> know which of the hung tasks is pointing to the root of the problem
> and which is a secondary victim.
>
> Also I would primary try to increase the ring buffer size when
> backtraces get lost.
>
> > 2. Introduce a hung_task_reported bit-field in task_struct. If a task
> > remains hung across multiple intervals, khungtaskd recognises it
> > has already been reported. The bit is safely cleared without
> > locks or atomics the moment the task's context switch counter
> > increments.
>
> Also this looks like an interesting optimization which might help
> to reduce printing the same backtrace again and again. It looks
> much better than the global limit of printed backtraces.
>
> > 3. For duplicate tasks, we still print the single-line
> > "INFO: task ..." message and trigger tracepoint
> > trace_sched_process_hang(). It merely skips calling
> > sched_show_task() and debug_show_blocker(), printing a concise
> > suppression notice instead.
>
> Yes, this is important as well.
>
Hi Petr,

Thank you for the rigorous review. You are right: wchan is too coarse.
It identifies the locking/wait API a task sleeps in, not the specific
lock or event it is waiting for.

Rather than hashing the top N stack levels as you suggest, I propose
dropping stack-based heuristics altogether and keying on the blocking
object itself.

I intend to use the existing CONFIG_DETECT_HUNG_TASK_BLOCKER
infrastructure. When it is enabled, khungtaskd will hash the address of
the blocking object (t->blocker & ~BLOCKER_TYPE_MASK). Keying on that
address is precise: tasks blocked on the same lock are grouped, whilst
tasks blocked on different locks remain distinct.
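
To make the grouping concrete, here is a rough userspace model of the
per-scan table. It is only a sketch and not the v4 code: the
SKETCH_* names, struct blocker_dedup and the open-addressing layout are
hypothetical, and the two-bit type mask is assumed to mirror
BLOCKER_TYPE_MASK.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: the low two bits of the blocker word carry a type tag. */
#define SKETCH_BLOCKER_TYPE_MASK 0x03UL
#define SKETCH_DEDUP_SLOTS       64

/* Hypothetical per-scan table; a zero entry marks an empty slot. */
struct blocker_dedup {
	uintptr_t seen[SKETCH_DEDUP_SLOTS];
};

static void blocker_dedup_reset(struct blocker_dedup *d)
{
	for (size_t i = 0; i < SKETCH_DEDUP_SLOTS; i++)
		d->seen[i] = 0;
}

/*
 * Return true if this blocker address was already recorded during the
 * current scan. Return false on first sighting, when no blocker is
 * tracked (0), or when the table is full, so that the caller falls
 * back to printing a full report.
 */
static bool blocker_dedup_seen(struct blocker_dedup *d, uintptr_t blocker)
{
	uintptr_t addr = blocker & ~(uintptr_t)SKETCH_BLOCKER_TYPE_MASK;
	size_t start;

	if (!addr)
		return false;

	/* Drop the alignment bits, then pick a starting slot. */
	start = (size_t)((addr >> 4) % SKETCH_DEDUP_SLOTS);

	for (size_t i = 0; i < SKETCH_DEDUP_SLOTS; i++) {
		size_t slot = (start + i) % SKETCH_DEDUP_SLOTS;

		if (d->seen[slot] == addr)
			return true;		/* duplicate blocker */
		if (d->seen[slot] == 0) {
			d->seen[slot] = addr;	/* first sighting */
			return false;
		}
	}

	return false;	/* table full: fail open */
}
```

Failing open when the table is full, or when no blocker is recorded,
means the sketch can only ever print too much, never hide a trace.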

Furthermore, I shall retain the hung_task_reported tracking (now a
standalone u8) so that a task which stays hung is not reported again on
every scan interval.
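
The intended behaviour of that flag can be modelled in userspace as
below. This is a simplified sketch under stated assumptions, not the
patch itself: struct task_model and should_print_backtrace() are
hypothetical, the timeout check is omitted, and only the field names
are borrowed from task_struct.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the few task_struct fields involved. */
struct task_model {
	unsigned long nvcsw;		 /* voluntary context switches */
	unsigned long nivcsw;		 /* involuntary context switches */
	unsigned long last_switch_count; /* value seen on the last scan */
	uint8_t hung_task_reported;	 /* set once a backtrace is printed */
};

/* Return true when a full backtrace should be printed for this scan. */
static bool should_print_backtrace(struct task_model *t)
{
	unsigned long switch_count = t->nvcsw + t->nivcsw;

	if (switch_count != t->last_switch_count) {
		/* The task ran since the last scan: forget the old report. */
		t->last_switch_count = switch_count;
		t->hung_task_reported = 0;
		return false;
	}

	if (t->hung_task_reported)
		return false;	/* still hung, already reported */

	t->hung_task_reported = 1;
	return true;		/* first report for this hang */
}
```

Only the scanning thread writes the flag in this model, which is why
no lock or atomic operation appears.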

You also noted that we should always print backtraces of all hung tasks
detected for the first time. I completely agree. I have adjusted the logic
so that suppressed duplicate traces will no longer decrement
sysctl_hung_task_warnings. The warning budget will be strictly preserved
for distinct, first-time deadlocks.
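
The accounting can be sketched as follows. The names are hypothetical
and this is a userspace model only; in particular, staying silent once
the budget reaches zero is an assumption carried over from the current
behaviour, as is treating a negative budget as unlimited.

```c
#include <stdbool.h>

/* Hypothetical outcome of one report decision. */
enum report_kind {
	REPORT_NONE,		/* budget exhausted: print nothing */
	REPORT_FULL,		/* one-line message plus backtrace */
	REPORT_SUPPRESSED	/* one-line message plus suppression notice */
};

/*
 * Decide how to report one hung task. *warnings models
 * sysctl_hung_task_warnings: 0 means exhausted, a negative value
 * means unlimited. Only a full report consumes budget.
 */
static enum report_kind account_report(int *warnings, bool duplicate)
{
	if (*warnings == 0)
		return REPORT_NONE;

	if (duplicate)
		return REPORT_SUPPRESSED;	/* budget untouched */

	if (*warnings > 0)
		(*warnings)--;

	return REPORT_FULL;
}
```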

I shall send out the v4 patch shortly.

Kind regards,
--
Aaron Tomlin