Re: [PATCH v4] hung_task: Deduplicate identical hang reports using explicit blocker tracking

From: Aaron Tomlin

Date: Sun Jun 28 2026 - 16:56:55 EST

On Sun, Jun 28, 2026 at 12:47:50PM +0800, Lance Yang wrote:
> Sorry, but NACK from me.
>
> Aaron, please slow down a bit here!
>
> I already said in v2 that the discussion didn't feel settled, and asked
> you to wait before another spin. v3 didn't settle it either ...
>
> Replying to comments is *not* the same as settling review ... If any
> reviewer/maintainer still doesn't agree this should go in, please read
> the room a bit and stop spinning new versions until we agree on the next
> move.
>
> If v3 didn't settle it (i.e. we didn't agree on the next move), please
> don't just spin v4. Step back first: what real problem does this solve?
> If this isn't really a real-world problem at all, why keep spinning it?
>
> At that point it starts looking less like review and more like patch
> churn. That usually doesn't land, it just *burns* reviewer time ...
>
> I'd rather spend review time on patches where we at least agree the
> problem is real. I assume you'd prefer that too :)
>
> So yeah, still a NACK from me. If new versions keep coming before the
> discussion settles, I'll keep NACKing them ...
>
> Thanks, Lance

Hi Lance,

I completely understand your frustration, and I am sorry for rushing out
the revisions too quickly. I will certainly hold off on sending any further
versions until we have reached a collective agreement on the path forward.

To step back and address your question directly regarding the real-world
problem this patch aims to solve:

In large-scale, multi-tenant, production environments, lock contention is a
frequent reality. When a core resource (e.g., a heavily contended rwsem or
mutex) blocks, it does not just hang one task; it causes a cascading
failure that halts hundreds of tasks simultaneously.

When khungtaskd runs its scan during such an outage, it often writes the
same stack trace into the kernel ring buffer once per blocked task, which
adds noise without adding any diagnostic information.

The global sysctl_hung_task_warnings budget is instantly exhausted by a
single lock storm. Consequently, the kernel is left entirely blind to
subsequent, completely unrelated deadlocks occurring elsewhere in the
system hours later.

The changes introduced to date, moving from the heuristic wchan approach
to deterministic t->blocker tracking as per Petr's feedback, were an
attempt to solve this without introducing complex heuristics or dangerous
blind spots.
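
As a rough illustration of the budget problem, here is a small userspace
sketch. It is not the patch code: struct sim_task and sim_scan() are
made-up names, the blocker is modelled as an opaque address, and the
fixed-size "seen" table is an assumption made only to keep the example
short.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a hung task; not the kernel's task_struct. */
struct sim_task {
	int pid;
	uintptr_t blocker;	/* identity of the lock the task waits on */
};

#define SIM_MAX_SEEN 64		/* illustrative limit on distinct blockers */

/*
 * Simulate one scan over n hung tasks with a shared warning budget.
 *
 * Without dedup, every hung task consumes one unit of budget.
 * With dedup, a task whose blocker was already reported in this scan is
 * skipped and consumes nothing.
 *
 * Returns the number of reports emitted; *budget is decremented in place.
 */
static int sim_scan(const struct sim_task *tasks, size_t n,
		    int *budget, bool dedup)
{
	uintptr_t seen[SIM_MAX_SEEN];
	size_t nseen = 0;
	int reports = 0;

	for (size_t i = 0; i < n && *budget > 0; i++) {
		bool dup = false;

		if (dedup) {
			for (size_t j = 0; j < nseen; j++) {
				if (seen[j] == tasks[i].blocker) {
					dup = true;
					break;
				}
			}
		}
		if (dup)
			continue;

		/* Remember this blocker so later waiters are not re-reported. */
		if (dedup && nseen < SIM_MAX_SEEN)
			seen[nseen++] = tasks[i].blocker;

		(*budget)--;
		reports++;
	}
	return reports;
}
```

With 300 tasks waiting on one lock, one task waiting on a second lock, and
a budget of 10 (a value picked for illustration), the scan without dedup
spends the whole budget on the first lock and never reaches the second.
With dedup it emits two reports and leaves a budget of 8.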

Kind regards,
--
Aaron Tomlin