Re: [PATCH v5 2/3] hung_task: show the blocker task if the task is hung on semaphore

From: Lance Yang
Date: Sat Aug 23 2025 - 03:49:40 EST




On 2025/8/23 12:47, Lance Yang wrote:
Hi Finn,

On 2025/8/23 08:27, Finn Thain wrote:

On Sat, 23 Aug 2025, Lance Yang wrote:


include/linux/hung_task.h-/*
include/linux/hung_task.h- * @blocker: Combines lock address and blocking type.
include/linux/hung_task.h- *
include/linux/hung_task.h- * Since lock pointers are at least 4-byte aligned(32-bit) or 8-byte
include/linux/hung_task.h- * aligned(64-bit). This leaves the 2 least bits (LSBs) of the pointer
include/linux/hung_task.h- * always zero. So we can use these bits to encode the specific blocking
include/linux/hung_task.h- * type.
include/linux/hung_task.h- *

That comment was introduced in commit e711faaafbe5 ("hung_task: replace
blocker_mutex with encoded blocker"). It's wrong and should be fixed.

Right, the problematic assumption was introduced in that commit ;)


include/linux/hung_task.h- * Type encoding:
include/linux/hung_task.h- * 00 - Blocked on mutex                  (BLOCKER_TYPE_MUTEX)
include/linux/hung_task.h- * 01 - Blocked on semaphore              (BLOCKER_TYPE_SEM)
include/linux/hung_task.h- * 10 - Blocked on rw-semaphore as READER (BLOCKER_TYPE_RWSEM_READER)
include/linux/hung_task.h- * 11 - Blocked on rw-semaphore as WRITER (BLOCKER_TYPE_RWSEM_WRITER)
include/linux/hung_task.h- */
include/linux/hung_task.h-#define BLOCKER_TYPE_MUTEX            0x00UL
include/linux/hung_task.h-#define BLOCKER_TYPE_SEM              0x01UL
include/linux/hung_task.h-#define BLOCKER_TYPE_RWSEM_READER     0x02UL
include/linux/hung_task.h-#define BLOCKER_TYPE_RWSEM_WRITER     0x03UL
include/linux/hung_task.h-
include/linux/hung_task.h:#define BLOCKER_TYPE_MASK             0x03UL

On m68k, the minimum alignment of int and larger is 2 bytes.

Ah, thanks, that's good to know! It clearly explains why the
WARN_ON_ONCE() is triggering.
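
To make the failure mode concrete, here is a minimal user-space sketch of the
encoding described in the quoted header. Only the BLOCKER_TYPE_* values come
from that header; the demo_* helpers are made up for illustration and are not
the kernel's functions, and assert() merely stands in for the kernel's warning.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the header quoted above. */
#define BLOCKER_TYPE_MUTEX        0x00UL
#define BLOCKER_TYPE_SEM          0x01UL
#define BLOCKER_TYPE_RWSEM_READER 0x02UL
#define BLOCKER_TYPE_RWSEM_WRITER 0x03UL
#define BLOCKER_TYPE_MASK         0x03UL

/* Hypothetical helper: true if the two low bits of the address are zero. */
static inline bool demo_addr_is_taggable(uintptr_t addr)
{
	return (addr & BLOCKER_TYPE_MASK) == 0;
}

/* Hypothetical helper: pack the type into the low bits of a lock address. */
static inline uintptr_t demo_encode_blocker(uintptr_t addr, unsigned long type)
{
	/* A 2-byte-aligned address such as 0x1002 would trip this check. */
	assert(demo_addr_is_taggable(addr));
	return addr | (type & BLOCKER_TYPE_MASK);
}

/* Recover the lock address by clearing the tag bits. */
static inline uintptr_t demo_blocker_addr(uintptr_t blocker)
{
	return blocker & ~(uintptr_t)BLOCKER_TYPE_MASK;
}

/* Recover the blocking type from the tag bits. */
static inline unsigned long demo_blocker_type(uintptr_t blocker)
{
	return blocker & BLOCKER_TYPE_MASK;
}
```

With 4-byte alignment the round trip is lossless. With 2-byte alignment, bit 1
of the address can be set, so it would be indistinguishable from a type tag.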

If you want to use the lowest 2 bits of a pointer for your own use,
you must make sure data is sufficiently aligned.

You're right. Apparently I missed that :(

I'm wondering if there's a way to check an architecture's minimum
alignment at compile-time. If so, we could disable this feature on
architectures that don't guarantee 4-byte alignment.
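
As a rough illustration of what such a compile-time check could look like, the
sketch below uses C11 alignof in user space. The struct and the DEMO_* names
are hypothetical stand-ins, and whether the kernel feature would actually be
gated this way is an assumption, not something settled in this thread.

```c
#include <assert.h>
#include <stdalign.h>

/* Hypothetical stand-in for a lock type; not a real kernel structure. */
struct demo_lock {
	int count;
	void *owner;
};

/* Number of low pointer bits the encoding wants to borrow. */
#define DEMO_TAG_BITS 2

/* Evaluates to 1 at compile time if the type's ABI alignment frees those bits. */
#define DEMO_CAN_TAG(type) (alignof(type) >= (1u << DEMO_TAG_BITS))

/* A constant that could select between tagging and a fallback path. */
enum { demo_lock_taggable = DEMO_CAN_TAG(struct demo_lock) };
```

On an ABI where int is only 2-byte aligned, demo_lock_taggable would be 0 if
the pointer member is no more strictly aligned; char never qualifies.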


As Geert says, the compiler can give you all the bits you need, so you
won't have to contort your algorithm to fit whatever free bits happen to
be available. Please see for example, commit 258a980d1ec2 ("net: dst:
Force 4-byte alignment of dst_metrics").
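
For readers unfamiliar with the technique, a user-space sketch of forcing the
alignment on a type follows. It uses the GCC/Clang aligned attribute directly.
The struct names are hypothetical, and this is not a copy of the referenced
commit.

```c
#include <assert.h>
#include <stdalign.h>

/* Natural alignment: only as strict as its most-aligned member (short). */
struct demo_sem_natural {
	short lock;
	short count;
};

/* Same layout, but the type is explicitly forced to 4-byte alignment. */
struct demo_sem_aligned {
	short lock;
	short count;
} __attribute__((aligned(4)));

/* Build fails if the two low pointer bits are not guaranteed to be free. */
static_assert(alignof(struct demo_sem_aligned) >= 4,
	      "low two pointer bits must be free");
```

The attribute makes the guarantee part of the type, so every instance has its
two low address bits clear regardless of the architecture's default ABI.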

Yes, thanks, it's a helpful example!

I see your point that explicitly enforcing alignment is a very clean
solution for the lock structures supported by the blocker tracking
mechanism.

However, I'm thinking about the "principle of minimal impact" here.
Forcing alignment on the core lock types themselves — like struct
semaphore — feels like a broad change to fix an issue that's local to the
hung task detector :)


If not, the fallback is to adjust the runtime checks.


That would be a solution to a different problem.

For that reason, I would prefer to simply adjust the runtime checks within
the hung task detector. It feels like a more generic and self-contained
solution. It works out of the box on the majority of architectures and
provides a safe fallback on those that don't guarantee the alignment.
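
A minimal user-space sketch of that kind of runtime fallback is below. All
demo_* names are hypothetical, and this only illustrates the idea of skipping
tracking for insufficiently aligned locks; it is not the content of either
patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_TYPE_MASK 0x03UL

/* Hypothetical stand-in for the per-task blocker field. */
struct demo_task {
	uintptr_t blocker;
};

/*
 * Record the lock a task is blocked on, unless the lock address already
 * uses the tag bits. Returns true if recorded, false if tracking was skipped.
 */
static inline bool demo_set_blocker(struct demo_task *t, const void *lock,
				    unsigned long type)
{
	uintptr_t addr = (uintptr_t)lock;

	assert(type <= DEMO_TYPE_MASK);

	/* Fallback: an unaligned lock is simply not tracked. */
	if (addr & DEMO_TYPE_MASK)
		return false;

	t->blocker = addr | type;
	return true;
}
```

The trade-off is that on such architectures some blocked tasks would be
reported without blocker information, rather than with a corrupted pointer.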

Happy to hear what you and others think about this trade-off. Perhaps
there's a perspective I'm missing ;)

Anyway, I've prepared two patches for discussion, either of which should
fix the alignment issue :)

Patch A[1] adjusts the runtime checks to handle unaligned pointers.
Patch B[2] enforces 4-byte alignment on the core lock structures.

Both tested on x86-64.

[1] https://lore.kernel.org/lkml/20250823050036.7748-1-lance.yang@xxxxxxxxx
[2] https://lore.kernel.org/lkml/20250823074048.92498-1-lance.yang@xxxxxxxxx

Thanks,
Lance