Re: [RFC PATCH v15 2/7] locking/mutex: Rework task_struct::blocked_on
From: John Stultz
Date: Wed Mar 19 2025 - 04:55:08 EST
On Tue, Mar 18, 2025 at 3:11 PM Masami Hiramatsu <mhiramat@xxxxxxxxxx> wrote:
> On Thu, 13 Mar 2025 23:12:57 -0700
> John Stultz <jstultz@xxxxxxxxxx> wrote:
>
> > On Thu, Mar 13, 2025 at 3:14 AM Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
> > > FYI, this is useful for Masami's "hung task" work that will show what
> > > tasks a hung task is blocked on in a crash report.
> > >
> > > https://lore.kernel.org/all/174046694331.2194069.15472952050240807469.stgit@xxxxxxxxxxxxxxxxxxxxxxxxxxxx/
> > >
> >
> > Ah. Indeed, we have similar use cases. There's a slight difference
> > in when we consider the task blocked, especially in this early patch
> > (waking tasks mark us as unblocked so we can be selected to run).
> > But later in the series (in the portions I've not yet submitted
> > here), once blocked_on_state has been introduced, the blocked_on
> > value ends up tracking roughly the same point as is used here.
>
> Interesting. Can you also track tasks which take other locks like
> rwsem/semaphore? Lance is also working on this, expanding it to
> support semaphores.
Currently, no: proxy-exec is initially focused only on kernel mutexes.
However, I do hope to extend it to other locking primitives, which
will need something like what Lance is proposing, so I'm eager to
make use of his work.
I've pulled both of your series into my tree and will try to rework
my logic on top of them.
> BTW, I had a chat with Suleiman, and he suggested expanding this
> idea to record which locks each task holds. Then we can search all
> tasks for the ones holding the lock. Something like:
>
> struct task_struct {
> 	unsigned long blocking_on;
> 	unsigned long holding_locks[HOLDING_LOCK_MAX];
> 	unsigned int holding_idx;
> };
>
> lock(lock_addr) {
> 	if (succeeded_to_lock) {
> 		current->holding_locks[current->holding_idx++] = lock_addr;
> 	} else {
> 		record_blocking_on(current, lock_addr);
> 		wait_for_lock();
> 		clear_blocking_on(current, lock_addr);
> 	}
> }
>
> unlock(lock_addr) {
> 	current->holding_locks[--current->holding_idx] = 0UL;
> }
>
> And when we find a hung task, call dump_blocker() like this:
>
> dump_blocker() {
> 	lock_addr = hung_task->blocking_on;
> 	for_each_task(task) {
> 		if (find_lock(task->holding_locks, lock_addr)) {
> 			dump_task(task);
> 			/* semaphore, rwsem will need to dump all holders. */
> 			if (lock is mutex)
> 				break;
> 		}
> 	}
> }
>
> This may be too much, but it's an interesting way to find
> semaphore-type blockers.
Yeah. I suspect the rwsem/semaphore -> owners mapping is a missing
piece that proxy-exec will need, but I haven't looked closely yet.
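A minimal user-space sketch of the holder-tracking idea quoted above, under stated assumptions: struct task, note_lock_acquired(), note_lock_released(), holds_lock(), the is_mutex parameter of dump_blocker() and HOLDING_LOCK_MAX = 8 are all hypothetical illustrations, not kernel API, and lock addresses are plain unsigned long values. A real implementation would walk the actual task list (for example with for_each_process_thread() under rcu_read_lock()) instead of an array, and would have to deal with concurrency, which this sketch ignores.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-task limit; not a real kernel constant. */
#define HOLDING_LOCK_MAX 8

/* Hypothetical stand-in for the fields proposed for struct task_struct. */
struct task {
	const char *comm;
	unsigned long blocking_on;	/* lock address waited on, 0 if none */
	unsigned long holding_locks[HOLDING_LOCK_MAX];
	unsigned int holding_idx;	/* number of valid entries */
};

/* Call once the lock is held, contended or not. Returns false if full. */
static bool note_lock_acquired(struct task *t, unsigned long lock_addr)
{
	if (t->holding_idx >= HOLDING_LOCK_MAX)
		return false;	/* table full: this lock goes untracked */
	t->holding_locks[t->holding_idx++] = lock_addr;
	return true;
}

/* Remove lock_addr even if locks are released out of LIFO order. */
static void note_lock_released(struct task *t, unsigned long lock_addr)
{
	for (unsigned int i = 0; i < t->holding_idx; i++) {
		if (t->holding_locks[i] == lock_addr) {
			/* Move the last entry into the freed slot. */
			t->holding_idx--;
			t->holding_locks[i] = t->holding_locks[t->holding_idx];
			t->holding_locks[t->holding_idx] = 0UL;
			return;
		}
	}
}

static bool holds_lock(const struct task *t, unsigned long lock_addr)
{
	for (unsigned int i = 0; i < t->holding_idx; i++)
		if (t->holding_locks[i] == lock_addr)
			return true;
	return false;
}

/*
 * Print every task holding the lock the hung task is blocked on.
 * A mutex has at most one owner, so stop at the first match; a
 * semaphore or rwsem may have several. Returns the number printed.
 */
static int dump_blocker(const struct task *hung, const struct task *tasks,
			int nr_tasks, bool is_mutex)
{
	unsigned long lock_addr = hung->blocking_on;
	int found = 0;

	if (!lock_addr)
		return 0;

	for (int i = 0; i < nr_tasks; i++) {
		if (&tasks[i] == hung || !holds_lock(&tasks[i], lock_addr))
			continue;
		printf("%s blocked on %#lx held by %s\n",
		       hung->comm, lock_addr, tasks[i].comm);
		found++;
		if (is_mutex)
			break;
	}
	return found;
}
```

For example, if tasks A and B (in that array order) both hold a counting semaphore at 0x1000 and task C has blocking_on set to 0x1000, dump_blocker() with is_mutex false prints both holders and returns 2, while is_mutex true stops after A. Release searches for the address rather than popping the top entry as the quoted unlock() does, since locks are not always dropped in LIFO order; the sketch also assumes a contended acquisition is recorded once the wait completes. The per-task scan is where a per-lock owners mapping might be cheaper.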
thanks
-john