Re: [PATCH] tracing/osnoise: fix potential deadlock in cpu hotplug

From: Steven Rostedt

Date: Wed Mar 25 2026 - 10:35:54 EST


On Wed, 25 Mar 2026 10:25:42 +0800 (CST)
<hu.shengming@xxxxxxxxxx> wrote:

> >On Tue, 24 Mar 2026 15:06:16 +0800 (CST)
> ><hu.shengming@xxxxxxxxxx> wrote:
> >
> >> From: luohaiyang10243395 <luo.haiyang@xxxxxxxxxx>
> >>
> >> The following sequence may lead to a deadlock in cpu hotplug:
> >>
> >> CPU0                          | CPU1
> >>                               | schedule_work_on
> >>                               |
> >> _cpu_down //set CPU1 offline  |
> >> cpus_write_lock               |
> >>                               | osnoise_hotplug_workfn
> >>                               | mutex_lock(&interface_lock);
> >>                               | cpus_read_lock(); //wait cpu_hotplug_lock
> >>                               |
> >>                               | cpuhp/1
> >>                               | osnoise_cpu_die
> >>                               | kthread_stop
> >>                               | wait_for_completion //wait osnoise/1 exit
> >>                               |
> >>                               | osnoise/1
> >>                               | osnoise_sleep
> >>                               | mutex_lock(&interface_lock); //deadlock
> >>
> >> Fix by swapping the order of cpus_read_lock() and mutex_lock(&interface_lock).
> >
> >So the deadlock is due to the "wait_for_completion"?
>
> The osnoise_cpu_init callback returns directly, which may allow another CPU offline task to run,
> the offline task holds the cpu_hotplug_lock while waiting for the osnoise task to exit.
> osnoise_hotplug_workfn may acquire interface_lock first, causing the offline task to be blocked.
> This is an ABBA deadlock.

Right, as I said, it is due to the "wait_for_completion" and not due to two
different locks. One is waiting for the osnoise task to exit (the
"wait_for_completion") but the osnoise task is blocked on interface_lock.

Better to show it as:


 task1                        task2                     task3
 -----                        -----                     -----

 mutex_lock(&interface_lock)

                              [CPU GOING OFFLINE]

                              cpus_write_lock();
                              osnoise_cpu_die();
                              kthread_stop(task3);
                              wait_for_completion();

                                                        osnoise_sleep();
                                                        mutex_lock(&interface_lock);

 cpus_read_lock();

 [DEAD LOCK]

>
> >How did you find this bug? Inspection, AI, triggered?
> >
> >Thanks,
> >
> >-- Steve
>
> We ran autotests on kernel-6.6, which reported the following hung task warning, and we think the
> same issue exists in linux-stable.

Thanks. It's usually good to state how a bug was discovered when fixing it.

Could you send a v2 with an updated change log?

-- Steve