Re: [PATCH] tracing/osnoise: Fix possible recursive locking for cpus_read_lock()

From: Ran Xiaokai
Date: Mon Mar 17 2025 - 08:29:05 EST


>On Wed, 26 Feb 2025 03:42:53 +0000
>Ran Xiaokai <ranxiaokai627@xxxxxxx> wrote:
>
>> >> @@ -2105,7 +2104,12 @@ static void osnoise_hotplug_workfn(struct
>> >> work_struct *dummy)
>> >> if (!cpumask_test_cpu(cpu, &osnoise_cpumask))
>> >> return;
>> >>
>> >> - start_kthread(cpu);
>> >> + if (start_kthread(cpu)) {
>> >> + cpus_read_unlock();
>> >> + stop_per_cpu_kthreads();
>> >> + return;
>> >
>> >If all you want to do is to unlock before calling stop_per_cpu_kthreads(),
>> >then this should simply be:
>> >
>> > if (start_kthread(cpu)) {
>> > cpus_read_unlock();
>> > stop_per_cpu_kthreads();
>> > cpus_read_lock(); // The guard() above will unlock this
>> > return;
>> > }
>>
>> This is the deadlock scenario:
>> start_per_cpu_kthreads()
>> cpus_read_lock(); // first lock call
>> start_kthread(cpu)
>> ... kthread_run_on_cpu() fails:
>> if (IS_ERR(kthread)) {
>> stop_per_cpu_kthreads(); {
>> cpus_read_lock(); // second lock call, which causes the AA deadlock scenario
>> }
>> }
>> stop_per_cpu_kthreads();
>>
>> Besides, stop_per_cpu_kthreads() is called both in start_per_cpu_kthreads() and
>> start_kthread() which is unnecessary.
>>
>> So the fix should be inside start_kthread()?
>> How about this ?
>
>No! You misunderstood what I wrote above.
>
>Instead of removing the guard, keep it!
>
>Do everything the same, but instead of having the error path of:
>
>[..]
> if (start_kthread(cpu)) {
> cpus_read_unlock();
> stop_per_cpu_kthreads();
> return;
> }
> cpus_read_unlock();
> }
>
>Which requires removing the guard. Just do:
>
> if (start_kthread(cpu)) {
> cpus_read_unlock();
> stop_per_cpu_kthreads();
> cpus_read_lock(); // The guard() will unlock this
> }
> }

Hi, Steve
Sorry for the late response.
Yes, this will fix the deadlock issue.

What I mentioned before is that there is a redundant call to
stop_per_cpu_kthreads() in start_per_cpu_kthreads():

start_per_cpu_kthreads()
    for_each_cpu(cpu, current_mask) {
        retval = start_kthread(cpu);
        {   // inside start_kthread():
            if (IS_ERR(kthread)) {
                stop_per_cpu_kthreads(); // first cleanup call of stop_per_cpu_kthreads()
                return -ENOMEM;
            }
        }
        if (retval) {
            cpus_read_unlock();
            stop_per_cpu_kthreads(); // the redundant call of stop_per_cpu_kthreads()

But the second call does not cause any trouble, so I will send a v2
that follows your suggestion.
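
For reference, here is a rough userspace sketch of the pattern as I understand
it: keep the guard, drop the lock around stop_per_cpu_kthreads(), and retake it
so the guard's unlock at scope exit stays balanced. Everything in it is a
stand-in and not the kernel code: the lock is modeled as a plain depth counter,
GUARD_CPUS_READ_LOCK() is a hypothetical macro that imitates
guard(cpus_read_lock)() with the GCC/Clang cleanup attribute, and
start_kthread() is made to fail for cpu 2 purely for illustration.

```c
#include <assert.h>

/* Userspace stand-ins only; none of this is the real kernel implementation. */
static int cpus_lock_depth;	/* how many times "this task" holds the lock */
static int stop_calls;		/* how often the cleanup path ran */

static void cpus_read_lock(void)   { cpus_lock_depth++; }
static void cpus_read_unlock(void) { cpus_lock_depth--; }

/* Cleanup handler: runs when the guarded variable goes out of scope. */
static void guard_unlock(int *unused)
{
	(void)unused;
	cpus_read_unlock();
}

/* Hypothetical imitation of guard(cpus_read_lock)(): lock now, unlock at scope exit. */
#define GUARD_CPUS_READ_LOCK() \
	int guard_token __attribute__((cleanup(guard_unlock))) = (cpus_read_lock(), 0)

static void stop_per_cpu_kthreads(void)
{
	/* Takes the lock itself, so the caller must not already hold it. */
	assert(cpus_lock_depth == 0);
	cpus_read_lock();
	stop_calls++;
	cpus_read_unlock();
}

/* Assumed failure condition, for illustration only. */
static int start_kthread(int cpu)
{
	return cpu == 2 ? -1 : 0;
}

static void hotplug_workfn(int cpu)
{
	GUARD_CPUS_READ_LOCK();

	if (start_kthread(cpu)) {
		cpus_read_unlock();		/* drop before the nested locker */
		stop_per_cpu_kthreads();
		cpus_read_lock();		/* the guard will unlock this */
	}
}
```

Calling hotplug_workfn(2) from a small main() exercises the error path. Without
the unlock/lock pair the assert in stop_per_cpu_kthreads() fires, which is the
model's version of the AA deadlock; without the relock the guard would unlock
once too often.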

>I'm just saying to not replace the guard with open coded locking of
>cpus_read_lock().
>
>-- Steve