Re: [PATCH V4 05/12] trace/hwlat: Support hotplug operations

From: Daniel Bristot de Oliveira
Date: Tue Jun 22 2021 - 10:05:43 EST


On 6/21/21 7:46 PM, Steven Rostedt wrote:
> On Mon, 21 Jun 2021 18:14:36 +0200
> Daniel Bristot de Oliveira <bristot@xxxxxxxxxx> wrote:
>
>>>> Yep! I tried to take the trace_type_lock here, and got the lockdep info about
>>>> this problem.
>>>>
>>>>> The only thing I could think of is to wake up a worker thread to do the
>>>>> work. That is, this just wakes the worker thread, then the worker grabs
>>>>> the trace_types_lock, iterates through the cpu mask of expect running
>>>>> threads, and then starts or kills them depending on the hwlat_busy
>>>>> value.
>>>> So, it will not wait for the kworker to run?
>>> What wont wait?
>>
>> For example, at the shutdown, should the hotplug callback wait for the workqueue
>> to run & kill the thread, or not?
>
> Doing that won't help the deadlock situation.

yep, that is why I asked... :-(

> CPU 1 CPU 2
> ----- -----
> Start shutdown
> down online_cpus()
>
> mutex_lock(trace_types_lock);
> get_online_cpus()
> [BLOCK]
>
> wake_up_thread;
> [schedule worker]
>
> mutex_lock(trace_types_lock);
>
> [ DEADLOCK ]
>
>
> Make all access to save_cpumask and hwlat_per_cpu_data inside the
> get_online_cpus() protection. (like in move_to_next_cpu(),
> start_single_thread() expand the get_online_cpus()).
>
> Then in the cpu going down case, we can simply kill the thread and
> update the save_cpumask, as it will be protected by the
> get_online_cpus() code.
>
> That is, don't even check if hwlat_busy is set or not. Just simply do:
>
>
> CPU_DOWN:
>
> stop_cpu_kthead(cpu);
>
> That will stop the kthread if it is running. But we should update
> that function to also set per_cpu(hwlat_per_cpu_data).kthread = NULL;
> Like stop_single_kthread() does.
>
> But for CPU_UP, we should do the work via a worker thread.
>
> CPU_UP:
> schedule_work_on(&update_kthreads, cpu);
>
> Which in the work function for that update_kthreads work queue:
>
> mutex_lock(&trace_types_lock);
> if (!hwlat_busy || hwlat_data.thread_mode != MODE_PER_CPU)
> goto out_unlock;
>
> get_online_cpus();
> if (!this_cpu(hwlat_per_cpu_data).kthread)
> start_per_cpu_kthread(smp_processor_id());
> put_online_cpus();
>
> out_unlock:
> mutex_unlock(&trace_types_lock);
>
> Or something like that.

It works!

I will send a v5 with all the fixes requested, including this one.

Thanks Steven!
-- Daniel