Re: [PATCH] stop_machine: Disable preemption after queueing stopper threads
From: Pavan Kondeti
Date: Mon Aug 06 2018 - 04:37:46 EST
Hi Prasad,
On Wed, Aug 01, 2018 at 01:07:03AM -0700, Sodagudi Prasad wrote:
> On 2018-07-30 14:07, Peter Zijlstra wrote:
> >On Mon, Jul 30, 2018 at 10:12:43AM -0700, Sodagudi Prasad wrote:
> >>How about including below change as well? Currently, there is
> >>no way to
> >>identify thread migrations completed or not. When we observe
> >>this issue,
> >>the symptom was work queue lock up. It is better to have some
> >>timeout here
> >>and induce the bug_on.
> >
> >You'd trigger the soft-lockup or hung-task detector I think. And
> >if not,
> >we ought to look at making it trigger at least one of those.
> >
> >>There is no way to identify the migration threads stuck or not.
> >
> >Should be pretty obvious from the splat generated by the above, no?
> Hi Peter and Thomas,
>
> Thanks for your support.
> I have another question on this flow and retry mechanism used in
> this cpu_stop_queue_two_works() function using the global variable
> stop_cpus_in_progress.
>
> This variable is getting used in various paths, such as task
> migration, set task affinity, and CPU hotplug.
>
> For example cpu hotplug path, stop_cpus_in_progress variable getting
> set with true with out checking.
> takedown_cpu()
> --stop_machine_cpuslocked()
> ---stop_cpus()
> ---__stop_cpus()
> ----queue_stop_cpus_work()
> setting stop_cpus_in_progress to true directly.
>
> But in the task migration path only, the stop_cpus_in_progress
> variable is used for retry.
>
> I am thinking that stop_cpus_in_progress variable lead race
> conditions, where CPU hotplug and task migration happening
> simultaneously. Please correct me If my understanding wrong.
>
The stop_cpus_in_progress variable is to guard against out of order queuing.
The stopper locks does not protect this when cpu_stop_queue_two_works() and
stop_cpus() are executing in parallel.
stop_one_cpu_{nowait} functions are called to handle affinity change and
load balance. Since we are queuing the work only on 1 CPU,
stop_cpus_in_progress variable protection is not needed.
Thanks,
Pavan
--
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.