Re: [PATCH v1] kthread/smpboot: Serialize kthread parking against wakeup

From: Kohli, Gaurav
Date: Tue May 01 2018 - 07:46:28 EST

On 5/1/2018 5:01 PM, Peter Zijlstra wrote:
On Tue, May 01, 2018 at 04:10:53PM +0530, Kohli, Gaurav wrote:
Yes, with the loop it will reset TASK_PARKED, but that is not happening in the
dumps we have seen.

But was that with or without the fixed wait-loop? I don't care about
stuff you might have seen with the current code, that is clearly broken.

takedown_cpu() can proceed beyond smpboot_park_threads() and kill the
CPU before any of the threads are parked -- per having the complete()
before hitting schedule().

And, afaict, that is harmless. When we go offline, sched_cpu_dying() ->
migrate_tasks() will migrate any still runnable threads off the cpu.
But because at this point the thread must be in the PARKED wait-loop, it
will hit schedule() and go to sleep eventually.

Also note that kthread_unpark() does __kthread_bind() to rebind the
threads.

Aaaah... I think I've spotted a problem there. We clear SHOULD_PARK
before we rebind, so if the thread lost the first PARKED store,
does the completion, gets migrated, cycles through the loop and now
observes !SHOULD_PARK and bails the wait-loop, then __kthread_bind()
will forever wait.
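
To make the interleaving above easier to follow, here is a rough user-space
model of it. This is only a sketch under stated assumptions, not kernel code:
every model_* name is hypothetical, the task state is reduced to two values,
and wait_task_inactive() is reduced to a bare state comparison that fails when
the state is not PARKED, rather than something that blocks.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for task states; not the kernel's definitions. */
enum model_state { MODEL_RUNNING, MODEL_PARKED };

struct model_kthread {
	enum model_state state; /* last value stored to the task state */
	bool should_park;       /* models the SHOULD_PARK flag */
	bool completed;         /* models complete() having been called */
};

/*
 * One pass through the parking wait-loop: store PARKED, then re-check
 * SHOULD_PARK. Returns true if the thread would go on to call
 * schedule(), false if it bails out of the loop.
 */
bool model_park_loop_pass(struct model_kthread *t)
{
	t->state = MODEL_PARKED;
	if (!t->should_park) {
		t->state = MODEL_RUNNING; /* leave the loop, run normally */
		return false;
	}
	t->completed = true; /* complete() happens before schedule() */
	return true;
}

/* Models only the state comparison done on behalf of the rebind. */
bool model_wait_task_inactive(const struct model_kthread *t)
{
	return t->state == MODEL_PARKED;
}

/* No race: the thread is asleep in PARKED when unpark runs. */
bool model_unpark_normal(void)
{
	struct model_kthread t = { MODEL_RUNNING, true, false };

	model_park_loop_pass(&t);            /* parks and sleeps */
	t.should_park = false;               /* unpark clears SHOULD_PARK */
	return model_wait_task_inactive(&t); /* rebind sees PARKED */
}

/* The interleaving described above. */
bool model_unpark_after_lost_store(void)
{
	struct model_kthread t = { MODEL_RUNNING, true, false };

	model_park_loop_pass(&t);            /* PARKED store, complete() */
	assert(t.completed);
	t.state = MODEL_RUNNING;             /* PARKED store lost to a wakeup */
	t.should_park = false;               /* unpark clears SHOULD_PARK */
	model_park_loop_pass(&t);            /* thread cycles, bails the loop */
	return model_wait_task_inactive(&t); /* rebind sees RUNNING */
}
```

In this model, model_unpark_normal() finds the thread in PARKED, while
model_unpark_after_lost_store() finds it RUNNING, so the state check made for
the rebind cannot succeed.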


So during the next unpark:
__kthread_unpark -> __kthread_bind -> wait_task_inactive (this fails, as the
current state is running, on the call below):

Aah, yes, I seem to have mis-remembered how wait_task_inactive() works.
And it is indeed still a problem.

Let me ponder what the best solution is, it's a bit of a mess.


Sure, thanks a lot.
--
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.