Re: [PATCH 1/2] workqueue: skip lockdep wq dependency in cancel_work_sync()

From: Byungchul Park
Date: Wed Aug 22 2018 - 08:48:15 EST




On 08/22/2018 06:42 PM, Johannes Berg wrote:
On Wed, 2018-08-22 at 18:15 +0900, Byungchul Park wrote:
thread 1 thread 2 (wq thread)

common_case = false;
queue_work(&my_wq, &work);
mutex_lock(&mutex);

flush_workqueue(&my_wq);
work_function()
-> mutex_lock(&mutex);
-> schedule(), wait for mutex
wait_for_completion()

-> deadlock - we can't make any forward progress here.
I see. Now, I understand what you are talking about.

if (common_case)
schedule_and_wait_for_something_that_takes_a_long_time()

should be:

if (common_case)
schedule_and_wait_long_time_so_that_the_work_to_finish()
Fair point.

Ok. I didn't know what you are talking about but now I got it.
great.

You are talking about who knows whether common_case is true or not,
so we must aggresively consider the case the wait_for_completion()
so far hasn't been called but may be called in the future.
Yes.

I think it's a problem of how aggressively we need to check dependencies.
If we choose the aggressive option, then we could get reported
aggressively but could not avoid aggresive false positives either.
I don't want to strongly argue that because it's a problem of policy.
I don't think you could consider a report from "aggressive reporting" to
be a false positive. It's clearly possible that this happens, and once
you go to a workqueue you basically don't really know your sequencing
and timing any more.

Just, I would consider only waits that actually happened anyway. Of
course, we gotta consider the waits definitely even if any actual
deadlock doesn't happen since detecting potantial problem is more
important than doing on actual ones as you said.

So no big objection to check dependencies aggressively.

Here we don't have a deadlock, but without the revert we will also not
You misunderstand me. The commit should be reverted or acquire/release
pairs should be added in both flush functions.
Ok, I thought you were arguing we shouldn't revert it :)

I don't know whether to revert or just add the pairs in the flush
functions, because I can't say I understand what the third part of the
patch does.

Anyway the annotation should be placed in the path where
wait_for_completion() might be called.
Yes, it should be called regardless of whether we actually wait or not,
i.e. before most of the checking in the functions.

My issue #3 that I outlined is related - we'd like to have lockdep
understand that if this work was on the WQ it might deadlock, but we
don't have a way to get the WQ unless the work is scheduled - and in the
case that it is scheduled we might already hitting the deadlock
(depending on the order of execution though I guess).

Absolutly true. You can find my opinion about that in
Documentation/locking/crossrelease.txt which has been removed because
crossrelease is strong at detecting *potential* deadlock problems.
Ok, you know what, I'm going to read this now ... hang on........

But very sorry for poor English on it in advance...

The document should have gotten contributed by ones who
are good at English. It might be too naive for you to read.


I see. You were trying to solve the problem of completions/context
transfers in lockdep.

Given the revert of crossrelease though, we can't actually revert your
patch anyway, and looking at it again I see what you mean by the "name"
now ...

So yeah, we should only re-add the two acquire/release pairs. I guess
I'll make a patch for that, replacing my patch 2.

I'm intrigued by the crossrelease - but that's a separate topic.

johannes