Re: [PATCH 3/3] kernel/workqueue: Suppress a false positive lockdep complaint

From: Johannes Berg
Date: Thu Oct 25 2018 - 16:00:04 EST


On Thu, 2018-10-25 at 08:55 -0700, Bart Van Assche wrote:

> Please have a look at the call trace in the description of this patch and also
> at the direct I/O code. The lockdep complaint in the description of this patch
> really is a false positive.

I agree. I just don't agree that it's a false positive because of the
lockdep tracking of workqueues (which you're essentially saying since
you want to change it), I think it's a false positive because the user
isn't telling lockdep properly what it's doing.

> What I think needs further discussion is on how to
> address this false positive - track whether or not a work queue has been used
> or follow Tejun's proposal that I became aware of after I posted this patch,
> namely introduce a new function for destroying a work queue that skips draining,
> e.g. destroy_workqueue_skip_drain() (https://lkml.org/lkml/2018/10/24/2).

Agree. Ted's suggestion is basically what I meant in my other email when
I said:

> So, thinking about this more, can you guarantee (somehow) that the
> workqueue is empty at this point?

(I hadn't looked at the code then - obviously that's guaranteed)

> Perhaps then we should encode that
> guarantee into the API, and actually make it WARN_ON() if something is
> scheduled on it at that point? And then skip lockdep in this case.

So that API I'm talking about is what Ted suggests with
destroy_never_used_workqueue() or Tejun's suggestion of
destroy_workqueue_skip_drain(). Either is fine with me, though I'd
probably think it would make sense to actually check that it really was
never used, or at least is actually empty right now and thus doesn't
need the train.

However, I think more generally useful would be the
destroy_workqueue_nested() API I've proposed in my other reply just
moments ago, since that could be used in other code with non-empty
workqueues, as long as you can ensure that there's a guaranteed layering
of them. Then you could even have two objects A and B of the same class,
but separate instances, and flush B's workqueue from a work executing on
A's workqueue, which may come in handy in other places to address false
positives similar to this one.

johannes