Re: [PATCH 3/3] kernel/workqueue: Suppress a false positive lockdep complaint

From: Tejun Heo
Date: Thu Oct 25 2018 - 11:37:03 EST


Hello, Bart.

On Thu, Oct 25, 2018 at 08:05:40AM -0700, Bart Van Assche wrote:
> diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
> index 60d673e15632..375ec764f148 100644
> --- a/include/linux/workqueue.h
> +++ b/include/linux/workqueue.h
> @@ -344,6 +344,7 @@ enum {
> 	__WQ_ORDERED		= 1 << 17, /* internal: workqueue is ordered */
> 	__WQ_LEGACY		= 1 << 18, /* internal: create*_workqueue() */
> 	__WQ_ORDERED_EXPLICIT	= 1 << 19, /* internal: alloc_ordered_workqueue() */
> +	__WQ_HAS_BEEN_USED	= 1 << 20, /* internal: work has been queued */
>
> 	WQ_MAX_ACTIVE		= 512,	  /* I like 512, better ideas? */
> 	WQ_MAX_UNBOUND_PER_CPU	= 4,	  /* 4 * #cpus for unbound wq */
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index fc9129d5909e..0ef275fe526c 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -1383,6 +1383,10 @@ static void __queue_work(int cpu, struct workqueue_struct *wq,
> 	if (unlikely(wq->flags & __WQ_DRAINING) &&
> 	    WARN_ON_ONCE(!is_chained_work(wq)))
> 		return;
> +
> +	if (!(wq->flags & __WQ_HAS_BEEN_USED))
> +		wq->flags |= __WQ_HAS_BEEN_USED;
> +
> retry:
> 	if (req_cpu == WORK_CPU_UNBOUND)
> 		cpu = wq_select_unbound_cpu(raw_smp_processor_id());
> @@ -2889,7 +2893,7 @@ static bool start_flush_work(struct work_struct *work, struct wq_barrier *barr,
> 	 * workqueues the deadlock happens when the rescuer stalls, blocking
> 	 * forward progress.
> 	 */
> -	if (!from_cancel &&
> +	if (!from_cancel && (pwq->wq->flags & __WQ_HAS_BEEN_USED) &&
> 	    (pwq->wq->saved_max_active == 1 || pwq->wq->rescuer)) {
> 		lock_acquire_exclusive(&pwq->wq->lockdep_map, 0, 0, NULL,
> 				       _THIS_IP_);

We likely wanna skip the whole drain instead of eliding the lockdep
annotation here; a rough sketch of what that could look like is below.
Other than that, this patch looks fine to me, but for the others I
think it'd be a better idea to listen to Johannes. We wanna annotate
the users for the exceptions rather than weaken the workqueue lockdep
checks, especially because workqueue-related deadlocks can be pretty
difficult to trigger and to root-cause afterwards.
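
Roughly something like the below - a completely untested sketch that
reuses the __WQ_HAS_BEEN_USED flag from the patch above. Whether
drain_workqueue() or somewhere in the flush path is the right place
for such a check is a separate question; this is only meant to
illustrate the direction, not a change proposed as-is.

--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
 void drain_workqueue(struct workqueue_struct *wq)
 {
 	unsigned int flush_cnt = 0;
 	struct pool_workqueue *pwq;

+	/*
+	 * Nothing has ever been queued on this workqueue, so there is
+	 * nothing to drain and none of the flush or lockdep machinery
+	 * below needs to run.
+	 */
+	if (!(wq->flags & __WQ_HAS_BEEN_USED))
+		return;
+
 	/*
 	 * __queue_work() needs to test whether there are drainers, increase

That way the lockdep annotations themselves stay as they are for any
workqueue which has actually seen work queued on it.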

Thanks.

--
tejun