Re: Is it really safe to use workqueues to drive expedited grace periods?

From: Tejun Heo
Date: Fri Feb 10 2017 - 21:36:00 EST


Hello, Paul.

On Fri, Feb 10, 2017 at 01:21:58PM -0800, Paul E. McKenney wrote:
> So RCU's expedited grace periods have been using workqueues for a
> little while, and things seem to be working. But as usual, I worry...
> Is this use subject to some sort of deadlock where RCU's workqueue cannot
> start running until after a grace period completes, but that grace
> period is the one needing the workqueue? Note that there are ways to
> set up your kernel so that all RCU grace periods are expedited.
>
> Should I be worried? If not, what prevents this from being a problem,
> especially given that workqueue handlers are allowed to wait for RCU
> grace periods to complete?

A per-cpu (normal) workqueue's concurrency is regulated automatically
so that there is at least one worker running for the worker pool on a
given CPU.

Let's say there are two work items queued on a workqueue. The first
one is something which will do synchronize_rcu() and the second is the
expedited grace period work item. When the first one calls
synchronize_rcu(), it blocks. If no other work items are running at
the time, the workqueue will dispatch another worker so that there's
at least one actively running, and in this case that worker will pick
up the expedited RCU grace period work item.
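
To make the two-item scenario concrete, here's a toy userspace model
in C. It is not kernel code and not how the workqueue code is written;
every toy_* name is made up for illustration, and "blocking" is just a
counter. It only shows the rule: when nothing in the pool is running
and work is pending, the next item gets dispatched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model, NOT kernel code. All toy_* names are made up. */
enum toy_kind { TOY_SYNC_RCU, TOY_EXP_GP };

#define TOY_QUEUE_LEN 8

struct toy_pool {
	enum toy_kind queue[TOY_QUEUE_LEN];	/* pending work items, FIFO */
	size_t head, tail;
	int nr_running;		/* workers executing and not blocked */
	int nr_blocked;		/* workers asleep waiting for a grace period */
	bool gp_done;		/* an expedited grace period has completed */
};

static void toy_queue(struct toy_pool *p, enum toy_kind kind)
{
	assert(p->tail < TOY_QUEUE_LEN);
	p->queue[p->tail++] = kind;
}

/*
 * One dispatch decision.  A new worker is started only when no worker is
 * running and work is pending.  Returns false when nothing was dispatched.
 */
static bool toy_step(struct toy_pool *p)
{
	enum toy_kind kind;

	if (p->nr_running > 0 || p->head == p->tail)
		return false;

	kind = p->queue[p->head++];
	p->nr_running++;

	if (kind == TOY_SYNC_RCU) {
		/* The item sleeps in synchronize_rcu(): no longer running. */
		p->nr_running--;
		p->nr_blocked++;
	} else {
		/* The expedited GP item completes and releases the waiters. */
		p->gp_done = true;
		p->nr_blocked = 0;
		p->nr_running--;
	}
	return true;
}

/* Dispatch until nothing more can start; true if nobody is left waiting. */
static bool toy_run(struct toy_pool *p)
{
	while (toy_step(p))
		;
	return p->nr_blocked == 0;
}
```

Queueing TOY_SYNC_RCU followed by TOY_EXP_GP and calling toy_run()
leaves no blocked waiters: the first step parks the waiter, the running
count drops back to zero, and the second step dispatches the grace
period item.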

The dispatching of a new worker can be delayed by two things - memory
pressure preventing creation of a new worker and the workqueue hitting
its maximum concurrency limit.

If the expedited RCU grace period is something that the memory reclaim
path may depend on, the workqueue that it executes on should have
WQ_MEM_RECLAIM set, which will guarantee that there's at least one
worker (across all CPUs) which is ready to serve the work items on
that workqueue regardless of memory pressure.
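
In the kernel the flag is passed to alloc_workqueue() when the
workqueue is created, and such a workqueue gets a rescuer thread set
aside up front. A rough userspace sketch of why that matters follows.
It is a toy model, not kernel code: the toy_* names are made up, and
memory pressure and the rescuer are each reduced to a boolean.

```c
#include <stdbool.h>

/* Toy userspace model, NOT kernel code. All toy_* names are made up. */
struct toy_reclaim_wq {
	bool has_rescuer;	/* stands in for WQ_MEM_RECLAIM being set */
};

/* Creating a fresh worker needs memory, so it fails under pressure. */
static bool toy_create_worker(bool mem_pressure)
{
	return !mem_pressure;
}

/* Can a pending work item on this workqueue get something to run it? */
static bool toy_can_make_progress(const struct toy_reclaim_wq *wq,
				  bool mem_pressure)
{
	if (toy_create_worker(mem_pressure))
		return true;

	/* Fall back to the worker that was set aside ahead of time. */
	return wq->has_rescuer;
}
```

With has_rescuer false, toy_can_make_progress() fails as soon as
mem_pressure is true; with it set, the pending item still gets served.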

The latter, the concurrency limit, would only matter if the RCU work
items use system_wq. system_wq's concurrency limit is very high (512
per CPU), but it is theoretically possible to fill all the slots with
work items doing synchronize_rcu() while the expedited RCU work item
is queued behind them. The system would already be in a very messed up
state outside the RCU situation though.
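
The arithmetic of that corner case, as a toy userspace model in C.
Again this is not kernel code: the toy_* names are made up, the only
number taken from the above is the 512 limit, and an item blocked in
synchronize_rcu() is modeled as simply holding on to its slot.

```c
#include <stdbool.h>

/* Toy userspace model, NOT kernel code. All toy_* names are made up. */
#define TOY_MAX_ACTIVE 512	/* the per-CPU limit quoted above */

struct toy_limited_wq {
	int max_active;		/* items allowed to be in flight at once */
	int nr_active;		/* items in flight, including blocked ones */
	int nr_pending;		/* items waiting for a free slot */
};

/* Queue one item; returns true if it could start right away. */
static bool toy_queue_item(struct toy_limited_wq *wq)
{
	if (wq->nr_active < wq->max_active) {
		wq->nr_active++;
		return true;
	}
	wq->nr_pending++;
	return false;
}

/*
 * Queue nr_waiters items that each block in synchronize_rcu() (and so
 * keep their slot), then queue the expedited GP item.  Returns true if
 * the GP item cannot start, i.e. the waiters are never released.
 */
static bool toy_exp_gp_stuck(int nr_waiters)
{
	struct toy_limited_wq wq = { .max_active = TOY_MAX_ACTIVE };
	int i;

	for (i = 0; i < nr_waiters; i++)
		toy_queue_item(&wq);

	return !toy_queue_item(&wq);
}
```

In this model toy_exp_gp_stuck(511) is false and toy_exp_gp_stuck(512)
is true: it takes a full 512 blocked waiters ahead of the grace period
item on one CPU before it gets stuck.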

Thanks.

--
tejun