Re: [PATCH 7/7] RCU, workqueue: Implement rcu_work

From: Paul E. McKenney
Date: Wed Mar 07 2018 - 09:53:49 EST


On Wed, Mar 07, 2018 at 10:49:49AM +0800, Lai Jiangshan wrote:
> On Wed, Mar 7, 2018 at 1:33 AM, Tejun Heo <tj@xxxxxxxxxx> wrote:
>
> > +/**
> > + * queue_rcu_work_on - queue work on specific CPU after a RCU grace period
> > + * @cpu: CPU number to execute work on
> > + * @wq: workqueue to use
> > + * @rwork: work to queue
>
> For many people, "RCU grace period" is clear enough, but not ALL.
>
> So please make it a little more clear that it just queues work after
> a *Normal* RCU grace period. it supports only one RCU variant.
>
>
> > + *
> > + * Return: %false if @work was already on a queue, %true otherwise.
> > + */
>
> I'm afraid this will be a hard-using API.
>
> The user can't find a plan B when it returns false, especially when
> the user expects the work must be called at least once again
> after an RCU grace period.
>
> And the error-prone part of it is that, like other queue_work() functions,
> the return value of it is often ignored and makes the problem worse.
>
> So, since workqueue.c provides this API, it should handle this
> problem. For example, by calling call_rcu() again in this case, but
> everything will be much more complex: a synchronization is needed
> for "calling call_rcu() again" and allowing the work item called
> twice after the last queue_rcu_work() is not workqueue style.

I confess that I had not thought of this aspect, but doesn't the
rcu_barrier() in v2 of this patchset guarantee that it has passed
the RCU portion of the overall wait time? Given that, what I am
missing is now this differs from flush_work() on a normal work item.

So could you please show me the sequence of events that leads to this
problem? (Again, against v2 of this patch, which replaces the
synchronize_rcu() with rcu_barrier().)

> Some would argue that the delayed_work has the same problem
> when the user expects the work must be called at least once again
> after a period of time. But time interval is easy to detect, the user
> can check the time and call the queue_delayed_work() again
> when needed which is also a frequent design pattern. And
> for rcu, it is hard to use this design pattern since it is hard
> to detect (new) rcu grace period without using call_rcu().
>
> I would not provide this API. it is not a NACK. I'm just trying
> expressing my thinking about the API. I'd rather RCU be changed
> and RCU callbacks are changed to be sleepable. But a complete
> overhaul cleanup on the whole source tree for compatibility
> is needed at first, an even more complex job.

One downside of allowing RCU callback functions to sleep is that
one poorly written callback can block a bunch of other ones.
One advantage of Tejun's approach is that such delays only affect
the workqueues, which are already designed to handle such delays.

Thanx, Paul

> > +bool queue_rcu_work_on(int cpu, struct workqueue_struct *wq,
> > + struct rcu_work *rwork)
> > +{
> > + struct work_struct *work = &rwork->work;
> > +
> > + if (!test_and_set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(work))) {
> > + rwork->wq = wq;
> > + rwork->cpu = cpu;
> > + call_rcu(&rwork->rcu, rcu_work_rcufn);
> > + return true;
> > + }
> > +
> > + return false;
> > +}
> > +EXPORT_SYMBOL(queue_rcu_work_on);
> > +
>