Re: [PATCH] rcu: Use cpus_read_lock() while looking at cpu_online_mask

From: Paul E. McKenney
Date: Tue Sep 11 2018 - 13:02:35 EST


On Tue, Sep 11, 2018 at 06:21:42PM +0200, Sebastian Andrzej Siewior wrote:
> On 2018-09-11 09:05:32 [-0700], Paul E. McKenney wrote:
> > On Mon, Sep 10, 2018 at 03:56:16PM +0200, Sebastian Andrzej Siewior wrote:
> > > It was possible that sync_rcu_exp_select_cpus() enqueued something on
> > > CPU0 while CPU0 was offline. Such a work item wouldn't be processed
> > > until CPU0 gets back online. This problem was addressed in commit
> > > fcc6354365015 ("rcu: Make expedited GPs handle CPU 0 being offline"). I
> > > don't think the issue fully addressed.
> > >
> > > Assume grplo = 0 and grphi = 7 and sync_rcu_exp_select_cpus() is invoked
> > > on CPU1. The preempt_disable() section on CPU1 won't ensure that CPU0
> > > remains online between looking at cpu_online_mask and invoking
> > > queue_work_on() on CPU1.
> > >
> > > Use cpus_read_lock() to ensure that `cpu' is not going down between
> > > looking at cpu_online_mask at invoking queue_work_on() and waiting for
> > > its completion. It is added around the loop + flush_work() which is
> > > similar to work_on_cpu_safe() (and we can have multiple jobs running on
> > > NUMA systems).
> >
> > Is this experimental or theoretical?
>
> theoretical. I saw that hunk on RT and I can't have queue_work() within
> a preempt_disable() section here.

OK, I feel better now. ;-)

> > If theoretical, the counter-theory
> > is that the stop-machine processing prevents any of the cpu_online_mask
> > bits from changing, though, yes, we would like to get rid of the
> > stop-machine processing. So either way, yes, the current state could
> > use some improvement.
> >
> > But one problem with the patch below is that sync_rcu_exp_select_cpus()
> > can be called while the cpu_hotplug_lock is write-held. Or is that
> > somehow OK these days?
>
> depends. Is it okay to wait until the write-lock is dropped? If it is,
> then it is okay. If notâ

The problem is that people wait for grace periods within CPU hotplug
notifiers, so the lock won't be dropped until long after that notifier
returns.

> > Assuming not, how about the (untested) patch
> > below?
>
> Doesn't work for me because it is still within the preempt-disable
> section :/.
> Would it work to use WORK_CPU_UNBOUND? As far as I understand it, the
> CPU number does not matter, you just want to spread it across multiple
> CPUs in the NUMA case.

Locality is a good thing, but yes, something like this?

if (!IS_ENABLED(CONFIG_PREEMPT_RT) && /* or whatever it is called */
unlikely(cpu > rnp->grphi - rnp->grplo))

Another approach that might be better longer term would be to have a
workqueue interface that treats the specified CPU as a suggestion,
and silently switches to WORK_CPU_UNBOUND if there is any problem
whatsoever with the specified CPU. Tejun, Lai, thoughts?

> > commit 5214cbbfe6a5d6b92c76c4e411a049fe57245d4a
> > Author: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
> > Date: Tue Sep 11 08:57:48 2018 -0700
> >
> > rcu: Stop expedited grace periods from relying on stop-machine
> >
> > The CPU-selection code in sync_rcu_exp_select_cpus() disables preemption
> > to prevent the cpu_online_mask from changing. However, this relies on
> > the stop-machine mechanism in the CPU-hotplug offline code, which is not
> > desirable (it would be good to someday remove the stop-machine mechanism).
>
> not that I tested it, but I still don't understand how a
> preempt_disable() section on CPU1 can ensure that CPU3 won't go down. Is
> there some code that invokes stop_cpus() for each CPU or so?

Yes, there is a stop_machine_cpuslocked() in takedown_cpu().

There is also a synchronize_rcu_mult(call_rcu, call_rcu_sched) in
sched_cpu_deactivate().

The old code relies on the stop_machine_cpuslocked() and my proposed
patch relies on the synchronize_rcu_mult().

Thanx, Paul