Re: [PATCH] rcu: Use cpus_read_lock() while looking at cpu_online_mask

From: Sebastian Andrzej Siewior
Date: Tue Sep 11 2018 - 12:21:50 EST


On 2018-09-11 09:05:32 [-0700], Paul E. McKenney wrote:
> On Mon, Sep 10, 2018 at 03:56:16PM +0200, Sebastian Andrzej Siewior wrote:
> > It was possible that sync_rcu_exp_select_cpus() enqueued something on
> > CPU0 while CPU0 was offline. Such a work item wouldn't be processed
> > until CPU0 gets back online. This problem was addressed in commit
> > fcc6354365015 ("rcu: Make expedited GPs handle CPU 0 being offline"). I
> > don't think the issue is fully addressed.
> >
> > Assume grplo = 0 and grphi = 7 and sync_rcu_exp_select_cpus() is invoked
> > on CPU1. The preempt_disable() section on CPU1 won't ensure that CPU0
> > remains online between looking at cpu_online_mask and invoking
> > queue_work_on() on CPU1.
> >
> > Use cpus_read_lock() to ensure that `cpu' is not going down between
> > looking at cpu_online_mask and invoking queue_work_on() and waiting for
> > its completion. It is added around the loop + flush_work() which is
> > similar to work_on_cpu_safe() (and we can have multiple jobs running on
> > NUMA systems).
>
> Is this experimental or theoretical?

theoretical. I saw that hunk on RT and I can't have queue_work() within
a preempt_disable() section here.

> If theoretical, the counter-theory
> is that the stop-machine processing prevents any of the cpu_online_mask
> bits from changing, though, yes, we would like to get rid of the
> stop-machine processing. So either way, yes, the current state could
> use some improvement.
>
> But one problem with the patch below is that sync_rcu_exp_select_cpus()
> can be called while the cpu_hotplug_lock is write-held. Or is that
> somehow OK these days?

depends. Is it okay to wait until the write-lock is dropped? If it is,
then it is okay. If not…

> Assuming not, how about the (untested) patch
> below?

Doesn't work for me because it is still within the preempt-disable
section :/.
Would it work to use WORK_CPU_UNBOUND? As far as I understand it, the
CPU number does not matter, you just want to spread it across multiple
CPUs in the NUMA case.
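
A minimal userspace sketch of the selection being discussed, not the kernel
code: pick the first online CPU in a node's [grplo, grphi] range and fall
back to "unbound" if none is online. The names model_pick_cpu() and
MODEL_CPU_UNBOUND are hypothetical stand-ins (the latter for
WORK_CPU_UNBOUND), and the mask is a plain snapshot, so the sketch says
nothing about whether the chosen CPU stays online afterwards.

```c
#include <limits.h>

/* Hypothetical stand-in for the kernel's WORK_CPU_UNBOUND; value is arbitrary. */
#define MODEL_CPU_UNBOUND (-1)

/*
 * Userspace model only: return the first CPU in [grplo, grphi] whose bit is
 * set in online_mask, or MODEL_CPU_UNBOUND if no CPU in the range is online.
 * online_mask is a snapshot; nothing here prevents it from going stale.
 */
static int model_pick_cpu(unsigned long online_mask, int grplo, int grphi)
{
	int nbits = (int)(sizeof(online_mask) * CHAR_BIT);

	for (int cpu = grplo; cpu <= grphi && cpu < nbits; cpu++) {
		/* Skip negative indices so the shift below is always valid. */
		if (cpu >= 0 && (online_mask & (1UL << cpu)))
			return cpu;
	}
	return MODEL_CPU_UNBOUND;
}
```

With grplo = 0, grphi = 7 and CPU0 offline (mask 0xFE), model_pick_cpu()
returns 1. With every CPU in the range offline it returns
MODEL_CPU_UNBOUND, which is the case where the exact CPU no longer matters.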

> Thanx, Paul
>
> commit 5214cbbfe6a5d6b92c76c4e411a049fe57245d4a
> Author: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
> Date: Tue Sep 11 08:57:48 2018 -0700
>
> rcu: Stop expedited grace periods from relying on stop-machine
>
> The CPU-selection code in sync_rcu_exp_select_cpus() disables preemption
> to prevent the cpu_online_mask from changing. However, this relies on
> the stop-machine mechanism in the CPU-hotplug offline code, which is not
> desirable (it would be good to someday remove the stop-machine mechanism).

not that I tested it, but I still don't understand how a
preempt_disable() section on CPU1 can ensure that CPU3 won't go down. Is
there some code that invokes stop_cpus() for each CPU or so?

Sebastian