Re: [PATCH v2 rcu 04/18] rcu: Weaken ->dynticks accesses and updates

From: Paul E. McKenney
Date: Wed Jul 28 2021 - 14:46:09 EST


On Wed, Jul 28, 2021 at 10:58:08AM -0700, Linus Torvalds wrote:
> On Wed, Jul 28, 2021 at 10:37 AM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
> >
> > +/*
> > + * Increment the current CPU's rcu_data structure's ->dynticks field
> > + * with ordering. Return the new value.
> > + */
> > +static noinstr unsigned long rcu_dynticks_inc(int incby)
> > +{
> > + return arch_atomic_add_return(incby, this_cpu_ptr(&rcu_data.dynticks));
> > +}
>
> Maybe inline?

I bet the compiler does this already, but why give it the choice?
I will add inline.

> But more I reacted to how we sadly don't have percpu atomics. They'd
> be fairly easy to add on x86, but I guess it's not a huge deal.
>
> And hey, if this is pretty much the only place that would use them, I
> guess that's not much of an issue.

I did a git grep for "atomic_.*this_cpu_ptr" in -rcu, and came up with
this:

arch/s390/kernel/time.c: atomic_t *sw_ptr = this_cpu_ptr(&clock_sync_word);
arch/s390/kernel/time.c: atomic_t *sw_ptr = this_cpu_ptr(&clock_sync_word);
arch/x86/xen/spinlock.c: atomic_t *nest_cnt = this_cpu_ptr(&xen_qlock_wait_nest);
drivers/irqchip/irq-apple-aic.c: atomic_andnot(irq_bit, this_cpu_ptr(&aic_vipi_enable));
drivers/irqchip/irq-apple-aic.c: atomic_or(irq_bit, this_cpu_ptr(&aic_vipi_enable));
drivers/irqchip/irq-apple-aic.c: if (atomic_read(this_cpu_ptr(&aic_vipi_flag)) & irq_bit)
drivers/irqchip/irq-apple-aic.c: enabled = atomic_read(this_cpu_ptr(&aic_vipi_enable));
drivers/irqchip/irq-apple-aic.c: firing = atomic_fetch_andnot(enabled, this_cpu_ptr(&aic_vipi_flag)) & enabled;
kernel/events/core.c: if (atomic_read(this_cpu_ptr(&perf_cgroup_events)))
kernel/events/core.c: if (atomic_read(this_cpu_ptr(&perf_cgroup_events)))
kernel/rcu/rcuscale.c: atomic_dec(this_cpu_ptr(&n_async_inflight));
kernel/rcu/rcuscale.c: if (rhp && atomic_read(this_cpu_ptr(&n_async_inflight)) < gp_async_max) {
kernel/rcu/rcuscale.c: atomic_inc(this_cpu_ptr(&n_async_inflight));
kernel/rcu/tree.c: return arch_atomic_add_return(incby, this_cpu_ptr(&rcu_data.dynticks));
kernel/rcu/tree.c: return !(atomic_read(this_cpu_ptr(&rcu_data.dynticks)) & 0x1);
kernel/trace/ring_buffer.c: if (atomic_inc_return(this_cpu_ptr(&checking)) != 1)
kernel/trace/ring_buffer.c: atomic_dec(this_cpu_ptr(&checking));

Given that this mostly moves text from the first argument to the function
name, I suspect that we would need quite a few instances of any given
atomic operation to make the added lines for all the definitions and
the documentation worth it. Maybe for atomic_read()?

/**
* atomic_read_this_cpu - atomically read from this CPU's variable
* @p: Pointer to per-CPU variable.
*
* Do an atomic_read() from the current CPU's instance of the
* specified per-CPU variable.
*/
#define atomic_read_this_cpu(p) atomic_read(this_cpu_ptr(p))

But atomic_read_this_cpu(&rcu_data.dynticks) isn't all that much shorter
than atomic_read(this_cpu_ptr(&rcu_data.dynticks)).

Thanx, Paul