Re: [PATCH rcu 04/18] rcu: Weaken ->dynticks accesses and updates

From: Linus Torvalds
Date: Wed Jul 21 2021 - 16:42:10 EST

Next message: Linus Torvalds: "Re: [PATCH] mm: Make kvmalloc refuse to allocate more than 2GB"
Previous message: Thomas Gleixner: "Re: [PATCH 0/6] x86: PIRQ/ELCR-related fixes and updates"
In reply to: Paul E. McKenney: "[PATCH rcu 04/18] rcu: Weaken ->dynticks accesses and updates"
Next in thread: Paul E. McKenney: "Re: [PATCH rcu 04/18] rcu: Weaken ->dynticks accesses and updates"

Hmm.

This actually seems to make some of the ordering worse.

I'm not seeing a lot of weakening or optimization, but it depends a
bit on which paths are common and which are not.

On Wed, Jul 21, 2021 at 1:21 PM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
>
> +/*
> + * Increment the current CPU's rcu_data structure's ->dynticks field
> + * with ordering. Return the new value.
> + */
> +static noinstr unsigned long rcu_dynticks_inc(int incby)
> +{
> + struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
> + int seq;
> +
> + seq = READ_ONCE(rdp->dynticks) + incby;
> + smp_store_release(&rdp->dynticks, seq);
> + smp_mb(); // Fundamental RCU ordering guarantee.
> + return seq;
> +}

So this is actually likely *more* expensive than the old code was, at
least on x86.

The READ_ONCE/smp_store_release are cheap, but then the smp_mb() is expensive.

The old code did just arch_atomic_inc_return(), which included the
memory barrier.

There *might* be some cache ordering advantage to letting the
READ_ONCE() float upwards, but from a pure barrier standpoint this is
more expensive than what we used to have.
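To make the comparison concrete, here is a rough user-space model of the two increment styles. It uses C11 <stdatomic.h> operations as approximate stand-ins for the kernel primitives (seq_cst fetch-add for arch_atomic_inc_return(), relaxed load for READ_ONCE(), release store for smp_store_release(), seq_cst fence for smp_mb()). The C11 and kernel memory models are not identical, and dynticks_model, inc_old() and inc_new() are made-up names for illustration only.

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * Hypothetical model of one CPU's ->dynticks counter. C11 atomics are
 * only rough stand-ins for the kernel primitives named below.
 */
static _Atomic int dynticks_model;

/* Old style: one fully ordered RMW, like arch_atomic_inc_return(). */
static int inc_old(int incby)
{
	assert(incby > 0);	/* model assumption: counter only moves forward */
	return atomic_fetch_add_explicit(&dynticks_model, incby,
					 memory_order_seq_cst) + incby;
}

/*
 * New style from the patch: READ_ONCE() + smp_store_release() + smp_mb().
 * The load/store pair is not an atomic RMW, so this is only correct if
 * the owning CPU is the sole writer of the counter (assumed here).
 */
static int inc_new(int incby)
{
	int seq;

	assert(incby > 0);	/* model assumption: counter only moves forward */
	seq = atomic_load_explicit(&dynticks_model, memory_order_relaxed) + incby;
	atomic_store_explicit(&dynticks_model, seq, memory_order_release);
	atomic_thread_fence(memory_order_seq_cst);	/* stand-in for smp_mb() */
	return seq;
}
```

On x86 one would expect the first to end up as a single LOCK XADD, and the second as two plain MOVs plus a separate full fence (MFENCE or a LOCK-prefixed dummy instruction). The full barrier is still paid; it just is no longer folded into the increment.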

> - if (atomic_read(&rdp->dynticks) & 0x1)
> + if (READ_ONCE(rdp->dynticks) & 0x1)
> return;
> - atomic_inc(&rdp->dynticks);
> + rcu_dynticks_inc(1);

And this one seems not to take advantage of the new rule, so we end up
with two reads of ->dynticks (the READ_ONCE() in the test and another
one inside rcu_dynticks_inc()), followed by that potentially more
expensive sequence.
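A sketch of what reusing the first read could look like, in the same kind of hypothetical C11 model (online_single_read() and dynticks_model are illustrative names, and it again assumes the owning CPU is the only writer). The value loaded for the parity test is the value that gets incremented, so the counter is read once, and the store and full fence are skipped entirely when it is already odd.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical model of one CPU's ->dynticks counter (single writer assumed). */
static _Atomic int dynticks_model;

/*
 * Make the counter odd with a single read: the value tested for parity
 * is the value that gets incremented. Returns the resulting counter value.
 */
static int online_single_read(void)
{
	int seq = atomic_load_explicit(&dynticks_model, memory_order_relaxed);

	if (seq & 0x1)
		return seq;	/* already odd: no store, no fence */
	seq++;
	atomic_store_explicit(&dynticks_model, seq, memory_order_release);
	atomic_thread_fence(memory_order_seq_cst);	/* stand-in for smp_mb() */
	/* Holds only under the single-writer assumption. */
	assert(atomic_load_explicit(&dynticks_model, memory_order_relaxed) == seq);
	return seq;
}
```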

> static int rcu_dynticks_snap(struct rcu_data *rdp)
> {
> - return atomic_add_return(0, &rdp->dynticks);
> + smp_mb(); // Fundamental RCU ordering guarantee.
> + return smp_load_acquire(&rdp->dynticks);
> }

This is likely cheaper - not because of barriers, but simply because
it avoids dirtying the cacheline.
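The difference shows up in the same kind of rough C11 model (snap_old() and snap_new() are hypothetical names, and C11 atomics only approximate the kernel primitives). The old snapshot is a fully ordered read-modify-write that adds zero, so the CPU has to take the cache line exclusively even though the value never changes. The new one is a full fence followed by an acquire load, which only needs a shared copy of the line.

```c
#include <assert.h>
#include <stdatomic.h>

/* Old style, like atomic_add_return(0, ...): a locked RMW that writes the line. */
static int snap_old(_Atomic int *ctr)
{
	return atomic_fetch_add_explicit(ctr, 0, memory_order_seq_cst);
}

/* New style, like smp_mb() + smp_load_acquire(): read-only access to the line. */
static int snap_new(_Atomic int *ctr)
{
	atomic_thread_fence(memory_order_seq_cst);	/* stand-in for smp_mb() */
	return atomic_load_explicit(ctr, memory_order_acquire);
}
```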

So which operation do we _care_ about, and do we have numbers for why
this improves anything? Because looking at the patch, it's not obvious
that this is an improvement.

Linus
