Re: [PATCH tip/core/rcu 16/27] rcu: Add comment documenting how rcu_seq_snap works

From: Joel Fernandes
Date: Wed Jun 27 2018 - 14:27:34 EST


On Wed, Jun 27, 2018 at 10:54:36AM -0700, Paul E. McKenney wrote:
> On Tue, Jun 26, 2018 at 09:39:13PM -0700, Joel Fernandes wrote:
> > On Tue, Jun 26, 2018 at 07:30:55PM +0200, Peter Zijlstra wrote:
> > > On Mon, Jun 25, 2018 at 05:35:02PM -0700, Paul E. McKenney wrote:
> > > > +/*
> > > > + * rcu_seq_snap - Take a snapshot of the update side's sequence number.
> > > > + *
> > > > + * This function returns the earliest value of the grace-period sequence number
> > > > + * that will indicate that a full grace period has elapsed since the current
> > > > + * time. Once the grace-period sequence number has reached this value, it will
> > > > + * be safe to invoke all callbacks that have been registered prior to the
> > > > + * current time. This value is the current grace-period number plus two to the
> > > > + * power of the number of low-order bits reserved for state, then rounded up to
> > > > + * the next value in which the state bits are all zero.
> > >
> > > If you complete that by saying _why_ you need to round up there, then
> > > the below verbiage is completely redundant.
> > >
> > > > + * In the current design, RCU_SEQ_STATE_MASK=3 and the least significant bit of
> > > > + * the seq is used to track if a GP is in progress or not. Given this, it is
> > > > + * sufficient if we add (6+1) and mask with ~3 to get the next GP. Let's see
> > > > + * why with an example:
> > > > + *
> > > > + * Say the current seq is 12 which is 0b1100 (GP is 3 and state bits are 0b00).
> > > > + * To get to the next GP number of 4, we have to add 0b100 to this (0x1 << 2)
> > > > + * to account for the shift due to 2 state bits. Now, if the current seq is
> > > > + * 13 (GP is 3 and state bits are 0b01), then it means the current grace period
> > > > + * is already in progress so the next GP that a future call back will be queued
> > > > + * to run at is GP+2 = 5, not 4. To account for the extra +1, we just overflow
> > > > + * the 2 lower bits by adding 0b11. In case the lower bit was set, the overflow
> > > > + * will cause the extra +1 to the GP, along with the usual +1 explained before.
> > > > + * This gives us GP+2. Finally we mask the lower two bits by ~0x3 in case the
> > > > + * overflow didn't occur. This masking is needed because in case RCU was idle
> > > > + * (no GP in progress so lower 2 bits are 0b00), then the overflow of the lower
> > > > + * 2 state bits wouldn't occur, so we mask to zero out those lower 2 bits.
> > > > + *
> > > > + * In other words, the next seq can be obtained by (seq + 0b11 + 0b100) & (~0b11)
> > > > + * which can be generalized to:
> > > > + * (seq + RCU_SEQ_STATE_MASK + (RCU_SEQ_STATE_MASK + 1)) & (~RCU_SEQ_STATE_MASK)
> > > > + */
> > >
> > > Is the below not much simpler:
> > >
> > > > static inline unsigned long rcu_seq_snap(unsigned long *sp)
> > > > {
> > > > unsigned long s;
> > >
> > > s = smp_load_acquire(sp);
> > >
> > > /* Add one GP */
> > > s += 1 << RCU_SEQ_CTR_SHIFT;
> > >
> > > /* Complete any pending state by rounding up */
> >
> > I would suggest this comment be changed to "Add another GP if there was a
> > pending state".
> >
> > > s = __ALIGN_MASK(s, RCU_SEQ_STATE_MASK);
> > >
> >
> > I agree with Peter's suggestions for both the verbiage reduction in the
> > comments in the header, as the new code he is proposing is more
> > self-documenting. I believe I proposed a big comment just because the code
> > wasn't self-documenting or obvious previously so needed an explanation.
> >
> > How would you like to proceed? Let me know what you guys decide, I am really
> > Ok with anything. If you guys agree, should I write a follow-up patch with
> > Peter's suggestion that applies on top of this one? Or do we want to drop
> > this one in favor of Peter's suggestion?
>
> Shortening the comment would be good, so please do that.
>
> I cannot say that I am much of a fan of the suggested change to the
> computation, but I don't feel all that strongly about it. If the two

Did you mean from a code generation standpoint or from a higher-level coding
standpoint?

From a code generation perspective, the two versions compile to identical
code. I did a quick test to confirm that:

0000000000000000 <rcu_seq_snap_old>:
   0:   e8 00 00 00 00          callq  5 <rcu_seq_snap_old+0x5>
   5:   48 8b 07                mov    (%rdi),%rax
   8:   f0 83 44 24 fc 00       lock addl $0x0,-0x4(%rsp)
   e:   48 83 c0 07             add    $0x7,%rax
  12:   48 83 e0 fc             and    $0xfffffffffffffffc,%rax
  16:   c3                      retq
  17:   66 0f 1f 84 00 00 00    nopw   0x0(%rax,%rax,1)
  1e:   00 00

0000000000000020 <rcu_seq_snap_new>:
  20:   e8 00 00 00 00          callq  25 <rcu_seq_snap_new+0x5>
  25:   48 8b 07                mov    (%rdi),%rax
  28:   f0 83 44 24 fc 00       lock addl $0x0,-0x4(%rsp)
  2e:   48 83 c0 07             add    $0x7,%rax
  32:   48 83 e0 fc             and    $0xfffffffffffffffc,%rax
  36:   c3                      retq
  37:   66 0f 1f 84 00 00 00    nopw   0x0(%rax,%rax,1)
  3e:   00 00
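
The same equivalence can be checked at the source level in userspace. The
following is only a minimal sketch, not the kernel code: the names snap_old(),
snap_new(), snap_agree() and ALIGN_MASK() are hypothetical stand-ins, the
acquire load and barrier are left out, and the constants assume two state bits
(RCU_SEQ_STATE_MASK=3) as described in the quoted comment.

```c
#include <stdbool.h>

/* Userspace stand-ins (assumed values: two low-order state bits). */
#define RCU_SEQ_CTR_SHIFT	2
#define RCU_SEQ_STATE_MASK	((1UL << RCU_SEQ_CTR_SHIFT) - 1)	/* == 3 */

/* Assumed to round up the same way the kernel's __ALIGN_MASK() does. */
#define ALIGN_MASK(x, mask)	(((x) + (mask)) & ~(mask))

/* Single-expression form: add 2 * MASK + 1, then clear the state bits. */
static unsigned long snap_old(unsigned long s)
{
	return (s + 2 * RCU_SEQ_STATE_MASK + 1) & ~RCU_SEQ_STATE_MASK;
}

/* Two-step form from Peter's suggestion. */
static unsigned long snap_new(unsigned long s)
{
	/* Add one GP */
	s += 1UL << RCU_SEQ_CTR_SHIFT;

	/* Add another GP if there was a pending state */
	return ALIGN_MASK(s, RCU_SEQ_STATE_MASK);
}

/* Return true if both forms agree for every sequence value below limit. */
static bool snap_agree(unsigned long limit)
{
	unsigned long s;

	for (s = 0; s < limit; s++)
		if (snap_old(s) != snap_new(s))
			return false;
	return true;
}
```

With the values from the comment's example, a seq of 12 (GP 3, idle) maps to
16 (GP 4) and a seq of 13 (GP 3, in progress) maps to 20 (GP 5) in both forms.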

> of you agree on a formulation and get at least one other RCU maintainer
> or reviewer to agree as well, I will take the change.
>

Cool, sounds good.

thanks!

- Joel