Re: Query regarding srcu_funnel_exp_start()

From: Paul E. McKenney
Date: Fri Oct 27 2017 - 18:20:51 EST


On Fri, Oct 27, 2017 at 10:15:04PM +0530, Neeraj Upadhyay wrote:
> On 10/27/2017 05:56 PM, Paul E. McKenney wrote:
> >On Fri, Oct 27, 2017 at 02:23:07PM +0530, Neeraj Upadhyay wrote:
> >>Hi,
> >>
> >>One query regarding srcu_funnel_exp_start() function in
> >>kernel/rcu/srcutree.c.
> >>
> >>static void srcu_funnel_exp_start(struct srcu_struct *sp, struct
> >>srcu_node *snp,
> >> unsigned long s)
> >>{
> >> <snip>
> >> if (!ULONG_CMP_LT(sp->srcu_gp_seq_needed_exp, s))
> >> sp->srcu_gp_seq_needed_exp = s;
> >> <snip>
> >>}
> >>
> >>Why is sp->srcu_gp_seq_needed_exp set to 's' if srcu_gp_seq_needed_exp is >=
> >>'s'. Shouldn't srcu_gp_seq_needed_exp be equal to the greater of both?
> >
> >Let's suppose that it is incorrect as currently written. Can you
> >construct a test case demonstrating a failure of some sort, then provide
> >a fix?
>
> Will check this. Might take some time to build a test case.

Fair enough!

I suggest checking to see if kernel/rcu/rcuperf.c can do what you need for
this test. (Might not with a single test, but perhaps a before-and-after
comparison. Or maybe you really do need to add some test code somewhere.)

> >To start with, if it is currently incorrect, what would be the nature
> >of the failure?
> >
> > Thanx, Paul
> >
>
> Hi Paul,
>
> I see below scenario, where new gp won't be expedited. Please correct
> me if I am missing something here.
>
> 1. CPU0 calls synchronize_srcu_expedited()
>
> synchronize_srcu_expedited()
> __synchronize_srcu()
> __call_srcu()
> s = rcu_seq_snap(&sp->srcu_gp_seq); // lets say
> srcu_gp_seq = 0;
> // s = 0x100

Looks like you have one hex digit and then two binary digits, but why not?
(RCU_SEQ_STATE_MASK is 3 rather than 0xff.)

> sdp->srcu_gp_seq_needed = s // 0x100
> needgp = true
> sdp->srcu_gp_seq_needed_exp = s // 0x100
> srcu_funnel_gp_start()
> sp->srcu_gp_seq_needed_exp = s;
> srcu_gp_start(sp);
> rcu_seq_start(&sp->srcu_gp_seq);
>
> 2. CPU1 calls normal synchronize_srcu()
>
> synchronize_srcu()
> __synchronize_srcu(sp, true)
> __call_srcu()
> s = rcu_seq_snap(&sp->srcu_gp_seq); // srcu_gp_seq = 1
> // s= 0x200
> sdp->srcu_gp_seq_needed = s; // 0x200
> srcu_funnel_gp_start()
> smp_store_release(&sp->srcu_gp_seq_needed, s); // 0x200
>
> 3. CPU3 calls synchronize_srcu_expedited()
>
> synchronize_srcu_expedited()
> __synchronize_srcu()
> __call_srcu()
> s = rcu_seq_snap(&sp->srcu_gp_seq); // srcu_gp_seq = 1
> // s = 0x200
> sdp->srcu_gp_seq_needed_exp = s // 0x200
> srcu_funnel_exp_start(sp, sdp->mynode, s);
> // sp->srcu_gp_seq_needed_exp = 0x100
> // s = 0x200 ; sp->srcu_gp_seq_needed_exp is not updated
> if (!ULONG_CMP_LT(sp->srcu_gp_seq_needed_exp, s))
> sp->srcu_gp_seq_needed_exp = s;

Seems plausible, but you should be able to show the difference in
grace-period duration with a test.

While you are in srcu_funnel_exp_start(), should it be rechecking
rcu_seq_done(&sp->srcu_gp_seq, s) as well as the current
ULONG_CMP_GE(snp->srcu_gp_seq_needed_exp, s) under the lock?
Why or why not?

Thanx, Paul