Re: [RFC][PATCH] rcu classic: new algorithm for callbacks-processing

From: Paul E. McKenney
Date: Mon Jun 23 2008 - 05:14:13 EST

Next message: Matti Linnanvuori: "[patch v1.2.30] wan: add driver retina"
Previous message: Huang, Ying: "Re: [PATCH] x86 boot: Pass E820 memory map entries more than 128via linked list of setup data"
In reply to: Lai Jiangshan: "Re: [RFC][PATCH] rcu classic: new algorithm for callbacks-processing"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Mon, Jun 23, 2008 at 11:25:38AM +0800, Lai Jiangshan wrote:
>
> I apologize for for so later response. I do not stop this works.
> But some problems occurred when i tested. (Actually, i wanted to reply
> you after all are fixed, my fault!)

No problem -- as long as things are progressing, I am happy! ;-)

> Paul E. McKenney wrote:
> > On Tue, Jun 03, 2008 at 11:46:11AM +0800, Lai Jiangshan wrote:
> >> The code/algorithm of the implement of current callbacks-processing
> >> is very efficient and technical. But when I studied it and I found
> >> a disadvantage:
> >>
> >> In multi-CPU systems, when a new RCU callback is being
> >> queued(call_rcu[_bh]), this callback will be invoked after the grace
> >> period for the batch with batch number = rcp->cur+2 has completed
> >> very very likely in current implement. Actually, this callback can be
> >> invoked after the grace period for the batch with
> >> batch number = rcp->cur+1 has completed. The delay of invocation means
> >> that latency of synchronize_rcu() is extended. But more important thing
> >> is that the callbacks usually free memory, and these works are delayed
> >> too! it's necessary for reclaimer to free memory as soon as
> >> possible when left memory is few.
> >
> > Speeding up the RCU grace periods would indeed be a very good thing!
> >
> >> A very simple way can solve this problem:
> >> a field(struct rcu_head::batch) is added to record the batch number for
> >> the RCU callback. And when a new RCU callback is being queued, we
> >> determine the batch number for this callback(head->batch = rcp->cur+1)
> >> and we move this callback to rdp->donelist if we find
> >> that head->batch <= rcp->completed when we process callbacks.
> >> This simple way reduces the wait time for invocation a lot. (about
> >> 2.5Grace Period -> 1.5Grace Period in average in multi-CPU systems)
> >>
> >> This is my algorithm. But I do not add any field for struct rcu_head
> >> in my implement. We just need to memorize the last 2 batches and
> >> their batch number, because these 2 batches include all entries that
> >> for whom the grace period hasn't completed. So we use a special
> >> linked-list rather than add a field.
> >> Please see the comment of struct rcu_data.
> >
> > Maintaining the single list with multiple pointers into it certainly
> > does seem to simplify the list processing, as does extracting the common
> > code from call_rcu() and call_rcu_bh(). Just out of curiosity, why
> > did you keep donelist as a separate list instead of an additional pointer
> > into the mxtlist?
>
> donelist is only accessed in softirq(do not need irq disabled),
> but nxtlist is not. i didn't want to modify rcu_do_batch().

OK, we can always handle this separately if it makes sense.

> >> rcutourture was tested successfully(x86_64/4cpu i386/2cpu i386/1cpu).
> >
> > Of course, RCU implementations need careful inspection, testing and
> > validation. Running rcutorture is a good first step, but unfortunately
> > only a first step. So I need to ask you the following questions:
> >
> > 1. How long did you run rcutorture?
>
> 2 hours at the first time i run rcutorture , but no hotplug nor
> test_no_idle_hz argument. How long would be appropriate?

I would suggest 24 hours for a core change like this.

> > 2. Do you have access to weak-memory-order machines on which
> > to do rcutorture testing? (If not, I expect that we can
> > motivate testing elsewhere.)
>
> I can't access to weak-memory-order machines. Could you please
> test it after all my test are OK?

I would be very happy to! Let me know when you are ready, and send
me a patch against some published version (e.g., one of the ones that
is run by test.kernel.org).

> > 3. Did you run CPU hotplug while running rcutorture? Doing so
> > is extremely important, as RCU interacts with CPU hotplug.
>
> failed with the following script is run at the same time,
> i hasn't found out the reason:
> #!/bin/sh
>
> # 4cpus
>
> cpu1=1
> cpu2=1
> cpu3=1
> while ((1))
> do
> no=$(($RANDOM % 3 + 1))
> if ((!cpu$no))
> then
> echo 1 > /sys/devices/system/cpu/cpu$no/online
> ((cpu$no=1))
> else
> echo 0 > /sys/devices/system/cpu/cpu$no/online
> ((cpu$no=0))
> fi
> echo 1 $cpu1 $cpu2 $cpu3
> sleep 2
> done

You tried this without your changes and it passed, correct?

Never forget to test the base kernel. ;-)

> > 4. Did you use the rcutorture test_no_idle_hz and shuffle_interval
> > arguments to test out RCU's interaction with CONFIG_NO_HZ?
> > (This requires running a CONFIG_NO_HZ kernel.)
>
> test is OK with test_no_idle_hz=1 shuffle_interval=5.

Very good!!! For how long?

> It seems my patch changes nothing about NO_HZ.

Agreed, but changes can have unanticipated side-effects, so it is
always good to check.

> > 5. One concern I have is the removal of a few memory barriers.
> > Could you please tell me why it is safe to remove these?
>
> Yes, it is safe, it's may delay the processing a little
> when read the old/error values for rcp->cur/rcp->next_pending.
> I had fixed it. But it may still delay the processing when old value
> for rcp->completed is read in rcu_pending().

Would you please write down why you believe it is safe? It is not that
I doubt your ability, it is just that RCU implementations can be a bit
tricky at times. It will help me check your code if I fully understand
your thinking.

> > Could you please run any additional combinations of tests that you
> > are able to given the hardware you have access to?
>
> Yes, i will test and i want more advice.

And, as noted earlier, I will be happy to fill in testing on weak-memory
machines once it is working well on the machines you have access to.

> > And thank you very much for all your work in simplifying and speeding
> > up RCU grace-period detection! There may be some additional work
> > required, but this patch does look promising!
> >
> > Thanx, Paul
> >
>
> How can I test to find out whether a patch of rcu
> advances system's performance?

By running a number of benchmarks. You might want to check boot-up
and shutdown speed as well.

> I didn't changed any code for batch's grace period. I just
> insert callbacks into the right batch to speeding up their grace periods
> in SMP.
>
> And I think broadcasting when a new batch is started will
> speed up batch's grace period.

It might well, and speeding up the grace period would be a good thing.
However, it will be necessary to test overhead -- and we might need some
help from the SGI guys for their very large machines. (I have some
moderately large ones I can get access to from time to time, but the SGI
machines are the biggest I am aware of.)

Thanx, Paul
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Matti Linnanvuori: "[patch v1.2.30] wan: add driver retina"
Previous message: Huang, Ying: "Re: [PATCH] x86 boot: Pass E820 memory map entries more than 128via linked list of setup data"
In reply to: Lai Jiangshan: "Re: [RFC][PATCH] rcu classic: new algorithm for callbacks-processing"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]