Re: [PATCH v2 rcu/dev 1/2] rcu/tree: Reduce wake up for synchronize_rcu() common case

From: Joel Fernandes
Date: Tue Mar 19 2024 - 13:33:24 EST




On 3/19/2024 1:29 PM, Joel Fernandes wrote:
>
>
> On 3/19/2024 1:26 PM, Uladzislau Rezki wrote:
>> On Tue, Mar 19, 2024 at 12:11:28PM -0400, Joel Fernandes wrote:
>>> On Tue, Mar 19, 2024 at 12:02 PM Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> wrote:
>>>>
>>>> On Tue, Mar 19, 2024 at 03:48:46PM +0100, Uladzislau Rezki wrote:
>>>>> On Tue, Mar 19, 2024 at 10:29:59AM -0400, Joel Fernandes wrote:
>>>>>>
>>>>>>
>>>>>>> On Mar 19, 2024, at 5:53 AM, Uladzislau Rezki <urezki@xxxxxxxxx> wrote:
>>>>>>>
>>>>>>> On Mon, Mar 18, 2024 at 05:05:31PM -0400, Joel Fernandes wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>>> On Mar 18, 2024, at 2:58 PM, Uladzislau Rezki <urezki@xxxxxxxxx> wrote:
>>>>>>>>>
>>>>>>>>> Hello, Joel!
>>>>>>>>>
>>>>>>>>> Sorry for the late review; see a few comments below:
>>>>>>>>>
>>>>>>>>>> In the synchronize_rcu() common case, we will have less than
>>>>>>>>>> SR_MAX_USERS_WAKE_FROM_GP number of users per GP. Waking up the kworker
>>>>>>>>>> is pointless just to free the last injected wait head since at that point,
>>>>>>>>>> all the users have already been awakened.
>>>>>>>>>>
>>>>>>>>>> Introduce a new counter to track this and prevent the wakeup in the
>>>>>>>>>> common case.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Joel Fernandes (Google) <joel@xxxxxxxxxxxxxxxxx>
>>>>>>>>>> ---
>>>>>>>>>> Rebased on paul/dev of today.
>>>>>>>>>>
>>>>>>>>>> kernel/rcu/tree.c | 36 +++++++++++++++++++++++++++++++-----
>>>>>>>>>> kernel/rcu/tree.h | 1 +
>>>>>>>>>> 2 files changed, 32 insertions(+), 5 deletions(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
>>>>>>>>>> index 9fbb5ab57c84..bd29fe3c76bf 100644
>>>>>>>>>> --- a/kernel/rcu/tree.c
>>>>>>>>>> +++ b/kernel/rcu/tree.c
>>>>>>>>>> @@ -96,6 +96,7 @@ static struct rcu_state rcu_state = {
>>>>>>>>>> .ofl_lock = __ARCH_SPIN_LOCK_UNLOCKED,
>>>>>>>>>> .srs_cleanup_work = __WORK_INITIALIZER(rcu_state.srs_cleanup_work,
>>>>>>>>>> rcu_sr_normal_gp_cleanup_work),
>>>>>>>>>> + .srs_cleanups_pending = ATOMIC_INIT(0),
>>>>>>>>>> };
>>>>>>>>>>
>>>>>>>>>> /* Dump rcu_node combining tree at boot to verify correct setup. */
>>>>>>>>>> @@ -1642,8 +1643,11 @@ static void rcu_sr_normal_gp_cleanup_work(struct work_struct *work)
>>>>>>>>>> * the done tail list manipulations are protected here.
>>>>>>>>>> */
>>>>>>>>>> done = smp_load_acquire(&rcu_state.srs_done_tail);
>>>>>>>>>> - if (!done)
>>>>>>>>>> + if (!done) {
>>>>>>>>>> + /* See comments below. */
>>>>>>>>>> + atomic_dec_return_release(&rcu_state.srs_cleanups_pending);
>>>>>>>>>> return;
>>>>>>>>>> + }
>>>>>>>>>>
>>>>>>>>>> WARN_ON_ONCE(!rcu_sr_is_wait_head(done));
>>>>>>>>>> head = done->next;
>>>>>>>>>> @@ -1666,6 +1670,9 @@ static void rcu_sr_normal_gp_cleanup_work(struct work_struct *work)
>>>>>>>>>>
>>>>>>>>>> rcu_sr_put_wait_head(rcu);
>>>>>>>>>> }
>>>>>>>>>> +
>>>>>>>>>> + /* Order list manipulations with atomic access. */
>>>>>>>>>> + atomic_dec_return_release(&rcu_state.srs_cleanups_pending);
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> /*
>>>>>>>>>> @@ -1673,7 +1680,7 @@ static void rcu_sr_normal_gp_cleanup_work(struct work_struct *work)
>>>>>>>>>> */
>>>>>>>>>> static void rcu_sr_normal_gp_cleanup(void)
>>>>>>>>>> {
>>>>>>>>>> - struct llist_node *wait_tail, *next, *rcu;
>>>>>>>>>> + struct llist_node *wait_tail, *next = NULL, *rcu = NULL;
>>>>>>>>>> int done = 0;
>>>>>>>>>>
>>>>>>>>>> wait_tail = rcu_state.srs_wait_tail;
>>>>>>>>>> @@ -1699,16 +1706,35 @@ static void rcu_sr_normal_gp_cleanup(void)
>>>>>>>>>> break;
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> - // concurrent sr_normal_gp_cleanup work might observe this update.
>>>>>>>>>> - smp_store_release(&rcu_state.srs_done_tail, wait_tail);
>>>>>>>>>> + /*
>>>>>>>>>> + * Fast path, no more users to process. Remove the last wait head
>>>>>>>>>> + * if no inflight-workers. If there are in-flight workers, let them
>>>>>>>>>> + * remove the last wait head.
>>>>>>>>>> + */
>>>>>>>>>> + WARN_ON_ONCE(!rcu);
>>>>>>>>>>
>>>>>>>>> This assumption is not correct. In fact, "rcu" can be NULL.
>>>>>>>>
>>>>>>>> Hmm, I could never trigger that. Are you saying that is true after Neeraj's recent patch, or something else?
>>>>>>>> Note that after Neeraj's patch to handle the lack of available heads, it could be true, so I asked
>>>>>>>> him to rebase his patch on top of this one.
>>>>>>>>
>>>>>>>> However, I will revisit my patch and check whether it can occur, but please let me know if you know of a sequence of events that makes it NULL.
>>>>>>>>>
>>>>>>> I think we should agree on your patch first, otherwise it becomes a bit
>>>>>>> messy; or go with Neeraj's as the first step and then work on yours. So, I
>>>>>>> reviewed this patch based on Paul's latest dev branch. I see that Neeraj's
>>>>>>> needs further work.
>>>>>>
>>>>>> You are right. So the only change is to drop the warning and those braces. Agreed?
>>>>>>
>>>>> Let me check a bit. It looks correct, but just in case.
>>>>>
>>>>
>>>> Thanks. I was also considering improving it for the rcu == NULL case, as
>>>> below. I will test it more before re-sending.
>>>>
>>>> On top of my patch:
>>>>
>>>> ---8<-----------------------
>>>>
>>>> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
>>>> index 0df659a878ee..a5ef844835d4 100644
>>>> --- a/kernel/rcu/tree.c
>>>> +++ b/kernel/rcu/tree.c
>>>> @@ -1706,15 +1706,18 @@ static void rcu_sr_normal_gp_cleanup(void)
>>>> break;
>>>> }
>>>>
>>>> +
>>>> + /* Last head stays. No more processing to do. */
>>>> + if (!rcu)
>>>> + return;
>>>> +
>>>
>>> Ugh, should be "if (!wait_tail->next)" instead of "if (!rcu)". But
>>> in any case, the original patch, minus the warning, should hold.
>>> Still, I am testing the above diff now.
>>>
>>> - Joel
>>>
>> Just in case, it is based on your patch:
>>
>> <snip>
>> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
>> index bd29fe3c76bf..98546afe7c21 100644
>> --- a/kernel/rcu/tree.c
>> +++ b/kernel/rcu/tree.c
>> @@ -1711,29 +1711,25 @@ static void rcu_sr_normal_gp_cleanup(void)
>> * if no inflight-workers. If there are in-flight workers, let them
>> * remove the last wait head.
>> */
>> - WARN_ON_ONCE(!rcu);
>> - ASSERT_EXCLUSIVE_WRITER(rcu_state.srs_done_tail);
>> -
>> - if (rcu && rcu_sr_is_wait_head(rcu) && rcu->next == NULL &&
>> - /* Order atomic access with list manipulation. */
>> - !atomic_read_acquire(&rcu_state.srs_cleanups_pending)) {
>> + if (wait_tail->next && rcu_sr_is_wait_head(wait_tail->next) && !wait_tail->next->next &&
>> + !atomic_read_acquire(&rcu_state.srs_cleanups_pending)) {
>
>
> Yes, this also works. But if wait_tail->next == NULL, then you do not need
> to queue the worker for that case either. I sent this as v3.
>
Sorry, I see you did add that later in the patch ;-). I think we have converged
on the final patch then, give or take the use of 'rcu' versus 'wait_tail->next'.
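
For reference, the queueing decision discussed in this thread can be modeled
in userspace roughly as below. This is only a sketch under stated assumptions,
not the kernel code: struct sr_node, sr_need_cleanup_work() and
sr_cleanup_work_done() are hypothetical names, C11 atomics stand in for
atomic_read_acquire() and atomic_dec_return_release(), the is_wait_head flag
stands in for rcu_sr_is_wait_head(), and concurrency with the worker's list
walk is not modeled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an llist_node plus the wait-head check. */
struct sr_node {
	struct sr_node *next;
	bool is_wait_head;
};

/* Models rcu_state.srs_cleanups_pending: number of in-flight workers. */
static atomic_int cleanups_pending;

/*
 * Decide whether the caller has to queue the cleanup worker.
 * Returns false when no worker is needed; on the fast path the single
 * leftover wait head is detached inline.
 */
static bool sr_need_cleanup_work(struct sr_node *wait_tail)
{
	struct sr_node *next;

	assert(wait_tail && wait_tail->is_wait_head);
	next = wait_tail->next;

	/* Nothing behind wait_tail: the last head stays, no work to queue. */
	if (!next)
		return false;

	/* Fast path: one leftover wait head and no worker in flight. */
	if (next->is_wait_head && !next->next &&
	    atomic_load_explicit(&cleanups_pending, memory_order_acquire) == 0) {
		wait_tail->next = NULL;
		return false;
	}

	/* Slow path: account for the worker the caller is about to queue. */
	atomic_fetch_add_explicit(&cleanups_pending, 1, memory_order_relaxed);
	return true;
}

/* Called by the (modeled) worker once its list manipulations are done. */
static void sr_cleanup_work_done(void)
{
	atomic_fetch_sub_explicit(&cleanups_pending, 1, memory_order_release);
}
```

A caller would queue the worker only when sr_need_cleanup_work() returns true,
and the worker would call sr_cleanup_work_done() as its last step, including
on its early-return path.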

- Joel