Re: RCU vs NOHZ

From: Joel Fernandes
Date: Sat Sep 17 2022 - 09:53:05 EST

On 9/17/2022 9:35 AM, Peter Zijlstra wrote:
> On Fri, Sep 16, 2022 at 02:11:10PM -0400, Joel Fernandes wrote:
>> Hi Peter,
>>
>> On Fri, Sep 16, 2022 at 5:20 AM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>> [...]
>>>> It wasn't enabled for ChromeOS.
>>>>
>>>> When fully enabled, it gave them the energy-efficiency advantages Joel
>>>> described. And then Joel described some additional call_rcu_lazy()
>>>> changes that provided even better energy efficiency. Though I believe
>>>> that the application should also be changed to avoid incessantly opening
>>>> and closing that file while the device is idle, as this would remove
>>>> -all- RCU work when nearly idle. But some of the other call_rcu_lazy()
>>>> use cases would likely remain.
>>>
>>> So I'm thinking the scheme I outlined gets you most if not all of what
>>> lazy would get you without having to add the lazy thing. A CPU is never
>>> refused deep idle when it passes off the callbacks.
>>>
>>> The NOHZ thing is a nice hook for 'this-cpu-wants-to-go-idle-long-term'
>>> and do our utmost bestest to move work away from it. You *want* to break
>>> affinity at this point.
>>>
>>> If you hate on the global, push it to a per rcu_node offload list until
>>> the whole node is idle and then push it up the next rcu_node level until
>>> you reach the top.
>>>
>>> Then when the top rcu_node is full idle; you can insta progress the QS
>>> state and run the callbacks and go idle.
>>
>> In my opinion the speed brakes have to be applied before the GP and
>> other threads are even awakened. The issue Android and ChromeOS
>> observe is that even a single CB queued every few jiffies can cause
>> work that can be otherwise delayed / batched, to be scheduled in. I am
>> not sure if your suggestions above address that. Does it?
>
> Scheduled how? Is this callbacks doing queue_work() or something?

Way before the callback is even ready to execute, you can have the rcuog,
rcuop, and rcu_preempt threads running to step through the grace period state
machine.

> Anyway; the thinking is that by passing off the callbacks on NOHZ, the
> idle CPUs stay idle. By running the callbacks before going full idle,
> all work is done and you can stay idle longer.

But all CPUs being idle does not mean the grace period is over: a task can
block (at least on PREEMPT_RT) in the middle of an RCU read-side critical
section, and then all CPUs go idle.

Other than that, a typical flow could look like:

1. CPU queues a callback.
2. CPU then goes idle.
3. Another CPU runs the RCU threads, waking up otherwise idle CPUs.
4. Grace period completes and an RCU thread runs a callback.
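
To make the cost of that flow concrete, here is a toy model in Python (not
kernel code, and not how RCU is implemented). It only counts how often the
grace-period machinery would have to be kicked if every newly queued callback
started a batch that is flushed batch_delay jiffies later. The function name,
the arrival pattern and the delay values are all made up for illustration.

```python
def count_gp_wakeups(arrivals, batch_delay):
    """Count batch starts for callbacks queued at the given jiffies.

    Toy assumption: a callback queued while no batch is pending starts a
    new batch (one wakeup of the GP machinery); callbacks queued up to and
    including the batch deadline ride along for free.
    """
    wakeups = 0
    deadline = None  # jiffy at which the pending batch gets flushed
    for t in sorted(arrivals):
        if deadline is None or t > deadline:
            wakeups += 1
            deadline = t + batch_delay
    return wakeups


# Hypothetical workload: one callback every 4 jiffies for 1000 jiffies.
arrivals = list(range(0, 1000, 4))
print("no batching:", count_gp_wakeups(arrivals, 0))
print("64-jiffy batching:", count_gp_wakeups(arrivals, 64))
```

Even in this crude model a trickle of callbacks turns into a steady stream
of wakeups unless something delays and batches them.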

>> Try this experiment on your ADL system (for fun). Boot to the login
>> screen on any distro,
>
> All my dev boxes are headless :-) I don't think the ADL even has X or
> wayland installed.

Ah, ok. Maybe something you have running (daemons, say) is already queuing RCU
callbacks. The Android folks had a logger doing that all the time.

>> and before logging in, run turbostat over ssh
>> and observe PC8 percent residencies. Now increase
>> jiffies_till_first_fqs boot parameter value to 64 or so and try again.
>> You may be surprised how much PC8 percent increases by delaying RCU
>> and batching callbacks (via the jiffies boot option). Admittedly this is
>> more amplified on ADL because of package-C-states, firmware and what
>> not, and isn’t as much a problem on Android; but still gives a nice
>> power improvement there.
>
> I can try; but as of now turbostat doesn't seem to work on that thing at
> all. I think localyesconfig might've stripped a required bit. I'll poke
> at it later.

Cool! I believe Len Brown can help with that, or maybe there is another way
you can read the counters to figure out the PC8% and RAPL power.
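
For the RAPL part, one possible way (a sketch, untested on ADL) is the
powercap sysfs interface. The snippet below assumes the package-0 domain
shows up as /sys/class/powercap/intel-rapl:0 and that energy_uj is readable,
which may need root. The helper names are made up. It does not cover PC8
residency.

```python
import time
from pathlib import Path

# Assumption: package-0 RAPL domain; the path may differ on your system.
RAPL_DIR = Path("/sys/class/powercap/intel-rapl:0")


def read_uj(name):
    """Read one integer counter (in microjoules) from the RAPL domain."""
    return int((RAPL_DIR / name).read_text())


def average_watts(e_start_uj, e_end_uj, seconds, max_range_uj):
    """Average power between two energy_uj samples taken `seconds` apart."""
    delta = e_end_uj - e_start_uj
    if delta < 0:
        # The counter wrapped once; compensate (approximately) using
        # max_energy_range_uj.
        delta += max_range_uj
    return delta / 1e6 / seconds


def sample_package_power(interval=5.0):
    """Sample energy_uj twice and return the average package power in W."""
    max_range = read_uj("max_energy_range_uj")
    start = read_uj("energy_uj")
    time.sleep(interval)
    end = read_uj("energy_uj")
    return average_watts(start, end, interval, max_range)
```

Calling sample_package_power() once with the default jiffies_till_first_fqs
and once with the larger value should give a rough before/after comparison,
provided the system is otherwise idle during both samples.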

thanks,

- Joel