Re: RCU vs NOHZ
From: Paul E. McKenney
Date: Fri Sep 16 2022 - 03:58:36 EST

On Fri, Sep 16, 2022 at 12:30:34AM +0200, Peter Zijlstra wrote:
> On Thu, Sep 15, 2022 at 12:14:27PM -0700, Paul E. McKenney wrote:
> > On Thu, Sep 15, 2022 at 08:50:44PM +0200, Peter Zijlstra wrote:
> > > On Thu, Sep 15, 2022 at 09:06:00AM -0700, Paul E. McKenney wrote:
> > > > On Thu, Sep 15, 2022 at 10:39:12AM +0200, Peter Zijlstra wrote:
> > > > > Hi,
> > > > >
> > > > > After watching Joel's talk about RCU and idle ticks I was wondering
> > > > > about why RCU doesn't have NOHZ hooks -- that is regular NOHZ, not the
> > > > > NOHZ_FULL stuff.
> > > >
> > > > It actually does, but they have recently moved into the context-tracking
> > > > code, courtesy of Frederic's recent patch series.
> > >
> > > afair that's idle and that is not nohz.
> >
> > For nohz_full CPUs, it does both.
>
> Normal people don't have nohz_full cpus (and shouldn't want any).

To the best of my knowledge at this point in time, agreed. Who knows
what someone will come up with next week? But for people running certain
types of real-time and HPC workloads, context tracking really does handle
both idle and userspace transitions.

> > > > > These deep idle states are only feasible during NOHZ idle, and the NOHZ
> > > > > path is already relatively expensive (which is offset by then mostly
> > > > > staying idle for a long while).
> > > > >
> > > > > Specifically my thinking was that when a CPU goes NOHZ it can splice
> > > > > its callback list onto a global list (cmpxchg), and then the
> > > > > jiffy-updater CPU can look at and consume this global list (xchg).
> > > > >
> > > > > Before you say... but globals suck (they do), NOHZ already has a fair
> > > > > amount of global state, and as said before, it's offset by the CPU then
> > > > > staying idle for a fair while. If there is heavy contention on the NOHZ
> > > > > data, the idle governor is doing a bad job by selecting deep idle states
> > > > > whilst we're not actually idle for long.
> > > > >
> > > > > The above would remove the reason for RCU to inhibit NOHZ.
> > > > >
> > > > >
> > > > > Additionally; when the very last CPU goes idle (I think we know this
> > > > > somewhere, but I can't really remember where) we can insta-advance the
> > > > > QS machinery and run the callbacks before going (NOHZ) idle.
> > > > >
> > > > >
> > > > > Is there a reason this couldn't work? To me this seems like a much
> > > > > simpler solution than the whole rcu-cb thing.
> > > >
> > > > To restate Joel's reply a bit...
> > > >
> > > > Maybe.
> > > >
> > > > Except that we need rcu_nocbs anyway for low latency and HPC applications.
> > > > Given that we have it, and given that it totally eliminates RCU-induced
> > > > idle ticks, how would it help to add cmpxchg-based global offloading?
> > >
> > > Because that nocb stuff isn't default enabled?
> >
> > Last I checked, both RHEL and Fedora were built with CONFIG_RCU_NOCB_CPU=y.
> > And I checked Fedora just now.
> >
> > Or am I missing your point?
>
> I might be missing the point; but why did Joel have a talk if it's all
> default on?

It wasn't enabled for ChromeOS.

When fully enabled, it gave them the energy-efficiency advantages Joel
described. And then Joel described some additional call_rcu_lazy()
changes that provided even better energy efficiency. Though I believe
that the application should also be changed to avoid incessantly opening
and closing that file while the device is idle, as this would remove
-all- RCU work when nearly idle. But some of the other call_rcu_lazy()
use cases would likely remain.

If someone believes that their workload would benefit similarly and they
are running Fedora or RHEL (and last I knew, the SUSE distros as well),
then they can boot with rcu_nocbs=0-N and try it out. No need to further
change RCU until proven otherwise.

Thanx, Paul
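
For readers following the thread, the splice-with-cmpxchg and
consume-with-xchg idea quoted above can be sketched in userspace C11
atomics. This is only an illustration of the list handling, not kernel
code and not anything proposed as a patch in this thread. The names
(struct cb, nohz_cb_list, nohz_splice, nohz_consume, run_callbacks,
count_cb) are all hypothetical, and the sketch deliberately leaves out
the hard part: real RCU callbacks may only be invoked after a grace
period has elapsed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical callback node, loosely shaped like a callback header. */
struct cb {
	struct cb *next;
	void (*func)(struct cb *);
};

/* Hypothetical global list that a CPU entering NOHZ idle publishes to. */
static _Atomic(struct cb *) nohz_cb_list = NULL;

/*
 * Splice a private, NULL-terminated list [head..tail] onto the global
 * list. The compare-exchange loop retries if another CPU got in first.
 */
static void nohz_splice(struct cb *head, struct cb *tail)
{
	struct cb *old;

	assert(head != NULL && tail != NULL);
	old = atomic_load_explicit(&nohz_cb_list, memory_order_relaxed);
	do {
		/* Link our tail to whatever is currently published. */
		tail->next = old;
	} while (!atomic_compare_exchange_weak_explicit(&nohz_cb_list, &old, head,
							memory_order_release,
							memory_order_relaxed));
}

/* Detach the entire global list with a single exchange. */
static struct cb *nohz_consume(void)
{
	return atomic_exchange_explicit(&nohz_cb_list, NULL,
					memory_order_acquire);
}

/*
 * Invoke every callback on a detached list and return how many ran.
 * Assumption: the caller has already waited for a grace period; this
 * sketch does not model grace periods at all.
 */
static int run_callbacks(struct cb *list)
{
	int n = 0;

	while (list) {
		struct cb *next = list->next; /* read before func may reuse node */

		list->func(list);
		list = next;
		n++;
	}
	return n;
}

/* Trivial example callback that just counts invocations. */
static int invoked;

static void count_cb(struct cb *c)
{
	(void)c;
	invoked++;
}
```

In this sketch a CPU going idle would call nohz_splice() once with the
head and tail of its private list, and the jiffy-updater side would
call nohz_consume() and later hand the result to run_callbacks(). The
single exchange on the consumer side is what keeps the consumer from
ever contending with itself; all contention lands on the producers'
compare-exchange loop.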