Re: [PATCH] rcu/nocb: Add an option to ON/OFF an offloading from RT context

From: Joel Fernandes
Date: Sun May 08 2022 - 02:36:10 EST


On Sat, May 7, 2022 at 6:32 PM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
>
> On Sat, May 07, 2022 at 11:11:58AM +0200, Uladzislau Rezki wrote:
> > > On Fri, May 06, 2022 at 06:22:26PM +0200, Uladzislau Rezki wrote:
> > > > > On Thu, May 05, 2022 at 12:16:41PM +0200, Uladzislau Rezki (Sony) wrote:
> > > > > > Introduce a RCU_NOCB_CPU_CB_BOOST kernel option. So a user can
> > > > > > decide if an offloading has to be done in a high-prio context or
> > > > > > not. Please note an option depends on RCU_NOCB_CPU and RCU_BOOST
> > > > > > parameters and by default it is off.
> > > > > >
> > > > > > This patch splits the boosting preempted RCU readers and those
> > > > > > kthreads which directly responsible for driving expedited grace
> > > > > > periods forward with enabling/disabling the offloading from/to
> > > > > > SCHED_FIFO/SCHED_OTHER contexts.
> > > > > >
> > > > > > The main reason of such split is, for example on Android there
> > > > > > are some workloads which require fast expedited grace period to
> > > > > > be done whereas offloading in RT context can lead to starvation
> > > > > > and hogging a CPU for a long time what is not acceptable for
> > > > > > latency sensitive environment. For instance:
> > > > > >
> > > > > > <snip>
> > > > > > <...>-60 [006] d..1 2979.028717: rcu_batch_start: rcu_preempt CBs=34619 bl=270
> > > > > > <snip>
> > > > > >
> > > > > > invoking 34 619 callbacks will take time thus making other CFS
> > > > > > tasks waiting in run-queue to be starved due to such behaviour.
> > > > > >
> > > > > > Signed-off-by: Uladzislau Rezki (Sony) <urezki@xxxxxxxxx>
> > > > >
> > > > > All good points!
> > > > >
> > > > > Some questions and comments below.
> > > > >
> > > > > Adding Sebastian on CC for his perspective.
> > > > >
> > > > > Thanx, Paul
> > > > >
> > > > > > ---
> > > > > > kernel/rcu/Kconfig | 14 ++++++++++++++
> > > > > > kernel/rcu/tree.c | 5 ++++-
> > > > > > kernel/rcu/tree_nocb.h | 3 ++-
> > > > > > 3 files changed, 20 insertions(+), 2 deletions(-)
> > > > > >
> > > > > > diff --git a/kernel/rcu/Kconfig b/kernel/rcu/Kconfig
> > > > > > index 27aab870ae4c..074630b94902 100644
> > > > > > --- a/kernel/rcu/Kconfig
> > > > > > +++ b/kernel/rcu/Kconfig
> > > > > > @@ -275,6 +275,20 @@ config RCU_NOCB_CPU_DEFAULT_ALL
> > > > > > Say Y here if you want offload all CPUs by default on boot.
> > > > > > Say N here if you are unsure.
> > > > > >
> > > > > > +config RCU_NOCB_CPU_CB_BOOST
> > > > > > + bool "Perform offloading from real-time kthread"
> > > > > > + depends on RCU_NOCB_CPU && RCU_BOOST
> > > > > > + default n
> > > > >
> > > > > I understand that you need this to default to "n" on your systems.
> > > > > However, other groups already using callback offloading should not see
> > > > > a sudden change. I don't see an Android-specific defconfig file, but
> > > > > perhaps something in drivers/android/Kconfig?
> > > > >
> > We saw a sudden change when the priority was lifted up for rcuop kthreads.
> > I would like to know the reason. As for Android, i would like to avoid
> > it to be Android specific. It is better just to enable boosting by
> > default for nocb kthreads.
>
> No, because that breaks an existing use case, which uses RCU_BOOST
> to avoid OOM on busy systems.
>
> > > > > One easy way to make this work would be to invert the sense of this
> > > > > Kconfig option ("RCU_NOCB_CB_NO_BOOST"?), continue having it default to
> > > > > "n", but then select it somewhere in drivers/android/Kconfig. But I
> > > > > would not be surprised if there is a better way.
> >
> > In that situation probably we should just enable it by default.
>
> You are within your rights to cause it to be enabled by default -within-
> -Android-. You are -not- within your rights to break other workloads.
>
> If ChromeOS needs it too, they too can enable it -within- -ChromeOS-.
>
> It is not -that- hard, guys! ;-)

I think on the topic of RT, +Steven Rostedt should chime in as well
considering he wrote a good chunk of the RT scheduler ;-). Personally,
I feel the issue of "rcu callback offload" threads running as RT or
not should not be a matter of CONFIG option or the system in concern.
Instead it should be a function of how many callbacks there are to
run. The reason I say this is, RT threads should not be doing a lot
of work anyway, lest they cause RT throttling and starvation of other
threads.

Also, I think it is wrong to assume that a certain kind of system will
always have a certain number of callbacks to process at a time. That
seems prone to poor design due to assumptions which may not always be
true.

Can we not have 2 sets of RCU offload threads, one which operate at RT
and only process few callbacks at a time, while another which is the
lower priority CFS offload thread - executes whenever there is a lot
of CBs pending? Just a thought.

Otherwise, I feel like we might be again proliferating CONFIG options
and increasing burden on the user to get it the CONFIG right.

thanks,

- Joel