Re: [RFC v2 0/2] swait: add idle to make idle-hacks on kthreads explicit

From: Paul E. McKenney
Date: Mon Jun 19 2017 - 13:55:10 EST


On Fri, Jun 16, 2017 at 03:37:54PM -0500, Eric W. Biederman wrote:
> "Paul E. McKenney" <paulmck@xxxxxxxxxxxxxxxxxx> writes:
>
> > On Fri, Jun 16, 2017 at 01:26:19AM +0200, Luis R. Rodriguez wrote:
> >> On Thu, Jun 15, 2017 at 02:57:17PM -0700, Paul E. McKenney wrote:
> >> > On Thu, Jun 15, 2017 at 11:48:18AM -0700, Luis R. Rodriguez wrote:
> >> > > While reviewing RCU's interruptible swaits I noticed signals were actually
> >> > > not expected. Paul explained that signals are not expected because we use
> >> > > kthreads, which don't get signals. Furthermore, the code avoided the
> >> > > uninterruptible swaits because otherwise they would contribute to the system
> >> > > load average on idle, bumping it from 0 to 2 or 3 (depending on preemption).
> >> > >
> >> > > Since this can be confusing, it's best to be explicit about the requirements
> >> > > and goals. For context, this patch depends on the killable swaits [0] that
> >> > > were also recently proposed. The patch can, however, be tested independently
> >> > > if the hunk is addressed separately.
> >> > >
> >> > > [0] https://lkml.kernel.org/r/20170614222017.14653-3-mcgrof@xxxxxxxxxx
> >> >
> >> > Tested-by: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
> >> >
> >> > Are you looking to push these or were you wanting me to?
> >>
> >> I'd be happy for you to take them.
> >
> > OK, let's see if we can get some Acked-by's or Reviewed-by's from the
> > relevant people.
> >
> > For but one example, Eric, does this look good to you or are adjustments
> > needed?
>
> Other than an unnecessary return code I don't see any issues.
>
> Acked-by: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx>
>
> In truth I am just barely ahead of you folks. I ran into the same issue
> the other day with a piece of my code and someone pointed me to TASK_IDLE.

;-) ;-) ;-)

And here is an updated version of the second patch. Thoughts?

(The bogus comment was not Luis's fault, but I figured I should fix
it while thinking about it.)

Thanx, Paul

------------------------------------------------------------------------

commit 5877121be4ba90a32298d7a00a678cae5cbb6a82
Author: Luis R. Rodriguez <mcgrof@xxxxxxxxxx>
Date: Thu Jun 15 11:48:20 2017 -0700

rcu: use idle versions of swait to make idle-hack clear

These RCU waits were set to use interruptible waits to avoid the kthreads
contributing to the system load average, even though they cannot actually be
interrupted: they run in kthreads, which do not receive signals. Use the new
TASK_IDLE swaits, which make our goal clear and remove any confusion about
these paths possibly being interruptible (they are not).
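For readers new to TASK_IDLE: the kernel defines it as TASK_UNINTERRUPTIBLE | TASK_NOLOAD, and load-average accounting skips sleeping tasks whose state carries TASK_NOLOAD. The userspace sketch below models that rule. The flag values mirror 4.x-era include/linux/sched.h but are illustrative only, and all model_* names are hypothetical. The real check, task_contributes_to_load(), also excludes frozen tasks, which this model ignores.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative state bits, mirroring 4.x-era include/linux/sched.h. */
#define MODEL_TASK_RUNNING         0x0000
#define MODEL_TASK_INTERRUPTIBLE   0x0001
#define MODEL_TASK_UNINTERRUPTIBLE 0x0002
#define MODEL_TASK_NOLOAD          0x0400
#define MODEL_TASK_IDLE (MODEL_TASK_UNINTERRUPTIBLE | MODEL_TASK_NOLOAD)

/* Hypothetical model of task_contributes_to_load(): a sleeping task counts
 * toward the load average only if it is uninterruptible and not NOLOAD.
 * (The real check also excludes frozen tasks; omitted here.) */
static int model_contributes_to_load(unsigned int state)
{
	return (state & MODEL_TASK_UNINTERRUPTIBLE) != 0 &&
	       (state & MODEL_TASK_NOLOAD) == 0;
}

/* Count the tasks feeding the load average: runnable tasks plus the
 * sleepers accepted by model_contributes_to_load(). */
static size_t model_nr_active(const unsigned int *states, size_t n)
{
	size_t active = 0;

	assert(states != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (states[i] == MODEL_TASK_RUNNING ||
		    model_contributes_to_load(states[i]))
			active++;
	}
	return active;
}
```

In this model, two grace-period kthreads sleeping in TASK_UNINTERRUPTIBLE on an otherwise idle system give model_nr_active() == 2, while the same kthreads sleeping in TASK_IDLE (or TASK_INTERRUPTIBLE) give 0.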

When the system is idle, the RCU grace-period kthreads will spend all their
time blocked inside swait_event_interruptible(). If an uninterruptible wait
were used instead, these kthreads would contribute to the load average.
Because each RCU flavor has its own grace-period kthread, an idle system would
then have a load average of 2 (or 3 if CONFIG_PREEMPT=y), rather than the load
average of 0 that almost fifty years of UNIX has conditioned sysadmins to expect.
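The load average is an exponentially damped average of the number of runnable plus uninterruptibly sleeping tasks, sampled about every five seconds. The sketch below runs in userspace and assumes the fixed-point constants of the 1-minute average in 4.x-era kernels (FSHIFT = 11, EXP_1 = 1884); model_calc_load() is a hypothetical function patterned on the kernel's calc_load(). It illustrates why two kthreads that are always counted pin the 1-minute average at 2.00, while an active count of 0 keeps it at 0.00.

```c
#include <assert.h>

#define MODEL_FSHIFT  11                    /* bits of fractional precision */
#define MODEL_FIXED_1 (1UL << MODEL_FSHIFT) /* 1.0 in fixed point */
#define MODEL_EXP_1   1884UL                /* 1/exp(5s/1min) in fixed point */

/* One 5-second update of a fixed-point load average, patterned on the
 * kernel's calc_load(): load = load*exp + active*(1 - exp), rounding up
 * while the load is rising toward the active count. */
static unsigned long model_calc_load(unsigned long load, unsigned long exp,
				     unsigned long active)
{
	unsigned long newload;

	assert(exp < MODEL_FIXED_1);
	newload = load * exp + active * (MODEL_FIXED_1 - exp);
	if (active >= load)
		newload += MODEL_FIXED_1 - 1;
	return newload / MODEL_FIXED_1;
}

/* Run the 1-minute average for `samples` updates with a constant number
 * of active tasks, starting from a load of zero. */
static unsigned long model_load_after(unsigned long nr_active, int samples)
{
	unsigned long load = 0;
	unsigned long active = nr_active * MODEL_FIXED_1;

	for (int i = 0; i < samples; i++)
		load = model_calc_load(load, MODEL_EXP_1, active);
	return load;
}
```

With nr_active = 2 the result settles at 2 * MODEL_FIXED_1, which a tool such as uptime would display as 2.00; with nr_active = 0 it never leaves zero.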

The same argument applies to the use of swait_event_interruptible_timeout().
The RCU grace-period kthread spends its time blocked inside this call while
waiting for grace periods to complete. In particular, if there were only one
busy CPU, but that CPU was frequently invoking call_rcu(), then the RCU
grace-period kthread would spend almost all its time blocked inside
swait_event_interruptible_timeout(). With an uninterruptible wait here, the
load average would be 2 rather than the expected 1 for the single busy CPU.

Signed-off-by: Luis R. Rodriguez <mcgrof@xxxxxxxxxx>
Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
[ paulmck: Fix indentation and obsolete comment. ]
Acked-by: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx>
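The final hunk of the diff below stores the result of swait_event_idle_timeout() in ret. As a hedged sketch, assuming the new idle timeout variant follows the documented wait_event_timeout() return convention, the value can be modeled as below; model_timeout_ret() is a hypothetical name, and remaining stands for the jiffies left when the wait stopped. Unlike swait_event_interruptible_timeout(), which can also return -ERESTARTSYS when a signal arrives, an idle wait has no signal case, so it stops only when the condition is true or the timeout has elapsed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the value returned by a *_timeout() wait that
 * follows the wait_event_timeout() convention:
 *   0          condition still false when the timeout elapsed
 *   1          condition true, but only once the timeout had elapsed
 *   remaining  condition true with `remaining` (>= 1) jiffies left */
static long model_timeout_ret(bool condition, long remaining)
{
	assert(remaining >= 0);
	/* A non-interruptible wait only stops early if the condition is true. */
	assert(condition || remaining == 0);
	if (!condition)
		return 0;
	return remaining > 0 ? remaining : 1;
}
```

A caller can therefore treat ret == 0 as "timed out with the condition still false" and any positive value as "condition satisfied".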

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 695fee7cafe0..94ec7455fc46 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2052,8 +2052,8 @@ static bool rcu_gp_init(struct rcu_state *rsp)
}

/*
- * Helper function for wait_event_interruptible_timeout() wakeup
- * at force-quiescent-state time.
+ * Helper function for swait_event_idle() wakeup at force-quiescent-state
+ * time.
*/
static bool rcu_gp_fqs_check_wake(struct rcu_state *rsp, int *gfp)
{
@@ -2191,9 +2191,8 @@ static int __noreturn rcu_gp_kthread(void *arg)
READ_ONCE(rsp->gpnum),
TPS("reqwait"));
rsp->gp_state = RCU_GP_WAIT_GPS;
- swait_event_interruptible(rsp->gp_wq,
- READ_ONCE(rsp->gp_flags) &
- RCU_GP_FLAG_INIT);
+ swait_event_idle(rsp->gp_wq, READ_ONCE(rsp->gp_flags) &
+ RCU_GP_FLAG_INIT);
rsp->gp_state = RCU_GP_DONE_GPS;
/* Locking provides needed memory barrier. */
if (rcu_gp_init(rsp))
@@ -2224,7 +2223,7 @@ static int __noreturn rcu_gp_kthread(void *arg)
READ_ONCE(rsp->gpnum),
TPS("fqswait"));
rsp->gp_state = RCU_GP_WAIT_FQS;
- ret = swait_event_interruptible_timeout(rsp->gp_wq,
+ ret = swait_event_idle_timeout(rsp->gp_wq,
rcu_gp_fqs_check_wake(rsp, &gf), j);
rsp->gp_state = RCU_GP_DOING_FQS;
/* Locking provides needed memory barriers. */