Re: [PATCH RFC] select_idle_sibling experiments

From: Mike Galbraith
Date: Mon May 02 2016 - 01:35:37 EST


On Thu, 2016-04-28 at 14:00 +0200, Peter Zijlstra wrote:
> On Wed, Apr 06, 2016 at 09:27:24AM +0200, Mike Galbraith wrote:
> > sched: ratelimit nohz
> >
> > Entering nohz code on every micro-idle is too expensive to bear.
> >
> > Signed-off-by: Mike Galbraith <efault@xxxxxx>
>
> > +int sched_needs_cpu(int cpu)
> > +{
> > +	if (tick_nohz_full_cpu(cpu))
> > +		return 0;
> > +
> > +	return cpu_rq(cpu)->avg_idle < sysctl_sched_migration_cost;
>
> So the only problem I have with this patch is the choice of limit. This
> isn't at all tied to the migration cost.
>
> And some people are already twiddling with the migration_cost knob to
> affect the idle_balance() behaviour -- making it much more aggressive by
> dialing it down. When you do that you also lose the effectiveness of
> this proposed usage, even though those same people would probably want
> this.
>
> Failing a spot of inspiration for a runtime limit on this; we might have
> to introduce yet another knob :/
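
To make the coupling problem above concrete, here is a tiny userspace
sketch. It is not kernel code: keep_tick() and the sample numbers are
made up for illustration, and it only assumes the check is a plain
"average idle period below the limit" comparison, as in the quoted hunk.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the check in the quoted hunk: keep the tick
 * running while the CPU's average idle period is shorter than the limit.
 */
static bool keep_tick(uint64_t avg_idle_ns, uint64_t limit_ns)
{
	return avg_idle_ns < limit_ns;
}
```

With the limit at the default migration cost of 500000 ns, a CPU
averaging 200000 ns of idle keeps its tick (keep_tick(200000, 500000)
is true). Dial the same knob down to 50000 ns to make idle_balance()
more aggressive and the same CPU starts stopping the tick again
(keep_tick(200000, 50000) is false), which is the side effect a
separate knob would avoid.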

sched: ratelimit nohz tick shutdown/restart

Tick shutdown/restart overhead can be substantial when CPUs
enter/exit the idle loop at high frequency. Ratelimit based
upon rq->avg_idle, and provide an adjustment knob.

Signed-off-by: Mike Galbraith <mgalbraith@xxxxxxx>
---
 include/linux/sched.h        |    5 +++++
 include/linux/sched/sysctl.h |    4 ++++
 kernel/sched/core.c          |   10 ++++++++++
 kernel/sysctl.c              |    9 +++++++++
 kernel/time/tick-sched.c     |    2 +-
 5 files changed, 29 insertions(+), 1 deletion(-)

--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2286,6 +2286,11 @@ static inline int set_cpus_allowed_ptr(s
#ifdef CONFIG_NO_HZ_COMMON
void calc_load_enter_idle(void);
void calc_load_exit_idle(void);
+#ifdef CONFIG_SMP
+extern int sched_needs_cpu(int cpu);
+#else
+static inline int sched_needs_cpu(int cpu) { return 0; }
+#endif
#else
static inline void calc_load_enter_idle(void) { }
static inline void calc_load_exit_idle(void) { }
--- a/include/linux/sched/sysctl.h
+++ b/include/linux/sched/sysctl.h
@@ -19,6 +19,10 @@ extern unsigned int sysctl_sched_min_gra
extern unsigned int sysctl_sched_wakeup_granularity;
extern unsigned int sysctl_sched_child_runs_first;

+#if defined(CONFIG_NO_HZ_COMMON) && defined(CONFIG_SMP)
+extern unsigned int sysctl_sched_nohz_throttle;
+#endif
+
enum sched_tunable_scaling {
SCHED_TUNABLESCALING_NONE,
SCHED_TUNABLESCALING_LOG,
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -577,6 +577,16 @@ static inline bool got_nohz_idle_kick(vo
return false;
}

+unsigned int sysctl_sched_nohz_throttle = 500000UL;
+
+int sched_needs_cpu(int cpu)
+{
+	if (tick_nohz_full_cpu(cpu))
+		return 0;
+
+	return cpu_rq(cpu)->avg_idle < sysctl_sched_nohz_throttle;
+}
+
#else /* CONFIG_NO_HZ_COMMON */

static inline bool got_nohz_idle_kick(void)
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -351,6 +351,15 @@ static struct ctl_table kern_table[] = {
.mode = 0644,
.proc_handler = proc_dointvec,
},
+#if defined(CONFIG_NO_HZ_COMMON) && defined(CONFIG_SMP)
+	{
+		.procname	= "sched_nohz_throttle_ns",
+		.data		= &sysctl_sched_nohz_throttle,
+		.maxlen		= sizeof(unsigned int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec,
+	},
+#endif
#ifdef CONFIG_SCHEDSTATS
{
.procname = "sched_schedstats",
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -676,7 +676,7 @@ static ktime_t tick_nohz_stop_sched_tick
 	} while (read_seqretry(&jiffies_lock, seq));
 	ts->last_jiffies = basejiff;
 
-	if (rcu_needs_cpu(basemono, &next_rcu) ||
+	if (sched_needs_cpu(cpu) || rcu_needs_cpu(basemono, &next_rcu) ||
 	    arch_needs_cpu() || irq_work_needs_cpu()) {
 		next_tick = basemono + TICK_NSEC;
 	} else {
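
For anyone who wants to poke at the behaviour without building a kernel,
below is a rough userspace model of the gate. It is only a sketch under
stated assumptions: struct model_rq and the model_*() helpers are
hypothetical names, the 1/8-weight running average is meant to resemble
how rq->avg_idle is maintained (the real code also caps the value, which
is ignored here), and the nohz_full check is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Default taken from the patch above: 500000 ns. */
#define NOHZ_THROTTLE_NS 500000ULL

/* Hypothetical stand-in for the per-runqueue state the patch reads. */
struct model_rq {
	uint64_t avg_idle_ns;
};

/* Fold one idle-period sample into the average with 1/8 weight. */
static void model_update_avg_idle(struct model_rq *rq, uint64_t sample_ns)
{
	int64_t diff = (int64_t)sample_ns - (int64_t)rq->avg_idle_ns;

	rq->avg_idle_ns = (uint64_t)((int64_t)rq->avg_idle_ns + diff / 8);
}

/*
 * Model of sched_needs_cpu(): true means "leave the tick alone", i.e.
 * the CPU has recently been idling in bursts shorter than the throttle.
 */
static bool model_sched_needs_cpu(const struct model_rq *rq)
{
	return rq->avg_idle_ns < NOHZ_THROTTLE_NS;
}
```

Starting from a 1 ms average, a single 20 us micro-idle is not enough to
trip the gate, but a run of them drags the average below the throttle
and the tick stays on. A run of multi-millisecond idles pushes the
average back up and tick shutdown resumes.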