Re: question on sched-rt group allocation cap: sched_rt_runtime_us

From: Anirban Sinha
Date: Tue Sep 08 2009 - 13:47:46 EST

On 2009-09-08, at 10:32 AM, Anirban Sinha wrote:

-----Original Message-----
From: Mike Galbraith [mailto:efault@xxxxxx]
Sent: Sat 9/5/2009 11:32 PM
To: Anirban Sinha
Cc: Lucas De Marchi; linux-kernel@xxxxxxxxxxxxxxx; Peter Zijlstra; Ingo Molnar
Subject: Re: question on sched-rt group allocation cap: sched_rt_runtime_us

On Sat, 2009-09-05 at 19:32 -0700, Ani wrote:
> On Sep 5, 3:50 pm, Lucas De Marchi <lucas.de.mar...@xxxxxxxxx> wrote:
> >
> > Indeed. I've tested this same test program in a single core machine and it
> > produces the expected behavior:
> >
> > rt_runtime_us / rt_period_us    % loops executed in SCHED_OTHER
> >                          95%    4.48%
> >                          60%    54.84%
> >                          50%    86.03%
> >                          40%    OTHER completed first
> >
>
> Hmm. This does seem to indicate that there is some kind of
> relationship with SMP. So I wonder whether there is a way to turn this
> 'RT bandwidth accumulation' heuristic off.

No there isn't, but maybe there should be, since this isn't the first
time it's come up. One pro argument is that pinned tasks are thoroughly
screwed when an RT hog lands on their runqueue. On the con side, the
whole RT bandwidth restriction thing is intended (AFAIK) to allow an
admin to regain control should an RT app go insane, which the default
5% aggregate accomplishes just fine.

Dunno. Fly or die, little patchlet (toss).

So it would be nice to have a knob like this for when CGROUPS is
disabled (its Kconfig help text does say 'say N when unsure' :)).
CPUSETS depends on CGROUPS.
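
For reference, the mechanism at play: every runqueue gets rt_runtime
worth of RT execution per rt_period, and on SMP balance_runtime() lets
a runqueue that has exhausted its share borrow unused budget from the
other CPUs' runqueues. A pinned RT hog can therefore end up holding
(nearly) the whole CPU, while on a single core there is nothing to
borrow from, hence the difference in the numbers above. The test
itself boils down to roughly the sketch below (an illustration, not
necessarily the exact program used in this thread; the CPU number,
interval, and RT priority are arbitrary choices):

/*
 * Sketch of the kind of test discussed above: pin everything to one
 * CPU, run a SCHED_OTHER counting loop for a fixed wall-clock
 * interval, first alone and then next to a SCHED_FIFO hog, and report
 * what fraction of the baseline iteration count survives.
 * Needs root for SCHED_FIFO.  Build: gcc -o rt-cap-test rt-cap-test.c
 * (add -lrt for clock_gettime on older glibc).
 */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#define SECONDS 10

static double now(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec + ts.tv_nsec / 1e9;
}

static void pin_to_cpu0(void)
{
	cpu_set_t set;

	CPU_ZERO(&set);
	CPU_SET(0, &set);
	if (sched_setaffinity(0, sizeof(set), &set))
		perror("sched_setaffinity");
}

/* Busy-loop until 'seconds' of wall time pass; count iterations. */
static unsigned long spin(double seconds)
{
	unsigned long loops = 0;
	double end = now() + seconds;

	while (now() < end)
		loops++;
	return loops;
}

static pid_t start_fifo_hog(void)
{
	pid_t pid = fork();

	if (pid == 0) {		/* child: become a SCHED_FIFO CPU hog */
		struct sched_param p = { .sched_priority = 1 };

		if (sched_setscheduler(0, SCHED_FIFO, &p))
			perror("sched_setscheduler");
		spin(SECONDS + 1);
		_exit(0);
	}
	return pid;
}

int main(void)
{
	unsigned long baseline, contended;
	pid_t hog;

	pin_to_cpu0();
	baseline = spin(SECONDS);	/* SCHED_OTHER, CPU to itself */

	hog = start_fifo_hog();		/* inherits the CPU 0 affinity */
	contended = spin(SECONDS);	/* SCHED_OTHER vs. the RT hog */
	waitpid(hog, NULL, 0);

	printf("baseline : %lu loops\n", baseline);
	printf("contended: %lu loops (%.2f%% of baseline)\n",
	       contended, 100.0 * contended / baseline);
	return 0;
}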



sched: allow the user to disable RT bandwidth aggregation.

Signed-off-by: Mike Galbraith <efault@xxxxxx>
Cc: Ingo Molnar <mingo@xxxxxxx>
Cc: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>
Verified-by: Anirban Sinha <asinha@xxxxxxxxxxxxxxxxx>
LKML-Reference: <new-submission>

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 8736ba1..6e6d4c7 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1881,6 +1881,7 @@ static inline unsigned int get_sysctl_timer_migration(void)
 #endif
 extern unsigned int sysctl_sched_rt_period;
 extern int sysctl_sched_rt_runtime;
+extern int sysctl_sched_rt_bandwidth_aggregate;
 
 int sched_rt_handler(struct ctl_table *table, int write,
 		struct file *filp, void __user *buffer, size_t *lenp,
diff --git a/kernel/sched.c b/kernel/sched.c
index c512a02..ca6a378 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -864,6 +864,12 @@ static __read_mostly int scheduler_running;
  */
 int sysctl_sched_rt_runtime = 950000;
 
+/*
+ * aggregate bandwidth, ie allow borrowing from neighbors when
+ * bandwidth for an individual runqueue is exhausted.
+ */
+int sysctl_sched_rt_bandwidth_aggregate = 1;
+
 static inline u64 global_rt_period(void)
 {
 	return (u64)sysctl_sched_rt_period * NSEC_PER_USEC;
diff --git a/kernel/sched_rt.c b/kernel/sched_rt.c
index 2eb4bd6..75daf88 100644
--- a/kernel/sched_rt.c
+++ b/kernel/sched_rt.c
@@ -495,6 +495,9 @@ static int balance_runtime(struct rt_rq *rt_rq)
 {
 	int more = 0;
 
+	if (!sysctl_sched_rt_bandwidth_aggregate)
+		return 0;
+
 	if (rt_rq->rt_time > rt_rq->rt_runtime) {
 		spin_unlock(&rt_rq->rt_runtime_lock);
 		more = do_balance_runtime(rt_rq);
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index cdbe8d0..0ad08e5 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -368,6 +368,14 @@ static struct ctl_table kern_table[] = {
 	},
 	{
 		.ctl_name	= CTL_UNNUMBERED,
+		.procname	= "sched_rt_bandwidth_aggregate",
+		.data		= &sysctl_sched_rt_bandwidth_aggregate,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= &sched_rt_handler,
+	},
+	{
+		.ctl_name	= CTL_UNNUMBERED,
 		.procname	= "sched_compat_yield",
 		.data		= &sysctl_sched_compat_yield,
 		.maxlen		= sizeof(unsigned int),
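
With the patch applied, aggregation can be flipped at run time by
writing 0 or 1 to /proc/sys/kernel/sched_rt_bandwidth_aggregate (the
entry goes through sched_rt_handler like the existing sched_rt_*
knobs). With it set to 0, balance_runtime() bails out early, each
runqueue is limited to its own rt_runtime share, and pinned
SCHED_OTHER tasks keep their guaranteed slice even with an RT hog on
their CPU.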




