Re: cgroup, RT reservation per core(s)?

From: Rolando Martins
Date: Tue Feb 10 2009 - 09:46:53 EST


On 2/10/09, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> On Mon, 2009-02-09 at 20:04 +0000, Rolando Martins wrote:
>
> > I should have elaborated this more:
> >
> >              root
> >           ----|----
> >          |         |
> > (0.5 mem) 0        1 (100% rt, 0.5 mem)
> >                ---------
> >                |   |   |
> >                2   3   4 (33% rt for each group,
> >                           33% mem per group (0.165))
> > Rol
>
>
>
> Right, I think this can be done.
>
> You would indeed need cpusets and sched-cgroups.
>
> Split the machine in 2 using cpusets.
>
>    ___R___
>   /       \
>  A         B
>
> Where R is the root cpuset, and A and B are the siblings.
> Assign A one half the cpus, and B the other half.
> Disable load-balancing on R.
>
> Then using sched cgroups create the hierarchy
>
>    ____1____
>   /    |    \
>  2     3     4
>
> Where 1 can be the root group if you like.
>
> Assign 1 a utilization limit of 100%, and 2,3 and 4 a utilization limit
> of 33% each.
>
> Then place the tasks that get 100% cputime on your 2 cpus in cpuset A
> and sched group 1.
>
> Place your other tasks in B,{2-4} respectively.
>
> The reason this works is that bandwidth distribution is sched domain
> wide, and by disabling load-balancing on R, you split the sched
> domain.
>
> I've never actually tried anything like this, let me know if it
> works ;-)
>

Thanks Peter, it works!
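For reference, roughly what this looks like on a 4 CPU box with the
cpuset and cpu controllers co-mounted at /dev/cgroup (a sketch only;
the group names and CPU numbers just match the examples further down):

echo 1000000 > /proc/sys/kernel/sched_rt_runtime_us  # default is 950000; 100% rt needs the full period

mkdir /dev/cgroup/A /dev/cgroup/B
echo 0-1 > /dev/cgroup/A/cpuset.cpus
echo 0   > /dev/cgroup/A/cpuset.mems
echo 2-3 > /dev/cgroup/B/cpuset.cpus
echo 0   > /dev/cgroup/B/cpuset.mems
echo 0   > /dev/cgroup/cpuset.sched_load_balance     # split the sched domain at the root

echo 1000000 > /dev/cgroup/A/cpu.rt_runtime_us       # A takes all of root's rt bandwidth

mkdir /dev/cgroup/A/2 /dev/cgroup/A/3 /dev/cgroup/A/4
for g in 2 3 4; do
        echo 333333 > /dev/cgroup/A/$g/cpu.rt_runtime_us  # ~1/3 of A each
done
# (each subgroup also needs cpuset.cpus/cpuset.mems set before tasks
# can be attached to it)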
I am thinking about different strategies to be used in my RT middleware
project, and I think there is a limitation.
If I wanted to have some RT bandwidth in the B cpuset, I couldn't,
because I assigned A.cpu.rt_runtime_us = root.cpu.rt_runtime_us (and
then subdivided the A cpuset into 2, 3 and 4, each one getting
A.cpu.rt_runtime_us/3).

This happens because there is a global /proc/sys/kernel/sched_rt_runtime_us and
/proc/sys/kernel/sched_rt_period_us.
What do you think about adding a separate tuple (runtime,period) for
each core/cpu?

In this case:
/proc/sys/kernel/sched_rt_runtime_us_0
/proc/sys/kernel/sched_rt_period_us_0
...
/proc/sys/kernel/sched_rt_runtime_us_n (one tuple per CPU, n = cpu count)
/proc/sys/kernel/sched_rt_period_us_n
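
To make that concrete, a sketch of how such knobs (which don't exist
today) could be used for the scenario above: raise the budget only on
the CPUs backing cpuset A and leave the others alone.

echo 1000000 > /proc/sys/kernel/sched_rt_runtime_us_0  # allow 100% rt on CPU 0
echo 1000000 > /proc/sys/kernel/sched_rt_runtime_us_1  # allow 100% rt on CPU 1
# CPUs 2 and 3 keep their own (runtime, period) pairs, so groups
# confined to cpuset B could still reserve rt bandwidth there.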


Given this, we could do the following:

mkdir /dev/cgroup/A
echo 0-1 > /dev/cgroup/A/cpuset.cpus
echo 0 > /dev/cgroup/A/cpuset.mems
echo 1000000 > /dev/cgroup/A/cpu.rt_period_us
echo 1000000 > /dev/cgroup/A/cpu.rt_runtime_us

This would only work if the requested (cpu.rt_runtime_us,
cpu.rt_period_us) could be allocated on both CPU 0 and CPU 1;
otherwise it would fail.

mkdir /dev/cgroup/B
echo 2-3 > /dev/cgroup/B/cpuset.cpus
echo 0 > /dev/cgroup/B/cpuset.mems
echo 1000000 > /dev/cgroup/B/cpu.rt_period_us
echo 800000 > /dev/cgroup/B/cpu.rt_runtime_us
The same here: it would fail if 0.8 could not be allocated on both CPU 2 and CPU 3.
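
As a rough worked example: with each CPU budgeted at 1000000/1000000,
A takes 1.0 on CPUs 0 and 1 and B takes 0.8 on CPUs 2 and 3, so both
reservations fit and 0.2 per CPU is still free on CPUs 2-3. With
today's single global pair the same two requests add up to
1.0 + 0.8 = 1.8 against a budget of at most 1.0, so the second one
would be rejected, which is exactly the limitation above.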

Does this make sense? ;)

Rol