[PATCH v3 0/7] CFS Bandwidth Control

From: Bharata B Rao
Date: Tue Oct 12 2010 - 03:49:31 EST


Hi,

It's been a while since we posted the CFS hard limits (aka CFS bandwidth control
now) patches, hence a quick recap first:

- I have been working on CFS hard limits since last year and have posted
a few versions of the same (last post: http://lkml.org/lkml/2010/1/5/44)
- Paul Turner and Nikhil Rao meanwhile started working on CFS bandwidth
control and have posted a couple of versions.
(last post v2: http://lwn.net/Articles/385055/)

Paul's approach mainly changed the way the CPU hard limit was represented. After
his post, I decided to work with them and discontinue my patch series, since
their global per-group bandwidth specification appears more flexible than
the RT-type per-CPU bandwidth specification I had in my series.

Since Paul seems to be busy, I am taking the liberty of posting the next
version of his patches with a few enhancements to the slack time handling.
(more on this later)

Main changes in this post:

- Return each CPU's unused local quota to the global runtime pool.
- A few fixes:
- Explicitly wake up the idle cpu during unthrottle.
- Optimally handle throttling of current task within enqueue_task.
- Fix compilation break with CONFIG_PREEMPT on.
- Fix a compilation break at intermediate patch level.
- Applies on 2.6.36-rc7.

More about slack time issue
---------------------------
Bandwidth available to a group is specified by two parameters: cpu.cfs_quota_us
and cpu.cfs_period_us. cpu.cfs_quota_us is the max CPU time a group can
consume within the time period of cpu.cfs_period_us. The quota available
to a group within a period is maintained in a per-group global pool. On each
CPU, consumption happens by obtaining a slice of this global pool.
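As a rough sketch of this accounting (illustrative only, not the kernel code;
the class name, field names, and the SLICE_US value are all made up here):

```python
SLICE_US = 10_000  # assumed per-CPU slice size; not the kernel's actual value

class GlobalPool:
    """Per-group global runtime pool, refilled each period."""

    def __init__(self, quota_us, period_us):
        self.quota_us = quota_us      # cpu.cfs_quota_us
        self.period_us = period_us    # cpu.cfs_period_us
        self.runtime_us = quota_us    # quota remaining in the current period

    def acquire_slice(self):
        """A CPU draws a slice of local quota from the global pool."""
        grant = min(SLICE_US, self.runtime_us)
        self.runtime_us -= grant
        return grant

    def refill(self):
        """At each period boundary the pool is topped back up to the quota."""
        self.runtime_us = self.quota_us
```

CPUs keep calling acquire_slice() as their tasks run; once the pool hits zero,
the group is throttled until the next refill.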

If the local quota (obtained as slices of the global pool) isn't fully consumed
within a given period, a group can potentially get more CPU time than
it is allowed in the next interval. This is due to the slack time that may
be left over from the previous interval. More details about how this is fixed
are given in the description of patch 7/7. Here I will only show the
benefit of handling the slack time correctly through this experiment:
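The arithmetic of the problem can be sketched as follows (the function and the
numbers in the test are made up for illustration; the real fix in patch 7/7
works at the local-quota level, not as a single formula):

```python
def usable_next_period(quota_us, leftover_local_us, slack_returned):
    """Total CPU time a group can consume in the next period.

    If leftover local quota is returned to the global pool at period end,
    the refill simply restores the pool to quota_us and the cap holds.
    If not, the stale local quota sitting on CPUs is usable on top of the
    freshly refilled global pool, so the group exceeds its cap.
    """
    if slack_returned:
        return quota_us
    return quota_us + leftover_local_us
```

For example, with a 500ms quota and 40ms of unreturned local quota spread
across CPUs, the group could consume 540ms in the next period.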

On a 16 CPU system, two different kinds of tasks were run as part of a group
which had quota/period set to 500000/500000 [=> 500ms/500ms], meaning the
group was capped at 1 CPU's worth of time every period.

Type A task: Complete CPU hog.
Type B task: Sleeps for 500ms, then runs as a CPU hog for the next 500ms, and
this cycle repeats.

1 task of type A and 15 tasks of type B were run for 20s, each bound to a
different CPU. At the end of 20s, the CPU time obtained by each of them
looked like this:

-----------------------------------------------------------------------
                      Without returning       Returning slack time
                      slack time to global    to global pool
                      pool                    (with patch 7/7)
-----------------------------------------------------------------------
1 type A task         7.96s                   10.71s
15 type B tasks       12.36s                  9.79s
-----------------------------------------------------------------------

This shows the effects of slack time and the benefit of handling it correctly.

I request comments from the scheduler maintainers and others on these patches.

Regards,
Bharata.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/