Re: [PATCH V2] idle/intel_powerclamp: Redesign idle injection to use bandwidth control mechanism
From: Preeti U Murthy
Date: Wed Feb 11 2015 - 04:00:45 EST
On 02/09/2015 11:44 PM, Steven Noonan wrote:
> On Mon, Feb 9, 2015 at 9:56 AM, Steven Noonan <steven@xxxxxxxxxxxxxx> wrote:
>> On Mon, Feb 9, 2015 at 3:51 AM, Preeti U Murthy
>> <preeti@xxxxxxxxxxxxxxxxxx> wrote:
>>> Hi Steven,
>>> On 02/09/2015 01:02 PM, Steven Noonan wrote:
>>>> On Sun, Feb 8, 2015 at 8:49 PM, Preeti U Murthy
>>>> <preeti@xxxxxxxxxxxxxxxxxx> wrote:
>>>>> The powerclamp driver injects idle periods to stay within the thermal constraints.
>>>>> The driver does a fake idle by spawning per-cpu threads that call the mwait
>>>>> instruction. This behavior of fake idle can confuse the other kernel subsystems.
>>>>> For instance it calls into the nohz tick handlers, which are meant to be called
>>>>> only by the idle thread. It sets the state of the tick in each cpu to idle and
>>>>> stops the tick, when there are tasks on the runqueue. As a result the callers of
>>>>> idle_cpu()/ tick_nohz_tick_stopped() see different states of the cpu; while the
>>>>> former thinks that the cpu is busy, the latter thinks that it is idle. The outcome
>>>>> may be inconsistency in the scheduler/nohz states which can lead to serious
>>>>> consequences. One of them was reported on this thread:
>>>>> Thomas posted out a patch to disable the powerclamp driver from calling into the
>>>>> tick nohz code which has taken care of the above regression for the moment. However
>>>>> powerclamp driver as a result, will not be able to inject idle periods due to the
>>>>> presence of periodic ticks. With the current design of fake idle, we cannot move
>>>>> towards a better solution.
>>>>> This patch aims at removing the concept of fake idle and instead makes the cpus
>>>>> truly idle by throttling the runqueues during the idle injection periods. The situation
>>>>> is in fact very similar to throttling of cfs_rqs when they exceed their bandwidths.
>>>>> The idle injection metrics can be mapped to the bandwidth control metrics 'quota' and
>>>>> 'period' to achieve the same result. When the powerclamping is begun or when the
>>>>> clamping controls have been modified, the bandwidth for the root task group is set.
>>>>> The 'quota' will be the amount of time that the system needs to be busy and 'period'
>>>>> will be the sum of this busy duration and the idle duration as calculated by the driver.
>>>>> This gets rid of per-cpu kthreads, control cpu, hotplug notifiers and clamping mask since
>>>>> the thread starting powerclamping will set the bandwidth and throttling of all cpus will
>>>>> automatically fall in place. None of the other cpus need be bothered about this. This
>>>>> simplifies the design of the driver.
>>>>> Of course this is only if the idle injection metrics can be conveniently transformed
>>>>> into bandwidth control metrics. There are a couple of other primary concerns around if
>>>>> doing the below two in this patch is valid.
>>>>> a. This patch exports the functions to set the quota and period of task groups.
>>>>> b. This patch removes the constraint of not being able to set the root task grp's bandwidth.
>>>>> Signed-off-by: Preeti U Murthy <preeti@xxxxxxxxxxxxxxxxxx>
>>>> This doesn't compile.
>>> Thanks for reporting this! I realized that I had not compiled in the powerclamp driver
>>> as a module while compile testing it. I was focusing on the issues with the design and
>>> failed to cross verify this. Apologies for the inconvenience.
>>> Find the diff compile tested below.
>>> I also realized that clamp_cpus() that sets the bandwidth cannot be called from
>>> multiple places. Currently I am calling it from end_powerclamp(), when the user changes the
>>> idle clamping duration and from a queued timer. This will require synchronization between
>>> callers which is not really called for. The queued wakeup_timer alone can re-evaluate the
>>> clamping metrics after every throttle-unthrottle period and this should suffice as far
>>> as I can see. Thoughts ?
>> Hmm, I've had two system lockups so far while running a kernel with
>> intel_powerclamp loaded. Both times it slowly ground to a halt and
>> processes piled up...
Hmm I see. I am not surprised because this patch is not complete yet.
The idea was to gain suggestions and review around the approach first
before I went ahead to making it robust. Let me ease the creases that I
found and repost this patch for you to test. Thanks a lot for the
testing efforts! :)
Preeti U Murthy
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/