Re: [RFCv6 PATCH 03/10] sched: scheduler-driven cpu frequency selection

From: Steve Muckle
Date: Mon Dec 14 2015 - 21:02:34 EST


Hi Juri,

Thanks for the review.

On 12/11/2015 03:04 AM, Juri Lelli wrote:
>> +config CPU_FREQ_GOV_SCHED
>> + bool "'sched' cpufreq governor"
>> + depends on CPU_FREQ
>
> We depend on IRQ_WORK as well, which in turn I think depends on SMP. As
> briefly discussed with Peter on IRC, we might want to use
> smp_call_function_single_async() instead to break this dependencies
> chain (and be able to use this governor on UP as well).

FWIW I don't see an explicit dependency of IRQ_WORK on SMP in
init/Kconfig. Nevertheless, I'll look at moving to
smp_call_function_single_async() to shrink sched-freq's list of
dependencies.

...
>> + /* avoid race with cpufreq_sched_stop */
>> + if (!down_write_trylock(&policy->rwsem))
>> + return;
>> +
>> + __cpufreq_driver_target(policy, freq, CPUFREQ_RELATION_L);
>> +
>> + gd->throttle = ktime_add_ns(ktime_get(), gd->throttle_nsec);
>
> As I think you proposed at Connect, we could use post frequency
> transition notifiers to implement throttling. Is this something that you
> already tried implementing/planning to experiment with?

I started on this a while back and then decided to hold off. I can't
recall for sure, but I think it was so that I could artificially
throttle the rate of frequency change events further by specifying an
inflated frequency change time. That's useful to have while we
experiment with policy.

We probably want both mechanisms: a minimum throttle based on
transition-end notifiers, plus the option of throttling further for
policy purposes (at least for now, or as a debug option). I'll look at
this again.
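To make the combination of the two mechanisms concrete, here is a minimal user-space model. It is a sketch under stated assumptions: struct throttle_model and the model_* helpers are hypothetical names that are not part of the patch or of cpufreq, and timestamps are plain nanosecond counters rather than ktime_t. A request is refused while a transition is in flight. After the transition-end event (what a CPUFREQ_POSTCHANGE transition notifier would signal), it is held off for an optional extra interval, extra_throttle_ns, which stands in for the policy/debug throttle.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical user-space model: frequency changes are throttled while a
 * transition is in flight and, optionally, for an extra policy-defined
 * interval after the transition-end (POSTCHANGE-style) event.
 */
struct throttle_model {
	bool in_transition;         /* set at transition start, cleared at end */
	uint64_t throttle_until;    /* ns timestamp before which changes wait */
	uint64_t extra_throttle_ns; /* optional policy/debug throttle, may be 0 */
};

/* Transition begins (analogous to a CPUFREQ_PRECHANGE event). */
static void model_transition_begin(struct throttle_model *m)
{
	m->in_transition = true;
}

/* Transition completes (analogous to a CPUFREQ_POSTCHANGE event). */
static void model_transition_end(struct throttle_model *m, uint64_t now_ns)
{
	m->in_transition = false;
	/* the minimum throttle ends here; the extra interval extends it */
	m->throttle_until = now_ns + m->extra_throttle_ns;
}

/* May a new frequency change be issued at time now_ns? */
static bool model_may_change(const struct throttle_model *m, uint64_t now_ns)
{
	return !m->in_transition && now_ns >= m->throttle_until;
}
```

With extra_throttle_ns set to 500, a transition that ends at t=1000 blocks a change at t=1200 and allows one at t=1500. Setting extra_throttle_ns to 0 reduces the model to throttling purely on transition completion; a nonzero value plays the role of the inflated frequency change time.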

...
>> +static int cpufreq_sched_thread(void *data)
>> +{
>> + struct sched_param param;
>> + struct cpufreq_policy *policy;
>> + struct gov_data *gd;
>> + unsigned int new_request = 0;
>> + unsigned int last_request = 0;
>> + int ret;
>> +
>> + policy = (struct cpufreq_policy *) data;
>> + gd = policy->governor_data;
>> +
>> + param.sched_priority = 50;
>> + ret = sched_setscheduler_nocheck(gd->task, SCHED_FIFO, &param);
>> + if (ret) {
>> + pr_warn("%s: failed to set SCHED_FIFO\n", __func__);
>> + do_exit(-EINVAL);
>> + } else {
>> + pr_debug("%s: kthread (%d) set to SCHED_FIFO\n",
>> + __func__, gd->task->pid);
>> + }
>> +
>> + do {
>> + set_current_state(TASK_INTERRUPTIBLE);
>> + new_request = gd->requested_freq;
>> + if (new_request == last_request) {
>> + schedule();
>> + } else {
>
> Shouldn't we have to do the following here?
>
>
> @@ -125,9 +125,9 @@ static int cpufreq_sched_thread(void *data)
> }
>
> do {
> - set_current_state(TASK_INTERRUPTIBLE);
> new_request = gd->requested_freq;
> if (new_request == last_request) {
> + set_current_state(TASK_INTERRUPTIBLE);
> schedule();
> } else {
> /*
>
> Otherwise we set task to INTERRUPTIBLE state right after it has been
> woken up.

The state must be set to TASK_INTERRUPTIBLE before the data used to
decide whether to sleep or not is read (gd->requested_freq in this case).

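If it is set afterwards, there is a window after gd->requested_freq is
read but before the state is set to TASK_INTERRUPTIBLE. In that window
the other side may update gd->requested_freq and issue a wakeup on the
freq thread. The wakeup has no effect because the freq thread is still
TASK_RUNNING at that time, so the freq thread goes on to sleep and the
update is lost. With the state set first, a wakeup that lands after the
read resets the thread to TASK_RUNNING, so schedule() does not leave it
asleep and the loop re-reads gd->requested_freq.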
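The race can be replayed deterministically in a small user-space model. This is a sketch under stated assumptions: struct model_task, model_wake_up() and model_wakeup_lost() are hypothetical stand-ins for the task state, wake_up_process() and the freq thread loop. The model is single-threaded, so it captures the ordering problem but not the memory barrier that set_current_state() provides in the real kernel.

```c
#include <stdbool.h>

/* Hypothetical model of the kernel's task-state protocol. */
enum model_state { MODEL_RUNNING, MODEL_INTERRUPTIBLE };

struct model_task {
	enum model_state state;
};

/* Like wake_up_process(): only has an effect on a task in a sleep state. */
static void model_wake_up(struct model_task *t)
{
	if (t->state == MODEL_INTERRUPTIBLE)
		t->state = MODEL_RUNNING;
}

/*
 * Replays one fixed interleaving: the freq thread reads requested_freq
 * (still equal to last_request), then the other side publishes a new
 * request and issues a wakeup, then the freq thread reaches schedule().
 * Returns true if the thread would stay asleep with the update unhandled.
 */
static bool model_wakeup_lost(bool set_state_before_read)
{
	struct model_task task = { MODEL_RUNNING };
	unsigned int requested_freq = 0, last_request = 0, new_request;

	if (set_state_before_read)
		task.state = MODEL_INTERRUPTIBLE;   /* ordering used in the patch */

	new_request = requested_freq;           /* freq thread reads request */

	requested_freq = 1000;                  /* other side updates ... */
	model_wake_up(&task);                   /* ... and wakes the thread */

	if (new_request == last_request) {
		if (!set_state_before_read)
			task.state = MODEL_INTERRUPTIBLE; /* proposed reordering */
		/* schedule(): only a task still INTERRUPTIBLE stays asleep */
		return task.state == MODEL_INTERRUPTIBLE &&
		       requested_freq != last_request;
	}
	return false;
}
```

model_wakeup_lost(false), with the state set after the read as in the proposed change, returns true; model_wakeup_lost(true), with the state set first as in the patch, returns false because the wakeup put the thread back to running before schedule().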

thanks,
Steve