Re: [PATCH] sched/cpufreq: don't trigger cpufreq update w/o real rt/deadline tasks running

From: Wanpeng Li
Date: Thu Apr 21 2016 - 08:13:06 EST


2016-04-21 19:11 GMT+08:00 Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>:
> On 4/21/2016 3:09 AM, Wanpeng Li wrote:
>>
>> 2016-04-21 6:28 GMT+08:00 Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>:
>>>
>>> On 4/21/2016 12:24 AM, Wanpeng Li wrote:
>>>>
>>>> 2016-04-20 22:01 GMT+08:00 Peter Zijlstra <peterz@xxxxxxxxxxxxx>:
>>>>>
>>>>> On Wed, Apr 20, 2016 at 02:32:35AM +0200, Rafael J. Wysocki wrote:
>>>>>>
>>>>>> On Monday, April 18, 2016 01:51:24 PM Wanpeng Li wrote:
>>>>>>>
>>>>>>> Sometimes update_curr() is called w/o tasks actually running, it is
>>>>>>> captured by:
>>>>>>> u64 delta_exec = rq_clock_task(rq) - curr->se.exec_start;
>>>>>>> We should not trigger a cpufreq update in this case for the rt/deadline
>>>>>>> classes, and this patch fixes it.
>>>>>>>
>>>>>>> Signed-off-by: Wanpeng Li <wanpeng.li@xxxxxxxxxxx>
>>>>>>
>>>>>> The signed-off-by tag should agree with the From: header. One way to
>>>>>> achieve that is to add an extra From: line at the start of the changelog.
>>>>>>
>>>>>> That said, this looks like a good catch that should go into 4.6 to me.
>>>>>>
>>>>>> Peter, what do you think?
>>>>>
>>>>> I'm confused by the Changelog. *what* ?
>>>>
>>>> Sometimes .update_curr hook is called w/o tasks actually running, it is
>>>> captured by:
>>>>
>>>> u64 delta_exec = rq_clock_task(rq) - curr->se.exec_start;
>>>>
>>>> We should not trigger a cpufreq update in this case for the rt/deadline
>>>> classes, and this patch fixes it.
>>>
>>>
>>> That's what you wrote in the changelog, no need to repeat that.
>>>
>>> I guess Peter is asking for more details, though. I actually would like
>>> to get some more details here too. Like an example of when the situation in
>>> question actually happens.
>>
>> I added a trace print that fires when delta_exec is zero for the rt class;
>> the output looks like this:
>>
>> watchdog/5-48 [005] d... 568.449095: update_curr_rt: rt delta_exec is zero
>> watchdog/5-48 [005] d... 568.449104: <stack trace>
>> => pick_next_task_rt
>> => __schedule
>> => schedule
>> => smpboot_thread_fn
>> => kthread
>> => ret_from_fork
>> watchdog/5-48 [005] d... 568.449105: update_curr_rt: rt delta_exec is zero
>> watchdog/5-48 [005] d... 568.449111: <stack trace>
>> => put_prev_task_rt
>> => pick_next_task_idle
>> => __schedule
>> => schedule
>> => smpboot_thread_fn
>> => kthread
>> => ret_from_fork
>> watchdog/6-56 [006] d... 568.510094: update_curr_rt: rt delta_exec is zero
>> watchdog/6-56 [006] d... 568.510103: <stack trace>
>> => pick_next_task_rt
>> => __schedule
>> => schedule
>> => smpboot_thread_fn
>> => kthread
>> => ret_from_fork
>> watchdog/6-56 [006] d... 568.510105: update_curr_rt: rt delta_exec is zero
>> watchdog/6-56 [006] d... 568.510111: <stack trace>
>> => put_prev_task_rt
>> => pick_next_task_idle
>> => __schedule
>> => schedule
>> => smpboot_thread_fn
>> => kthread
>> => ret_from_fork
>> [...]
>
>
> And the statement in your changelog follows from this I suppose. How does it
> follow, exactly?

For example, suppose rt task A is going to sleep and rt task B is the
next candidate to run.

__schedule()
    -> deactivate_task(A, DEQUEUE_SLEEP)
        -> dequeue_task_rt()
            -> update_curr_rt()
                -> cpufreq_trigger_update()
                -> delta_exec = rq_clock_task(rq) - curr->se.exec_start;
    [...]
    -> pick_next_task_rt()
        -> update_curr_rt()    => rq->curr is still A at this point
            -> cpufreq_trigger_update()
            -> delta_exec = rq_clock_task(rq) - curr->se.exec_start;
               => delta_exec == 0, since A has not actually run between
                  these two updates
    if (likely(prev != next)) {
        rq->curr = B;
        [...]
    }
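
To make the double update above concrete, here is a minimal user-space
sketch of the idea, not the patch itself. The types and names (struct
task, struct runqueue, cpufreq_kicks, update_curr_unguarded(),
update_curr_guarded(), kicks_during_switch()) are all hypothetical
stand-ins, and the only assumption is that the guard amounts to computing
delta_exec before kicking cpufreq and bailing out when it is not positive.

```c
#include <stdint.h>

/* Hypothetical user-space model; these are not the kernel's types. */
struct task {
    uint64_t exec_start;        /* last time runtime was accounted */
};

struct runqueue {
    uint64_t clock_task;        /* stands in for rq_clock_task(rq) */
    struct task *curr;          /* stands in for rq->curr */
    unsigned int cpufreq_kicks; /* how often the cpufreq hook fired */
};

/* Order shown in the call chain: kick cpufreq, then compute delta_exec. */
static void update_curr_unguarded(struct runqueue *rq)
{
    rq->cpufreq_kicks++;

    int64_t delta_exec = (int64_t)(rq->clock_task - rq->curr->exec_start);
    if (delta_exec <= 0)
        return;                 /* nothing ran, but cpufreq was kicked */

    rq->curr->exec_start = rq->clock_task;
}

/* Assumed guard: compute delta_exec first, kick only if the task ran. */
static void update_curr_guarded(struct runqueue *rq)
{
    int64_t delta_exec = (int64_t)(rq->clock_task - rq->curr->exec_start);
    if (delta_exec <= 0)
        return;                 /* nothing ran, so skip the cpufreq kick */

    rq->cpufreq_kicks++;
    rq->curr->exec_start = rq->clock_task;
}

/*
 * Replay the scenario: task A is dequeued, then the pick-next path runs
 * while rq->curr is still A and the clock has not advanced.
 */
unsigned int kicks_during_switch(int guarded)
{
    struct task a = { .exec_start = 100 };
    struct runqueue rq = { .clock_task = 150, .curr = &a, .cpufreq_kicks = 0 };
    void (*update)(struct runqueue *) =
        guarded ? update_curr_guarded : update_curr_unguarded;

    update(&rq);                /* dequeue path: A ran for 50 units */
    update(&rq);                /* pick-next path: delta_exec == 0 */

    return rq.cpufreq_kicks;
}
```

In this model kicks_during_switch(0) counts two kicks for a single
stretch of execution, while kicks_during_switch(1) counts one, which is
the redundant update the changelog is trying to describe.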

Regards,
Wanpeng Li