Re: [PATCH] sched/cpufreq: don't trigger cpufreq update w/o real rt/deadline tasks running

From: Wanpeng Li
Date: Thu Apr 21 2016 - 08:25:22 EST


2016-04-21 20:12 GMT+08:00 Wanpeng Li <kernellwp@xxxxxxxxx>:
> 2016-04-21 19:11 GMT+08:00 Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>:
>> On 4/21/2016 3:09 AM, Wanpeng Li wrote:
>>>
>>> 2016-04-21 6:28 GMT+08:00 Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>:
>>>>
>>>> On 4/21/2016 12:24 AM, Wanpeng Li wrote:
>>>>>
>>>>> 2016-04-20 22:01 GMT+08:00 Peter Zijlstra <peterz@xxxxxxxxxxxxx>:
>>>>>>
>>>>>> On Wed, Apr 20, 2016 at 02:32:35AM +0200, Rafael J. Wysocki wrote:
>>>>>>>
>>>>>>> On Monday, April 18, 2016 01:51:24 PM Wanpeng Li wrote:
>>>>>>>>
>>>>>>>> Sometimes update_curr() is called w/o tasks actually running, it is
>>>>>>>> captured by:
>>>>>>>> u64 delta_exec = rq_clock_task(rq) - curr->se.exec_start;
>>>>>>>> We should not trigger cpufreq update in this case for rt/deadline
>>>>>>>> classes, and this patch fixes it.
>>>>>>>>
>>>>>>>> Signed-off-by: Wanpeng Li <wanpeng.li@xxxxxxxxxxx>
>>>>>>>
>>>>>>> The signed-off-by tag should agree with the From: header. One way to
>>>>>>> achieve that is to add an extra From: line at the start of the changelog.
>>>>>>>
>>>>>>> That said, this looks like a good catch that should go into 4.6 to me.
>>>>>>>
>>>>>>> Peter, what do you think?
>>>>>>
>>>>>> I'm confused by the Changelog. *what* ?
>>>>>
>>>>> Sometimes .update_curr hook is called w/o tasks actually running, it is
>>>>> captured by:
>>>>>
>>>>> u64 delta_exec = rq_clock_task(rq) - curr->se.exec_start;
>>>>>
>>>>> We should not trigger cpufreq update in this case for rt/deadline
>>>>> classes, and this patch fixes it.
>>>>
>>>>
>>>> That's what you wrote in the changelog, no need to repeat that.
>>>>
>>>> I guess Peter is asking for more details, though. I actually would like to
>>>> get some more details here too. Like an example of when the situation in
>>>> question actually happens.
>>>
>>> I added a trace print that fires when delta_exec is zero for the rt
>>> class; the output looks like this:
>>>
>>> watchdog/5-48 [005] d... 568.449095: update_curr_rt: rt delta_exec is zero
>>> watchdog/5-48 [005] d... 568.449104: <stack trace>
>>> => pick_next_task_rt
>>> => __schedule
>>> => schedule
>>> => smpboot_thread_fn
>>> => kthread
>>> => ret_from_fork
>>> watchdog/5-48 [005] d... 568.449105: update_curr_rt: rt delta_exec is zero
>>> watchdog/5-48 [005] d... 568.449111: <stack trace>
>>> => put_prev_task_rt
>>> => pick_next_task_idle
>>> => __schedule
>>> => schedule
>>> => smpboot_thread_fn
>>> => kthread
>>> => ret_from_fork
>>> watchdog/6-56 [006] d... 568.510094: update_curr_rt: rt delta_exec is zero
>>> watchdog/6-56 [006] d... 568.510103: <stack trace>
>>> => pick_next_task_rt
>>> => __schedule
>>> => schedule
>>> => smpboot_thread_fn
>>> => kthread
>>> => ret_from_fork
>>> watchdog/6-56 [006] d... 568.510105: update_curr_rt: rt delta_exec is zero
>>> watchdog/6-56 [006] d... 568.510111: <stack trace>
>>> => put_prev_task_rt
>>> => pick_next_task_idle
>>> => __schedule
>>> => schedule
>>> => smpboot_thread_fn
>>> => kthread
>>> => ret_from_fork
>>> [...]
>>
>>
>> And the statement in your changelog follows from this I suppose. How does it
>> follow, exactly?
>
> For example, rt task A is about to go to sleep, and rt task B is the next
> candidate to run.
>
> __schedule()
>   -> deactivate_task(A, DEQUEUE_SLEEP)
>     -> dequeue_task_rt()
>       -> update_curr_rt()
>         -> cpufreq_trigger_update()
>         -> delta_exec = rq_clock_task(rq) - curr->se.exec_start;
>   [...]
>   -> pick_next_task_rt()
>     -> update_curr_rt()   => rq->curr is still A currently
>       -> cpufreq_trigger_update()
>       -> delta_exec = rq_clock_task(rq) - curr->se.exec_start;
>          => delta == 0, actually A is not running between these two updates
>   if (likely(prev != next)) {
>       rq->curr = B;
>       [...]
>   }

Actually, I suspect there is yet another cpufreq update w/ delta == 0,
due to the current implementation of pick_next_task_rt():

	if (prev->sched_class == &rt_sched_class)
		update_curr(rq);	=> rq->curr is still A currently
	[...]
	put_prev_task(rq, prev);
	  -> update_curr(rq);	=> rq->curr is still A currently

Regards,
Wanpeng Li