Re: [PATCH v2] perf sched timehist: Add pre-migration wait time option

From: Madadi Vineeth Reddy
Date: Wed Oct 02 2024 - 10:52:50 EST


Hi Namhyung,

On 02/10/24 06:12, Namhyung Kim wrote:
> On Tue, Oct 01, 2024 at 04:36:20PM +0530, Madadi Vineeth Reddy wrote:
>> pre-migration wait time is the time that a task unnecessarily spends
>> on the runqueue of a CPU but doesn't get switched-in there. In terms
>> of tracepoints, it is the time between sched:sched_wakeup and
>> sched:sched_migrate_task.

[snip]

>> static bool is_idle_sample(struct perf_sample *sample,
>> @@ -2598,13 +2611,19 @@ static int timehist_sched_wakeup_event(const struct perf_tool *tool,
>> if (tr == NULL)
>> return -1;
>>
>> - if (tr->ready_to_run == 0)
>> - tr->ready_to_run = sample->time;
>> + if (!strcmp(evsel__name(evsel), "sched:sched_waking")) {
>
> I guess it won't work when there's no sched_waking event. Can you
> simply handle pre-migration in sched_waking?
>

I believe it should still work even without the sched_waking event, because
I've updated the condition so that timehist_sched_wakeup_ignore is never
installed as the sched_wakeup handler when the pre-migration option is enabled.

> Thanks,
> Namhyung
>
>
>> + if (tr->ready_to_run == 0)
>> + tr->ready_to_run = sample->time;

[snip]

>> /* prefer sched_waking if it is captured */
>> - if (evlist__find_tracepoint_by_name(session->evlist, "sched:sched_waking"))
>> + if (!sched->pre_migrations &&
>> + evlist__find_tracepoint_by_name(session->evlist, "sched:sched_waking"))
>> handlers[1].handler = timehist_sched_wakeup_ignore;

Here the code checks whether pre-migration is enabled. If it is, sched:sched_wakeup
events are still handled by timehist_sched_wakeup_event rather than being ignored.
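
To make the condition concrete, here is a minimal standalone C model of the
handler selection in the hunk above. It is a sketch, not perf code: the enum and
pick_wakeup_handler() are hypothetical, and its two flags stand in for
sched->pre_migrations and for whether
evlist__find_tracepoint_by_name(session->evlist, "sched:sched_waking") found
the event.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the two handlers perf can use for sched:sched_wakeup. */
enum wakeup_handler {
	WAKEUP_EVENT,  /* models timehist_sched_wakeup_event */
	WAKEUP_IGNORE, /* models timehist_sched_wakeup_ignore */
};

/*
 * sched_wakeup is ignored only when sched_waking was recorded AND
 * pre-migration tracking is off; in every other case it is processed.
 */
static enum wakeup_handler pick_wakeup_handler(bool pre_migrations, bool has_waking)
{
	return (!pre_migrations && has_waking) ? WAKEUP_IGNORE : WAKEUP_EVENT;
}
```

So a recording without sched_waking always keeps the sched_wakeup handler,
whether or not the pre-migration option is given.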

The reason I initially chose the sched:sched_wakeup tracepoint over
sched:sched_waking is that the CPU selected at sched_waking time may not be the
CPU where the task actually ends up running, as in the example below
(target_cpu=005 at sched_waking, but the wakeup happens on CPU 002):

wdavdaemon 14789 [006] 31357.614692: sched:sched_waking: comm=wdavdaemon pid=14778 prio=120 target_cpu=005
[snip]
swapper 0 [002] 31357.614695: sched:sched_wakeup: wdavdaemon:14778 [120] CPU:002

However, since we already account for any sched_migrate_task event that occurs
between sched_waking and sched_switch, we don't need to check target_cpu, so
switching to sched_waking should work just as well. The only difference is that
the wait is measured from a slightly earlier point (about 3 us earlier in the
example above).
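
As a rough illustration of that accounting, here is a minimal C model that
measures pre-migration wait from the sched_waking timestamp. It is not the perf
implementation: struct task_state, the on_*() callbacks, and the rule of
charging the wait up to the last migration before switch-in are hypothetical
simplifications. Timestamps are in nanoseconds, like perf_sample->time.

```c
#include <stdint.h>

/* Hypothetical per-task bookkeeping, loosely modelled on perf's per-thread runtime data. */
struct task_state {
	uint64_t ready_to_run; /* sched_waking time, 0 while not waiting */
	uint64_t migrated;     /* last sched_migrate_task time while waiting, 0 if none */
	uint64_t sched_delay;  /* waking -> switch-in, in ns */
	uint64_t pre_mig_wait; /* waking -> last migration, in ns */
};

/* sched:sched_waking: remember when the task became runnable. */
static void on_waking(struct task_state *t, uint64_t now)
{
	if (t->ready_to_run == 0)
		t->ready_to_run = now;
}

/* sched:sched_migrate_task: only migrations of a waiting task count. */
static void on_migrate(struct task_state *t, uint64_t now)
{
	if (t->ready_to_run != 0)
		t->migrated = now;
}

/*
 * sched:sched_switch (task switched in): time before the last migration was
 * spent on a runqueue the task never ran on.
 */
static void on_switch_in(struct task_state *t, uint64_t now)
{
	if (t->ready_to_run == 0)
		return; /* no wakeup seen for this task, nothing to account */
	t->sched_delay = now - t->ready_to_run;
	t->pre_mig_wait = t->migrated ? t->migrated - t->ready_to_run : 0;
	t->ready_to_run = 0;
	t->migrated = 0;
}
```

In this model the pre-migration wait is always a part of the total scheduling
delay, since a migration can only be recorded between waking and switch-in.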

I'll go ahead and send a v3 with sched_waking. Thanks again for the feedback.

Thanks,
Madadi Vineeth Reddy

>>
>> /* setup per-evsel handlers */
>> @@ -3280,8 +3309,14 @@ static int perf_sched__timehist(struct perf_sched *sched)
>> goto out;
>> }
>>
>> - if (sched->show_migrations &&
>> - perf_session__set_tracepoints_handlers(session, migrate_handlers))
>> + if (sched->pre_migrations && !evlist__find_tracepoint_by_name(session->evlist,
>> + "sched:sched_wakeup")) {
>> + pr_err("No sched_wakeup events found. sched_wakeup tracepoint is mandatory for -P option\n");
>> + goto out;
>> + }
>> +
>> + if ((sched->show_migrations || sched->pre_migrations) &&
>> + perf_session__set_tracepoints_handlers(session, migrate_handlers))
>> goto out;
>>
>> /* pre-allocate struct for per-CPU idle stats */
>> @@ -3823,6 +3858,7 @@ int cmd_sched(int argc, const char **argv)
>> OPT_BOOLEAN(0, "show-prio", &sched.show_prio, "Show task priority"),
>> OPT_STRING(0, "prio", &sched.prio_str, "prio",
>> "analyze events only for given task priority(ies)"),
>> + OPT_BOOLEAN('P', "pre-migrations", &sched.pre_migrations, "Show pre-migration wait time"),
>> OPT_PARENT(sched_options)
>> };
>>
>> --
>> 2.43.2
>>