Can we avoid sched:sched_migrate_task event being called twicefrom sched_fork()

From: Srikar Dronamraju
Date: Fri Mar 08 2013 - 08:03:51 EST


When using sched:sched_migrate_tasks, its little awkward to see records that
suggest current task migrating from the current cpu to the same current cpu.

perf record -e sched:sched_migrate_task -af bash -c "ls;ps"; perf script

perf 5292 [035] 347249.373741: sched:sched_migrate_task: comm=perf pid=5293 prio=120 orig_cpu=43 dest_cpu=34
migration/34 144 [034] 347249.373938: sched:sched_migrate_task: comm=perf pid=5293 prio=120 orig_cpu=34 dest_cpu=43
swapper 0 [035] 347249.375030: sched:sched_migrate_task: comm=rcu_sched pid=10 prio=120 orig_cpu=11 dest_cpu=35
bash 5293 [043] 347249.377255: sched:sched_migrate_task: comm=bash pid=5293 prio=120 orig_cpu=43 dest_cpu=43 <---- 1
bash 5293 [043] 347249.377580: sched:sched_migrate_task: comm=bash pid=5294 prio=120 orig_cpu=43 dest_cpu=51 <---- 2
migration/51 231 [051] 347249.377888: sched:sched_migrate_task: comm=bash pid=5294 prio=120 orig_cpu=51 dest_cpu=59
swapper 0 [043] 347249.380039: sched:sched_migrate_task: comm=rcu_sched pid=10 prio=120 orig_cpu=35 dest_cpu=43
:5294 5294 [059] 347249.381911: sched:sched_migrate_task: comm=bash pid=5293 prio=120 orig_cpu=43 dest_cpu=58
bash 5293 [058] 347249.382449: sched:sched_migrate_task: comm=bash pid=5293 prio=120 orig_cpu=58 dest_cpu=58 <---- 3
bash 5293 [058] 347249.382995: sched:sched_migrate_task: comm=bash pid=5295 prio=120 orig_cpu=58 dest_cpu=3 <---- 4
migration/3 20 [003] 347249.383351: sched:sched_migrate_task: comm=bash pid=5295 prio=120 orig_cpu=3 dest_cpu=11
kworker/11:1 175 [011] 347249.419342: sched:sched_migrate_task: comm=tmux pid=10349 prio=120 orig_cpu=11 dest_cpu=10
:5295 5295 [011] 347249.427053: sched:sched_migrate_task: comm=bash pid=5293 prio=120 orig_cpu=58 dest_cpu=10

If we look at reference 1 & 3 above, the orig_cpu and dest_cpu are the same.
Also from pid column and the recorded pid, we get an impression that the
current task is asking itself to be migrated from the current cpu on to the
current cpu.

On fork/clone path, we set the cpu twice. The first time we do it from
sched_fork() (references 1 & 3) and then later in wake_up_new_task()
(references 2 & 4).

However perf sw event migrations get accounted correctly.

perf stat numbers also indicate the same. (i.e lesser number of migrations
than the actual sched:sched_migrate_task)

perf stat -a -e sched:sched_migrate_task -e migrations bash -c "ls;ps"

Performance counter stats for 'bash -c ls;ps':

12 sched:sched_migrate_task [100.00%]
10 migrations

0.052509285 seconds time elapsed

At the time of sched_fork(), even pid field of child is also not updated.
Further we are trying to set it to the parent's current cpu which should be
the same as the set in child, (copied from parent in dup_task_struct)

The child task wont run till wake_up_new_task() is called, and the
actual cpu setting happens at wake_up_new_task(). Can we avoid the trace
being called from the sched_fork() path?

The below patch tries to call __set_task_cpu() instead of set_task_cpu()
in sched_fork() and hence avoids calling the trace.

Here are the results after we avoid trace_sched_migrate_task() from
sched_fork().

perf record -e sched:sched_migrate_task -af bash -c "ls;ps"; perf script

migration/19 104 [019] 406.931532: sched:sched_migrate_task: comm=perf pid=60217 prio=120 orig_cpu=19 dest_cpu=27
bash 60217 [027] 406.939006: sched:sched_migrate_task: comm=bash pid=60218 prio=120 orig_cpu=27 dest_cpu=35
migration/35 195 [035] 406.939515: sched:sched_migrate_task: comm=bash pid=60218 prio=120 orig_cpu=35 dest_cpu=43
swapper 0 [027] 406.942050: sched:sched_migrate_task: comm=rcu_sched pid=10 prio=120 orig_cpu=19 dest_cpu=27
:60218 60218 [043] 406.949920: sched:sched_migrate_task: comm=bash pid=60217 prio=120 orig_cpu=27 dest_cpu=42
bash 60217 [042] 406.951489: sched:sched_migrate_task: comm=bash pid=60219 prio=120 orig_cpu=42 dest_cpu=51
migration/51 289 [051] 406.951950: sched:sched_migrate_task: comm=bash pid=60219 prio=120 orig_cpu=51 dest_cpu=59
swapper 0 [043] 406.988034: sched:sched_migrate_task: comm=rcu_sched pid=10 prio=120 orig_cpu=27 dest_cpu=43
:60219 60219 [059] 407.051751: sched:sched_migrate_task: comm=bash pid=60217 prio=120 orig_cpu=42 dest_cpu=58

perf stat -a -e sched:sched_migrate_task -e migrations bash -c "ls;ps"

Performance counter stats for 'bash -c ls;ps':

17 sched:sched_migrate_task [100.00%]
17 migrations

0.107531428 seconds time elapsed

Other options could be avoid setting the cpu from sched_fork or move the
trace_sched_migrate_task() in set_task_cpu() to where the perf sw migrations
are accounted.

--
Thanks and Regards
Srikar

-----8<--------------------------------------------------------------