[BUG] perf_events: ctx_flexible_sched_in()

From: Stephane Eranian
Date: Mon Feb 01 2010 - 07:20:42 EST


Hi,

I believe there is something wrong with ctx_flexible_sched_in().

The function does not allow maximizing PMU usage because of
the way can_add_hw is managed. Basically, as soon as one group
fails to be scheduled in, no other group can be. I believe this
is not optimal. You need to skip the group that fails and keep
scanning the list. There may be other groups which can be
scheduled.

Here is an example to illustrate the issue:
$ task -ebaclears,div,instructions_retired,fp_assist noploop 5
noploop for 5 seconds
908 baclears (scaled from 74.97% of time)
0 div (scaled from 50.01% of time)
11328128990 instructions_retired (scaled from 74.99% of time)
0 fp_assist (scaled from 50.00% of time)

Here div and fp_assist can only go on counter 1. There is no explicit
grouping. On Intel Core, you have 2 generic and 3 fixed counters.
instructions_retired can go on a fixed counter. Thus, I was
expecting baclears and instructions_retired to always be scheduled.
The other two would alternate at 50% each. While you get the latter
behavior, you are not getting full utilization for baclears and
instructions_retired.

Once I modify ctx_flexible_sched_in():
$ ./task -ebaclears,div,instructions_retired,fp_assist noploop 5
noploop for 5 seconds
658 baclears
0 div (scaled from 50.01% of time)
11726844342 instructions_retired
0 fp_assist (scaled from 50.00% of time)

I get the right result. Thus, I think, we need to drop can_add_hw
from ctx_flexible_sched_in().
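
To make the argument concrete, here is a toy user-space model of the
example above. It is only a sketch, not the kernel code: all names
(toy_event, toy_sched_in, toy_simulate) are hypothetical, and it assumes
the event list rotates by one entry per tick and that each event takes
the lowest free counter it is allowed on.

```c
#include <assert.h>

#define NUM_EVENTS 4
#define NUM_TICKS  4   /* one full rotation of the round-robin list */

/* Hypothetical toy event: allowed generic counters, or a fixed counter. */
struct toy_event {
    const char *name;
    unsigned    generic_mask; /* bit n set: generic counter n is allowed */
    int         uses_fixed;   /* 1: event has its own fixed counter */
};

/*
 * Constraints as described in the mail: 2 generic counters, div and
 * fp_assist only fit on counter 1, instructions_retired is fixed.
 */
static const struct toy_event toy_events[NUM_EVENTS] = {
    { "baclears",             0x3, 0 },
    { "div",                  0x2, 0 },
    { "instructions_retired", 0x0, 1 },
    { "fp_assist",            0x2, 0 },
};

/* Try to place one event; returns 0 on success, -1 if no counter is free. */
static int toy_sched_in(const struct toy_event *ev, unsigned *used)
{
    unsigned free_mask;

    if (ev->uses_fixed)
        return 0;
    free_mask = ev->generic_mask & ~*used;
    if (!free_mask)
        return -1;
    *used |= free_mask & -free_mask; /* lowest free allowed counter */
    return 0;
}

/*
 * Count, per event, the ticks on which it was scheduled.
 * stop_on_failure = 1 mimics the can_add_hw behaviour (first failure
 * blocks the rest of the list); 0 skips the failing event and keeps
 * scanning.
 */
static void toy_simulate(int stop_on_failure, int scheduled[NUM_EVENTS])
{
    int tick, i;

    assert(scheduled);
    for (i = 0; i < NUM_EVENTS; i++)
        scheduled[i] = 0;

    for (tick = 0; tick < NUM_TICKS; tick++) {
        unsigned used = 0;
        int can_add_hw = 1;

        for (i = 0; i < NUM_EVENTS && can_add_hw; i++) {
            int idx = (tick + i) % NUM_EVENTS; /* rotated list order */

            if (toy_sched_in(&toy_events[idx], &used) == 0)
                scheduled[idx]++;
            else if (stop_on_failure)
                can_add_hw = 0;
        }
    }
}
```

In this model, toy_simulate(1, counts) yields 3, 2, 3, 2 ticks out of 4
(75%, 50%, 75%, 50%), which lines up with the scaled percentages in the
first run. toy_simulate(0, counts) yields 4, 2, 4, 2, i.e. baclears and
instructions_retired are always scheduled.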

Am I missing something in the role of can_add_hw?

If not, then I will provide a patch to get the optimal behavior.