Re: BFS vs. mainline scheduler benchmarks and measurements

From: Frederic Weisbecker
Date: Thu Sep 10 2009 - 16:25:55 EST


On Tue, Sep 08, 2009 at 09:15:22PM +0300, Nikos Chantziaras wrote:
> On 09/07/2009 02:01 PM, Frederic Weisbecker wrote:
>> That looks eventually benchmarkable. This is about latency.
>> For example, you could try to run high load tasks in the
>> background and then launch a task that wakes up in middle/large
>> periods to do something. You could measure the time it takes to wake
>> it up to perform what it wants.
>>
>> We have some events tracing infrastructure in the kernel that can
>> snapshot the wake up and sched switch events.
>>
>> Having CONFIG_EVENT_TRACING=y should be sufficient for that.
>>
>> You just need to mount a debugfs point, say in /debug.
>>
>> Then you can activate these sched events by doing:
>>
>> echo 0> /debug/tracing/tracing_on
>> echo 1> /debug/tracing/events/sched/sched_switch/enable
>> echo 1> /debug/tracing/events/sched/sched_wake_up/enable
>>
>> #Launch your tasks
>>
>> echo 1> /debug/tracing/tracing_on
>>
>> #Wait for some time
>>
>> echo 0> /debug/tracing/tracing_off
>>
>> That will require some parsing of the result in /debug/tracing/trace
>> to get the delays between wake_up events and switch in events
>> for the task that periodically wakes up and then produce some
>> statistics such as the average or the maximum latency.
>>
>> That's a bit of a rough approach to measure such latencies but that
>> should work.
>
> I've tried this with 2.6.31-rc9 while running mplayer and alt+tabbing
> repeatedly to the point where mplayer starts to stall and drop frames.
> This produced a 4.1MB trace file (132k bzip2'ed):
>
> http://foss.math.aegean.gr/~realnc/kernel/trace1.bz2
>
> Uncompressed for online viewing:
>
> http://foss.math.aegean.gr/~realnc/kernel/trace1
>
> I must admit that I don't know what it is I'm looking at :P


Hehe :-)

Basically you have samples of two kind of events:

- wake up (when thread A wakes up B)

The format is as follows:


task-pid
(the waker A)
|
| cpu timestamp event-name wakee(B) prio status
| | | | | | |
X-11482 [001] 1023.219246: sched_wakeup: task kwin:11571 [120] success=1

Here X is awakening kwin.


- sched switch (when the scheduler stops A and launches B)

A, task B, task
that gets that gets
sched sched
out in
A cpu timestamp event-name | A prio | B prio
| | | | | | | |
X-11482 [001] 1023.219247: sched_switch: task X:11482 [120] (R) ==> kwin:11571 [120]
|
|
State of A
For A state we can have either:

R: TASK_RUNNING, the task is not sleeping but it is rescheduled for later
to let another task run

S: TASK_INTERRUPTIBLE, the task is sleeping, waiting for an event that may
wake it up. The task can be waked by a signal

D: TASK_UNINTERRUPTIBLE, same as above but can't be waked by a signal.


Now what could be interesting interesting is to measure the time between
such pair of events:

- t0: A wakes up B
- t1: B is sched in

t1 - t0 would then be the scheduler latency, or at least part of it:

The scheduler latency may be an addition of several factors:

- the time it takes for the actual wake up to perform (re-insert
the task into a runqueue, which can be subject to the runqueue(s)
design, the rebalancing if needed, etc..

- the time between a task is waked up and the scheduler eventually
decide to schedule it in.

- the time it takes to perform the task switch, which is not only
in the scheduler scope. But the time it takes may depend of a
rebalancing decision (cache cold, etc..)

Unfortunately we can only measure the second part with the above ftrace
events. But that's still an interesting scheduler abstract that is a
large part of the scheduler latency.

We could write a tiny parser that could walk through such ftrace traces
and produce some average, maximum, standard deviation numbers.

But we have userspace tools that can parse ftrace events (through perf
counter), so I'm trying to write something there, hopefully I could get
a relevant end result.

Thanks.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/