[RFC] Scheduler recorder and playback

From: Pantelis Antoniou
Date: Thu Mar 08 2012 - 08:21:06 EST


Hi there,

There has been considerable activity around the scheduler lately, in particular on
how to adapt it to the peculiarities of the new classes of hardware now coming out,
like the big.LITTLE class of devices from a number of manufacturers.

The platforms that Linux runs on are very diverse, and run differing workloads.
For example, most consumer devices will very likely run something like Android,
with common use cases such as audio and/or video playback. Methods to achieve
lower power consumption using a power-aware scheduler are under investigation.

Similarly for server applications, or VM hosting, the behavior of the scheduler
shouldn't have adverse performance implications; any power saving on top of that
would be a welcome improvement.

The current issue is that scheduler development is not easily shared between
developers. Each developer has their own 'itch', be it Android use cases, server
workloads, VMs, etc. There is a high risk of optimizing for one's own use case while
causing severe degradation in most other use cases.

One way to fix this problem would be to develop a method with which one could
run a given use-case workload on a host, record the activity in an interchangeable,
portable trace file, and then play it back on another host via a playback
application that generates a load approximately similar to the one observed
during recording.

The way the two hosts respond to the same load generated by the playback
application can then be compared, so that the two scheduler implementations can be
evaluated against various metrics (like performance, power consumption etc.).

The fidelity of this approximation is of great importance, but it is debatable
whether a fully identical load can be generated, since the hosts might differ in
ways that make that impossible.
I believe that it should at least be possible to simulate a purely CPU-bound load and the
blocking behavior of tasks, in such a way that it results in scheduler decisions
that can be compared and shared among developers.

The recording part, I believe, can be handled by the kernel's tracing infrastructure,
either by using the existing tracepoints or, if need be, by adding more; possibly even
by creating a new tracer solely for this purpose.
Since some applications adapt their behavior when system resources are insufficient
(think media players that drop frames, for example), I think it would
be beneficial to record such events in the same trace file.

The trace file should have a portable format so that it can be freely shared between
developers. An ASCII format like the one we currently use should be OK, as long as
producing it doesn't perturb the system too much while recording.
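
To make the idea of a shareable ASCII trace more concrete, here is a minimal sketch
of a parser. The line layout "<timestamp> <pid> <kind> [key=value ...]" and the event
kinds (run, block, fork, exit) are purely hypothetical, made up for illustration; they
are not the format the kernel tracer emits, and a real format would need to be agreed on.

```python
from typing import NamedTuple, Optional


class TraceEvent(NamedTuple):
    timestamp: float  # seconds since the start of the recording
    pid: int          # task the event belongs to
    kind: str         # hypothetical event kinds: "run", "block", "fork", "exit"
    args: dict        # remaining key=value fields, kept as strings


def parse_line(line: str) -> Optional[TraceEvent]:
    """Parse one line of the hypothetical format; return None for blanks/comments."""
    line = line.split("#", 1)[0].strip()  # drop trailing comments
    if not line:
        return None
    ts, pid, kind, *rest = line.split()
    args = dict(item.split("=", 1) for item in rest)
    return TraceEvent(float(ts), int(pid), kind, args)


def parse_trace(text: str) -> list:
    """Parse a whole trace, skipping blank and comment-only lines."""
    parsed = (parse_line(line) for line in text.splitlines())
    return [ev for ev in parsed if ev is not None]


# Hypothetical sample recording of a single task.
SAMPLE = """\
# ts pid kind args
0.000000 100 run usec=1500
0.001500 100 block usec=4000 reason=io
0.005500 100 fork child=101
0.005600 100 exit
"""

events = parse_trace(SAMPLE)
```

A line-oriented text format like this is easy to diff, grep and mail around, which is
the main argument for it; whether it is cheap enough to produce during recording is
a separate question.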

The playback application can be implemented in two ways.

One way, which is the LinSched way, would be to have the full scheduler implementation
compiled as part of said application, and use application-specific methods to evaluate
performance. While this will work, it won't allow comparison of the two hosts in a meaningful
manner.

The other way, which serves both scheduler and platform evaluation, is for the playback
application to generate the load on the running host by simulating the source host's
recorded workload session.
That means emulating process activity like forks, thread spawning, blocking on resources
etc. It is not yet clear to me whether that's possible without some kind of kernel-level
helper module, but not requiring one is desirable.
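
As a rough illustration of purely userspace playback, the sketch below replays the
recorded behavior of a single task as alternating CPU bursts and blocking periods.
The (kind, usec) segment representation and the function names are assumptions made
for this example only. It busy-loops on wall-clock time and uses sleep to stand in
for blocking, which is only an approximation of the recorded behavior.

```python
import time


def burn_cpu(seconds: float) -> None:
    """Busy-loop until roughly `seconds` of wall-clock time have passed."""
    deadline = time.perf_counter() + seconds
    while time.perf_counter() < deadline:
        pass


def replay_task(segments):
    """Replay (kind, usec) segments of one task.

    kind is "run" (CPU burst) or "block" (task is off the CPU).
    Returns a list of (kind, requested_usec, actual_usec) so that the
    playback error on this host can be inspected afterwards.
    """
    results = []
    for kind, usec in segments:
        start = time.perf_counter()
        if kind == "run":
            burn_cpu(usec / 1e6)
        elif kind == "block":
            time.sleep(usec / 1e6)  # stands in for blocking on a resource
        else:
            raise ValueError(f"unknown segment kind: {kind!r}")
        actual_usec = (time.perf_counter() - start) * 1e6
        results.append((kind, usec, actual_usec))
    return results


# Hypothetical task: 2 ms of CPU, 3 ms blocked, 1 ms of CPU.
report = replay_task([("run", 2000), ("block", 3000), ("run", 1000)])
```

A real playback tool would presumably start one process or thread per recorded task
and reproduce forks and wakeup dependencies as well; a burst that is preempted here
consumes less CPU than was recorded, so measuring consumed CPU time rather than
wall-clock time would likely be closer to the original load.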

Since one would have the full trace of scheduling activity (past, present and future), it would
be possible to generate a perfect schedule (as defined by best performance, or best
power consumption) and use it as a yardstick against which to evaluate the actual scheduler.
Comparing the results, you would get an estimate of the best-case improvement that could be
achieved if the ideal scheduler existed.
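
A very simplified sketch of such a yardstick, under strong assumptions: tasks are
independent, never block, and all CPUs are identical. In that case no schedule can
finish earlier than the larger of the longest single task and the total work divided
by the number of CPUs. The function names are hypothetical, and a real oracle would
have to account for blocking, dependencies, asymmetric CPUs and power.

```python
def ideal_makespan(cpu_usec_per_task, ncpus):
    """Lower bound on completion time for independent tasks on identical CPUs."""
    if ncpus < 1:
        raise ValueError("ncpus must be at least 1")
    if not cpu_usec_per_task:
        return 0.0
    total = sum(cpu_usec_per_task)
    longest = max(cpu_usec_per_task)
    # Work cannot be spread thinner than total/ncpus, and (assuming a task
    # runs on one CPU at a time) nothing finishes before the longest task.
    return float(max(longest, total / ncpus))


def headroom(observed_usec, cpu_usec_per_task, ncpus):
    """Fraction of the observed completion time an ideal scheduler could remove."""
    if observed_usec <= 0:
        raise ValueError("observed_usec must be positive")
    ideal = ideal_makespan(cpu_usec_per_task, ncpus)
    return max(0.0, 1.0 - ideal / observed_usec)


# Hypothetical numbers: four tasks on two CPUs, observed to finish in 8 ms.
bound = ideal_makespan([4000, 3000, 3000, 2000], ncpus=2)
gain = headroom(8000, [4000, 3000, 3000, 2000], ncpus=2)
```

With these made-up numbers the bound is 6 ms, so at most a quarter of the observed
8 ms could be recovered by any scheduler, however clever.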

I know this is a bit long, but I hope it can serve as a basis for thinking about how
to go about developing this.

Regards

-- Pantelis




