Re: [Patch 0/5] I/O statistics through request queues

From: Martin Peschke
Date: Thu Oct 26 2006 - 07:08:18 EST


Frank Ch. Eigler wrote:
Martin Peschke <mp3@xxxxxxxxxx> writes:

[...] The tricky question is: is event processing, that is,
statistics data aggregation, better done later (in user space), or
immediately (in the kernel). Both approaches exist: blktrace/btt vs.
gendisk statistics used by iostat, for example. [...]

I would put it one step farther: the tricky question is whether it's
worth separating marking the state change events ("request X
enqueued") from the action to be taken ("track statistics", "collect
trace records").

The reason I brought up the lttng/marker thread here was because that
suggests a way of addressing several of the problems at the same time.
This could work thusly: (This will sound familiar to OLS attendees.)

- The blktrace code would adopt a generic marker mechanism such as
that (still) evolving within the lttng/systemtap effort. These
markers would replace calls to inline functions such as
blk_add_trace_bio(q,bio,BLK_TA_QUEUE);
with something like
MARK(blk_bio_queue,q,bio);

- The blktrace code that formats and manages trace data would become a
consumer of the marker API. It would be hooked up at runtime to
these markers.

I suppose the marker approach will be adopted if jumping from a marker
to code hooked up there can be made fast and secure enough for
prominent architectures.

When the events fire, the tracing backend receiving
the callbacks could do the same thing it does now. (With the
markers dormant, the cost should not be much higher than the current
(likely (!q->blk_trace)) conditional.)

- The mp3 statistics code would be an alternate backend to these same
markers. It could be activated or deactivated on the fly (to let
another subsystem use the markers). The code would maintain statistics
in its own memory and could present the data on /proc or whatnot, the
same way as today.

Basically, I agree. But, the devel is in the details.

Dynamic instrumentation based on markers allows to grow code,
but it doesn't allow to grow data structure, AFAICS.

Statistics might require temporary results to be stored per
entity.

For example, latencies require two timestamps. The older one needs to
be stored somewhere until the second timestamp can be determined and
a latency is calculated. I would add a field a field to struct request
for this purpose.

The workaround would be to pass any intermediate result in the form
of a trace event up to user space and try to sort it out later -
which takes us back to the blktrace approach.

Martin

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/