Re: [RFC PATCH 1/3] Unified trace buffer

From: Martin Bligh
Date: Wed Sep 24 2008 - 12:51:00 EST

Next message: Mathieu Desnoyers: "Re: [RFC PATCH 1/3] Unified trace buffer"
Previous message: David Howells: "[PATCH 2/2] MN10300: Make sched_clock() report time since boot"
In reply to: Linus Torvalds: "Re: [RFC PATCH 1/3] Unified trace buffer"
Next in thread: Linus Torvalds: "Re: [RFC PATCH 1/3] Unified trace buffer"

>> I'm not sure why this is any harder to deal with in write, than it is
>> in reserve? We should be able to make reserve handle this just
>> as well?
>
> No, imagine the mentioned case where we're straddling a page boundary.
>
> A----| |----B
> ^------|
>
> So when we reserve we get a pointer into page A, but our reserve length
> will run over into page B. A write() method will know how to check for
> this and break up the memcpy to copy up to the end of A and continue
> into B.
>
> You cannot expect the reserve/commit interface users to do this
> correctly - it would also require one to expose too many internals,
> you'd need to be able to locate page B for starters.

Can't the reserve interface just put a padding event into page A,
or otherwise mark it, and return the start of page B?
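
A minimal sketch of that idea, under stated assumptions: the page size, the
header layout, the EV_PADDING type and the name rb_reserve() are all
hypothetical and are not taken from the patch under discussion. When the
event does not fit in what is left of page A, the tail of A is stamped as a
padding event and the reservation is handed out from the start of page B, so
the caller never sees a region that straddles two pages.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SZ    64u   /* tiny "page" so the boundary case is easy to hit */
#define NR_PAGES   4u
#define EV_ALIGN   4u    /* records are rounded up to this many bytes */
#define EV_PADDING 0u    /* hypothetical type: "skip to the next page" */

struct ev_hdr {          /* hypothetical fixed header */
	uint16_t type;
	uint16_t len;    /* payload bytes, before rounding, header excluded */
};

struct rb {
	uint8_t pages[NR_PAGES][PAGE_SZ];
	size_t  page;    /* current write page */
	size_t  off;     /* write offset within that page */
};

/* Reserve room for one event; the returned payload never crosses a page. */
static uint8_t *rb_reserve(struct rb *rb, uint16_t type, uint16_t len)
{
	size_t need = (sizeof(struct ev_hdr) + len + EV_ALIGN - 1) &
		      ~(size_t)(EV_ALIGN - 1);
	size_t left;
	struct ev_hdr h;
	uint8_t *payload;

	assert(rb->off <= PAGE_SZ);
	left = PAGE_SZ - rb->off;

	if (need > PAGE_SZ)
		return NULL;             /* can never fit in a single page */

	if (need > left) {
		/* Offsets stay multiples of EV_ALIGN, so left is 0 or >= header. */
		if (left >= sizeof h) {
			h.type = EV_PADDING;
			h.len  = (uint16_t)(left - sizeof h);
			memcpy(&rb->pages[rb->page][rb->off], &h, sizeof h);
		}
		rb->page = (rb->page + 1) % NR_PAGES;   /* move on to page B */
		rb->off  = 0;
	}

	h.type = type;
	h.len  = len;
	memcpy(&rb->pages[rb->page][rb->off], &h, sizeof h);
	payload = &rb->pages[rb->page][rb->off + sizeof h];
	rb->off += need;
	return payload;                  /* caller fills in len bytes here */
}
```

The sketch ignores locking, commit, and overwriting of unread pages; it only
shows that the boundary handling can live inside reserve at the cost of some
wasted space at the end of a page.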

>> If you use write rather than reserve, you have to copy all the data
>> twice for every event.
>
> Well, once. I'm not seeing where the second copy comes from.

Depends how you count ;-) One more time than you would have to
with reserve, where the temporary packed structure never exists.

>> > On top of that foundation build an eventbuffer, which knows about
>> > encoding/decoding/printing events.
>> >
>> > This too needs to be a flexible layer -
>>
>> That would be nice. However, we need to keep at least the length
>> and timestamp fields common so we can do parsing and the mergesort?
>
> And here I was thinking you guys bit encoded the event id into the
> timestamp delta :-)

+/* header plus 32-bits of event data */
+struct ktrace_entry {
+	u32 event_type:5, tsc_shifted:27;
+	u32 data;
+};

was our basic data type. So ... sort of ;-)
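
For illustration, here is a userspace sketch of how such an entry might be
filled in and compared. Only the struct layout comes from the fragment above;
the shift amount KTRACE_TSC_SHIFT, the helper names ktrace_make() and
ktrace_delta(), and the wrap handling are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

#define KTRACE_TSC_SHIFT 10u   /* assumption: drop the low 10 TSC bits */
#define KTRACE_TSC_BITS  27u
#define KTRACE_TSC_MASK  ((1u << KTRACE_TSC_BITS) - 1u)

/* Same layout as above, with uint32_t standing in for the kernel's u32. */
struct ktrace_entry {
	uint32_t event_type:5, tsc_shifted:27;
	uint32_t data;
};

/* Pack an event type (0..31), a raw timestamp and 32 bits of data. */
static struct ktrace_entry ktrace_make(unsigned type, uint64_t tsc,
				       uint32_t data)
{
	struct ktrace_entry e;

	assert(type < 32u);    /* only 5 bits are available for the type */
	e.event_type  = type;
	e.tsc_shifted = (uint32_t)(tsc >> KTRACE_TSC_SHIFT) & KTRACE_TSC_MASK;
	e.data        = data;
	return e;
}

/* Shifted ticks from prev to cur, tolerating one wrap of the 27-bit field. */
static uint32_t ktrace_delta(struct ktrace_entry prev, struct ktrace_entry cur)
{
	uint32_t p = prev.tsc_shifted;
	uint32_t c = cur.tsc_shifted;

	return (c - p) & KTRACE_TSC_MASK;
}
```

With only 27 timestamp bits the field wraps quickly, so a reader along these
lines would need events to arrive more often than the wrap period, or some
other record that carries the full timestamp.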

>> So type would move into the body here?
>
> All of it would, basically I have no notion of an event in the
> ringbuffer API. You write $something and your read routine would need to
> be smart enough to figure it out.

If you don't have timestamps, you need domain-specific context to merge
the per-cpu buffers back together. As long as the timestamps are in a common
format across all the event-level alternatives, I guess it doesn't matter.
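
To make the merge requirement concrete, here is a minimal sketch that assumes
every event, whatever its body looks like, exposes a common timestamp. The
types struct ev and struct cpu_stream and the function merge_streams() are
hypothetical. Each per-cpu stream is assumed to be time-ordered already, so
the merge only has to compare the heads of the streams.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct ev {                    /* hypothetical decoded event */
	uint64_t ts;           /* the common timestamp field */
	uint32_t cpu;
	uint32_t data;
};

struct cpu_stream {            /* one time-ordered per-cpu buffer */
	const struct ev *evs;
	size_t n;              /* number of events in evs[] */
	size_t pos;            /* next unread event */
};

/* Merge ncpu sorted streams into out[]; returns the number of events written. */
static size_t merge_streams(struct cpu_stream *s, size_t ncpu,
			    struct ev *out, size_t cap)
{
	size_t written = 0;

	assert(s != NULL && out != NULL);
	while (written < cap) {
		size_t best = ncpu;    /* ncpu means "no candidate yet" */

		for (size_t i = 0; i < ncpu; i++) {
			if (s[i].pos == s[i].n)
				continue;      /* this stream is drained */
			if (best == ncpu ||
			    s[i].evs[s[i].pos].ts < s[best].evs[s[best].pos].ts)
				best = i;
		}
		if (best == ncpu)
			break;                 /* all streams drained */
		out[written++] = s[best].evs[s[best].pos++];
	}
	return written;
}
```

The linear scan over stream heads is fine for a handful of CPUs; a heap would
be the usual replacement for larger counts. Without the shared ts field, the
comparison in the inner loop has nothing generic to work with.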

> Another option is to start out with a fixed sized header that contains a
> length field.

That's what we discussed at KS/plumbers, and seems like the simplest
option by far to start with.
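
A sketch of what such a fixed-size header could look like, assuming a 32-bit
timestamp, a 16-bit type and a 16-bit payload length. The field widths and
the helpers ev_put() and ev_count() are illustrative only and are not taken
from any posted patch. The point is that a reader can step from event to
event using nothing but the length field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct ev_hdr {                /* hypothetical common header */
	uint32_t ts;           /* timestamp (or delta) */
	uint16_t type;
	uint16_t len;          /* payload bytes following the header */
};

/* Append one event to buf; returns the new used size, or 0 if it won't fit. */
static size_t ev_put(uint8_t *buf, size_t used, size_t cap,
		     uint32_t ts, uint16_t type,
		     const void *payload, uint16_t len)
{
	struct ev_hdr h = { ts, type, len };

	assert(used <= cap);
	if (cap - used < sizeof h + len)
		return 0;
	memcpy(buf + used, &h, sizeof h);
	memcpy(buf + used + sizeof h, payload, len);
	return used + sizeof h + len;
}

/* Count events by hopping from header to header via the length field. */
static size_t ev_count(const uint8_t *buf, size_t used)
{
	size_t off = 0, n = 0;

	while (used - off >= sizeof(struct ev_hdr)) {
		struct ev_hdr h;

		memcpy(&h, buf + off, sizeof h);
		if (used - off - sizeof h < h.len)
			break;         /* truncated record, stop here */
		off += sizeof h + h.len;
		n++;
	}
	return n;
}
```

ev_count() never looks at the payload or the type, which is what lets the
event-level encodings differ while parsing and merging stay generic.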

> But the raw ringbuffer layer, the one concerned with fiddling the pages
> and writing/reading thereto need not be aware of anything else.

When you loop around the ringbuffer, you need to shift the starting "read"
pointer up to the next event, don't you? How do you do that to start on
a whole event without knowing the event size?
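
The concern can be sketched with a toy byte ring in overwrite mode. Everything
here is an assumption made for illustration: the 32-byte ring, the one-byte
length prefix, and the names ring_make_room() and ring_write(). To free space
the writer has to push the read position forward, and the only way it can
land on the start of a whole record is by reading that record's length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SZ 32u            /* tiny power-of-two ring */

struct ring {
	uint8_t  buf[RING_SZ];
	uint32_t head;         /* free-running read position, a record start */
	uint32_t tail;         /* free-running write position */
};

static uint32_t ring_used(const struct ring *r)
{
	return r->tail - r->head;
}

/* Overwrite mode: drop whole oldest records until need bytes are free. */
static void ring_make_room(struct ring *r, uint32_t need)
{
	assert(need <= RING_SZ);
	while (RING_SZ - ring_used(r) < need) {
		uint8_t len = r->buf[r->head % RING_SZ];  /* length prefix */

		r->head += 1u + len;   /* skip exactly one record */
	}
}

/* Write one record: a length byte followed by len payload bytes. */
static int ring_write(struct ring *r, const uint8_t *data, uint8_t len)
{
	uint32_t need = 1u + len;

	if (need > RING_SZ)
		return -1;             /* can never fit */
	ring_make_room(r, need);
	r->buf[r->tail++ % RING_SZ] = len;
	for (uint8_t i = 0; i < len; i++)
		r->buf[r->tail++ % RING_SZ] = data[i];
	return 0;
}
```

If the length prefix lived only in an upper layer, ring_make_room() would have
no way to know how far to advance head, which is the question being asked
above.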

> Exactly - which is why a flexible encoding layer makes sense to me -
> aside from the abstraction itself.

I like the abstraction, yes ;-) Just not convinced how much we can put in it.