Re: ftrace global trace_pipe_raw

From: Steven Rostedt
Date: Wed Dec 19 2018 - 11:37:39 EST


On Wed, 19 Dec 2018 12:32:41 +0100
Claudio <claudio.fontana@xxxxxxxxx> wrote:

> >>
> >> I would imagine the core functionality is already available, since trace_pipe
> >> in the tracing directory already shows all events regardless of CPU, and so
> >> it would be a matter of doing the same for trace_pipe_raw.
> >
> > The difference between trace_pipe and trace_pipe_raw is that trace_pipe
> > is post processed, and reads the per CPU buffers and interleaves them
> > one event at a time. The trace_pipe_raw just sends you the raw
> > unprocessed data directly from the buffers, which are grouped per CPU.
>
> I think that what I am looking for, to improve the performance of our system,
> is a post processed stream of binary entry data, already merged from all CPUs
> and sorted by timestamp, in the same way that it is done for textual output
> in __find_next_entry:
>
> 	for_each_tracing_cpu(cpu) {
>
> 		if (ring_buffer_empty_cpu(buffer, cpu))
> 			continue;
>
> 		ent = peek_next_entry(iter, cpu, &ts, &lost_events);
>
> 		/*
> 		 * Pick the entry with the smallest timestamp:
> 		 */
> 		if (ent && (!next || ts < next_ts)) {
> 			next = ent;
> 			next_cpu = cpu;
> 			next_ts = ts;
> 			next_lost = lost_events;
> 			next_size = iter->ent_size;
> 		}
> 	}
>
> We first tried to use the textual output directly, but this led to
> unacceptable overhead in parsing the text.
>
> Please correct me if I misunderstand, but it seems to me that it
> would be possible to do the same kind of post processing, including
> generating a sorted stream of entries, while skipping the text output
> formatting and outputting the binary data of each entry directly. That
> would be far more efficient for user space correlators to consume.
>
> But maybe this is not a general enough requirement to be acceptable for
> implementing directly into the kernel?
>
> We have the requirement of using the OS tracing events, including
> scheduling events, to react from software immediately
> (vs doing after-the-fact analysis).

Have you looked at using the perf event interface? I believe it uses a
single buffer for all events, at least when tracing a single process.
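
For reference, the merge that __find_next_entry does can also be done in
user space on top of the per-CPU raw data. Below is a rough sketch only,
assuming each per-CPU stream has already been decoded into an array of
records sorted by timestamp. struct record, struct cpu_stream,
next_record() and merge_streams() are made-up names for illustration;
they are not kernel or library APIs, and the sketch ignores lost events
and the actual ring buffer page format.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded record; NOT the kernel's struct trace_entry. */
struct record {
	uint64_t ts;	/* event timestamp */
	int cpu;	/* CPU the event was recorded on */
};

/* One per-CPU stream, assumed to be already sorted by timestamp. */
struct cpu_stream {
	const struct record *recs;
	size_t len;	/* number of records in recs */
	size_t pos;	/* index of the next unconsumed record */
};

/*
 * Consume and return the pending record with the smallest timestamp
 * across all streams, or NULL once every stream is drained.
 */
const struct record *next_record(struct cpu_stream *streams, size_t ncpus)
{
	const struct record *next = NULL;
	size_t next_cpu = 0;

	for (size_t cpu = 0; cpu < ncpus; cpu++) {
		struct cpu_stream *s = &streams[cpu];

		if (s->pos >= s->len)
			continue;	/* nothing pending on this CPU */

		const struct record *ent = &s->recs[s->pos];

		/* Pick the entry with the smallest timestamp. */
		if (!next || ent->ts < next->ts) {
			next = ent;
			next_cpu = cpu;
		}
	}

	if (next)
		streams[next_cpu].pos++;

	return next;
}

/* Merge all streams into out (capacity cap); returns records written. */
size_t merge_streams(struct cpu_stream *streams, size_t ncpus,
		     struct record *out, size_t cap)
{
	const struct record *r;
	size_t n = 0;

	while (n < cap && (r = next_record(streams, ncpus)) != NULL)
		out[n++] = *r;

	return n;
}
```

Each call scans every CPU, which is the same linear pick as in the
quoted loop. With many CPUs a min-heap keyed on the timestamp would cut
that down, at the cost of more code.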

-- Steve