Re: [RFC/PATCH 03/38] perf tools: Move auxtrace_mmap field to struct perf_evlist

From: Namhyung Kim
Date: Fri Oct 09 2015 - 03:59:09 EST


Hi Adrian,

On Thu, Oct 08, 2015 at 07:07:43PM +0300, Adrian Hunter wrote:
> On 7/10/2015 12:06 p.m., Namhyung Kim wrote:
> >Hi Adrian,
> >
> >On Tue, Oct 6, 2015 at 6:26 PM, Adrian Hunter <adrian.hunter@xxxxxxxxx> wrote:
> >>On 06/10/15 12:03, Namhyung Kim wrote:
> >>>Hi Adrian,
> >>>
> >>>On Mon, Oct 5, 2015 at 8:29 PM, Adrian Hunter <adrian.hunter@xxxxxxxxx> wrote:
> >>>>On 02/10/15 21:45, Arnaldo Carvalho de Melo wrote:
> >>>>>Em Fri, Oct 02, 2015 at 02:18:44PM +0900, Namhyung Kim escreveu:
> >>>>>>Since it's gonna share struct mmap with dummy tracking evsel to track
> >>>>>>meta events only, let's move auxtrace out of struct perf_mmap.
> >>>>>Is this moving around _strictly_ needed?
> >>>>
> >>>>Also, what if you wanted to capture AUX data and tracking together.
> >>>
> >>>Hmm.. I don't know what's the problem. It should be orthogonal and
> >>>support doing that together IMHO. Maybe I'm missing something about
> >>>the aux data processing and Intel PT. I'll take a look at it..
> >>>
> >>
> >>It is only orthogonal if you assume we will never want to support parallel
> >>processing with Intel PT.
> >
> >We'll definitely want it. :)
> >
> >>
> >>The only change that needs to be made is not to assume there is only 1
> >>tracking event.
>
> Sorry for the slow reply.

No problem at all. JFYI I'm travelling now.. :)


>
> >
> >IIUC Intel PT (and BTS?) needs maximum 2 dummy events - one is to
> >track task/mmap and another is to track context switches. The latter
> >is basically a light-weight version of the sched_switch event, right?
>
> Yes
>
> >
> >For parallel processing, each cpu needs to keep current thread to
> >synthesize events from auxtrace data. So if it processed the switch
> >events before processing samples, it'd need to build long lists of
> >current thread per cpu. IMHO it'd be better to process the switch
> >events with samples using multi-thread rather than processing them
> >prior to samples.
>
> That is a good point.
>
> But that would be limited to dividing the data by cpu. It would be more
> useful to divide it any which way. Does 'perf report' care if the
> data is not in order?

It doesn't, as long as it can find the correct thread/dso/symbol ...

Btw I thought it'd also work if the targets are tasks since it'd still
be able to follow context switches of the tasks as switch events are
recorded along with the auxtrace events per task, no?

>
> >So how about this? It'd use *always* 2 dummy (or 1 dummy + 1
> >sched_switch) events. The tracking dummy events would be recorded on
> >the tracking mmaps and switch (dummy) event would be recorded on the
> >main mmaps. This way we can parallelize the auxtrace processing
> >without the list of current thread IMHO.
> >
> >Do I miss something?
>
> Thinking about it now, it would probably make sense to put the AUX
> event with the tracking events as well, so the data can be queued up
> ready for processing, then the AUX index would not be needed. But of
> course, if there were no other events, then there would be no main
> mmap at all.

Hmm.. let me try to follow. :)

So we can have 3 types of mmap in this case:

 1. track mmap for task/mmap events - it'll be saved in a separate
    file (in the meantime).
 2. main mmap for samples - it'll be saved in per-index (cpu or task)
    file. For Intel PT, the switch events will be saved here too.
 3. auxtrace mmap - it'll be saved in per-index file (with switch events).
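
To make the split concrete, here is a rough sketch of how the three kinds
could map to output files. All names (sketch_mmap_kind, sketch_output_name,
the file names) are made up for illustration and are not existing perf code.
It also assumes the auxtrace data lands in the same per-index file as the
main mmap, which is only one reading of the list above.

```c
#include <stdio.h>

/* Hypothetical names for illustration; not the perf tools API. */
enum sketch_mmap_kind {
	SKETCH_MMAP_TRACK,	/* task/mmap (meta) events */
	SKETCH_MMAP_MAIN,	/* samples (+ switch events for Intel PT) */
	SKETCH_MMAP_AUXTRACE,	/* AUX data */
};

/*
 * Write the (made-up) output file name for an mmap into buf.
 * index is the cpu or task index; it is ignored for the track mmap.
 * Returns the snprintf() result.
 */
static int sketch_output_name(enum sketch_mmap_kind kind, int index,
			      char *buf, size_t len)
{
	if (kind == SKETCH_MMAP_TRACK)
		return snprintf(buf, len, "track");

	/* Assumption: main and auxtrace data share the per-index file. */
	return snprintf(buf, len, "data.%d", index);
}
```

With this, only the track mmap ends up in a file of its own; everything
else is keyed by the cpu or task index.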

>
> From that point of view, I guess I don't need to worry about splitting
> up the mmaps at all, just process them more than once if need be.

OK, I don't quite follow.. Can you elaborate on that? Do you think it's not
necessary to use two dummy events? What can be processed more than
once?

>
> >
> >>
> >>IMHO there could be separate mmap_params also, which would allow for
> >>different mmap sizes for the tracking and main mmaps.
> >
> >Currently, the tracking mmap size is fixed at an arbitrary size
> >(128KiB) regardless of the main mmaps. I can add an option to change
> >the tracking mmap size too.
>
> I meant more from the program point of view, to allow different parameters.
> Such as allowing one mmap to be PROT_READ and the other PROT_READ|PROT_WRITE
> i.e. collect all the tracking events but let the other events overwrite
> - perhaps as some kind of snapshot mode like we do with Intel PT.

Ah, I see.
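
So, if I understand correctly, something like the sketch below. The struct
and function names are hypothetical (not the current mmap_params code), and
the lengths are data sizes only. It relies on the perf ring buffer being
overwritten by the kernel when it is mapped without PROT_WRITE, and uses
the current fixed 128KiB size for the tracking mmap.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical per-group parameters; names are for illustration only. */
struct sketch_mmap_params {
	int	prot;	/* protection flags passed to mmap() */
	size_t	len;	/* data size of the ring buffer */
};

/* Pick parameters depending on whether the mmap is for tracking events. */
static struct sketch_mmap_params sketch_params(bool tracking, size_t main_len)
{
	struct sketch_mmap_params mp;

	if (tracking) {
		/* Writable: the tool consumes the data, so nothing is lost. */
		mp.prot = PROT_READ | PROT_WRITE;
		mp.len  = 128 * 1024;
	} else {
		/* Read-only: old events get overwritten (snapshot style). */
		mp.prot = PROT_READ;
		mp.len  = main_len;
	}
	return mp;
}

/* A read-only mapping means the buffer is in overwrite mode. */
static bool sketch_overwrites(const struct sketch_mmap_params *mp)
{
	return !(mp->prot & PROT_WRITE);
}
```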

>
> It seemed to me that it would be more flexible to put evsels into mmap
> groups. Then those groups could have any events or be used in various ways.
> I also thought it might make the mmap code more readable, instead of having
> lots of "if tracking event do something different".

Hmm.. good idea. I'll think about it.

>
> On the other hand, it is just a thought. As I mentioned above, I realized
> I could probably manage without splitting the mmaps.

It'd be nice if you'd explain your thoughts in more detail.

Thanks,
Namhyung