Re: [PATCH 4/4] perf core: Add backward attribute to perf event

From: Wangnan (F)
Date: Tue Mar 29 2016 - 22:39:47 EST




On 2016/3/30 10:28, Wangnan (F) wrote:


On 2016/3/29 22:04, Peter Zijlstra wrote:
On Mon, Mar 28, 2016 at 06:41:32AM +0000, Wang Nan wrote:

Could you maybe write a perf/tests thingy for this so that _some_
userspace exists that exercises this new code?


int perf_output_begin(struct perf_output_handle *handle,
struct perf_event *event, unsigned int size)
{
+ if (unlikely(is_write_backward(event)))
+ return __perf_output_begin(handle, event, size, true);
return __perf_output_begin(handle, event, size, false);
}
Would something like:

int perf_output_begin(...)
{
if (unlikely(is_write_backward(event))
return perf_output_begin_backward(...);
return perf_output_begin_forward(...);
}

make sense; I'm not sure how much is still using this, but it seems
somewhat excessive to inline two copies of that thing into a single
function.

perf_output_begin is useful:

$ grep perf_output_begin ./kernel -r
./kernel/events/ring_buffer.c: * See perf_output_begin().
./kernel/events/ring_buffer.c:int perf_output_begin(struct perf_output_handle *handle,
./kernel/events/ring_buffer.c: * perf_output_begin() only checks rb->paused, therefore
./kernel/events/core.c: if (perf_output_begin(&handle, event, header.size))
./kernel/events/core.c: ret = perf_output_begin(&handle, event, read_event.header.size);
./kernel/events/core.c: ret = perf_output_begin(&handle, event,
./kernel/events/core.c: ret = perf_output_begin(&handle, event,
./kernel/events/core.c: ret = perf_output_begin(&handle, event,
./kernel/events/core.c: ret = perf_output_begin(&handle, event, rec.header.size);
./kernel/events/core.c: ret = perf_output_begin(&handle, event,
./kernel/events/core.c: ret = perf_output_begin(&handle, event, se->event_id.header.size);
./kernel/events/core.c: ret = perf_output_begin(&handle, event,
./kernel/events/core.c: ret = perf_output_begin(&handle, event, rec.header.size);

Events like PERF_RECORD_MMAP2 uses this function, so we still need to consider its overhead.

So I will use your first suggestion.


Sorry. Your second suggestion seems also good:

My implementation makes a big perf_output_begin(), but introduces only one load and one branch.

Your first suggestion introduces one load, one branch and one function call.

Your second suggestion introduces one load, and at least one (and at most three) branches.

I need some benchmarking result.

Thank you.