Re: [PATCH v2 0/5] bpf: Introduce the new ability of eBPF programs to access hardware PMU counter

From: xiakaixu
Date: Fri Jul 24 2015 - 22:15:25 EST


On 2015/7/24 7:33, Daniel Borkmann wrote:
> On 07/22/2015 10:09 AM, Kaixu Xia wrote:
>> Previous patch v1 url:
>> https://lkml.org/lkml/2015/7/17/287
>
> [ Sorry to chime in late, just noticed this series now as I wasn't in Cc for
> the core BPF changes. More below ... ]

Sorry about that, I will add you to the Cc list :) Your comments are welcome.
>
>> This patchset allows users to read PMU events in the following way:
>> 1. Open the PMU event using perf_event_open() (for each CPU or for
>> each process the user would like to watch);
>> 2. Create a BPF_MAP_TYPE_PERF_EVENT_ARRAY BPF map;
>> 3. Insert the FDs into the map with some key-value mapping scheme
>> (e.g. cpuid -> event on that CPU);
>> 4. Load and attach eBPF programs as usual;
>> 5. In the eBPF program, take the perf_event_map_fd and a key (e.g.
>> the cpuid obtained from bpf_get_smp_processor_id()), then use
>> bpf_perf_event_read() to read from it;
>> 6. Do whatever the user wants with the value.
>>
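Step 1 above can be sketched with the existing perf_event_open(2) syscall.
This is only a minimal illustration, assuming a counting-mode cycles event
opened per CPU for all tasks; the helper names init_cycles_attr() and
open_cycles_on_cpu() are made up for this sketch and are not part of the
patchset. Steps 2-5 depend on the interfaces proposed in this series and are
not shown.

```c
#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: describe the hardware CPU-cycles counter (no sampling). */
void init_cycles_attr(struct perf_event_attr *attr)
{
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->type = PERF_TYPE_HARDWARE;
	attr->config = PERF_COUNT_HW_CPU_CYCLES;
}

/*
 * Hypothetical helper: open the counter on one CPU for all tasks (pid = -1).
 * glibc has no perf_event_open() wrapper, so syscall(2) is used directly.
 * Returns the event fd, or -1 with errno set.
 */
int open_cycles_on_cpu(int cpu)
{
	struct perf_event_attr attr;

	init_cycles_attr(&attr);
	return (int)syscall(__NR_perf_event_open, &attr, (pid_t)-1, cpu, -1, 0UL);
}
```

A caller would loop over the online CPUs, call open_cycles_on_cpu() for each,
and keep the returned FDs for step 3. Opening a per-CPU event for all tasks
usually needs sufficient privileges (see perf_event_paranoid).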
>> changes in V2:
>> - put atomic_long_inc_not_zero() between fdget() and fdput();
>> - limit the event type to PERF_TYPE_RAW and PERF_TYPE_HARDWARE;
>> - only read the event counter on the current CPU or in the current
>> process;
>> - add new map type BPF_MAP_TYPE_PERF_EVENT_ARRAY to store the
>> pointer to the struct perf_event;
>> - given the perf_event_map_fd and key, the function
>> bpf_perf_event_read() can get the hardware PMU counter value;
>>
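The two restrictions listed for V2 (event type, and current CPU or current
process) can be modelled in plain userspace C as below. This is not the
kernel code from the patches; struct model_event and both functions are
hypothetical stand-ins that only illustrate the rules as stated in the
changelog.

```c
#include <linux/perf_event.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the fields the checks look at. */
struct model_event {
	unsigned int type; /* PERF_TYPE_* value from perf_event_attr */
	int cpu;           /* CPU the event is bound to, or -1 for a per-task event */
	int pid;           /* owning task of a per-task event, or -1 */
};

/* Rule: only PERF_TYPE_RAW and PERF_TYPE_HARDWARE events are accepted. */
bool model_type_allowed(const struct model_event *ev)
{
	return ev->type == PERF_TYPE_RAW || ev->type == PERF_TYPE_HARDWARE;
}

/* Rule: a read is allowed only on the event's own CPU or from its own task. */
bool model_read_allowed(const struct model_event *ev, int cur_cpu, int cur_pid)
{
	if (ev->cpu >= 0)
		return ev->cpu == cur_cpu;
	return ev->pid == cur_pid;
}
```

In this model a per-CPU event bound to CPU 2 can be read from a program
running on CPU 2 but not from one running on CPU 9, which matches the
"current CPU" restriction.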
>> Patch 5/5 is a simple example that shows how to use this new eBPF
>> program ability. The PMU counter data can be read from
>> /sys/kernel/debug/tracing/trace (or trace_pipe). The output below is
>> the cycles PMU value sampled at 'kprobe/sys_write'.
>>
>> $ cat /sys/kernel/debug/tracing/trace_pipe
>> $ ./tracex6
>> ...
>> cat-677 [002] d..1 210.299270: : bpf count: CPU-2 5316659
>> cat-677 [002] d..1 210.299316: : bpf count: CPU-2 5378639
>> cat-677 [002] d..1 210.299362: : bpf count: CPU-2 5440654
>> cat-677 [002] d..1 210.299408: : bpf count: CPU-2 5503211
>> cat-677 [002] d..1 210.299454: : bpf count: CPU-2 5565438
>> cat-677 [002] d..1 210.299500: : bpf count: CPU-2 5627433
>> cat-677 [002] d..1 210.299547: : bpf count: CPU-2 5690033
>> cat-677 [002] d..1 210.299593: : bpf count: CPU-2 5752184
>> cat-677 [002] d..1 210.299639: : bpf count: CPU-2 5814543
>> <...>-548 [009] d..1 210.299667: : bpf count: CPU-9 605418074
>> <...>-548 [009] d..1 210.299692: : bpf count: CPU-9 605452692
>> cat-677 [002] d..1 210.299700: : bpf count: CPU-2 5896319
>> <...>-548 [009] d..1 210.299710: : bpf count: CPU-9 605477824
>> <...>-548 [009] d..1 210.299728: : bpf count: CPU-9 605501726
>> <...>-548 [009] d..1 210.299745: : bpf count: CPU-9 605525279
>> <...>-548 [009] d..1 210.299762: : bpf count: CPU-9 605547817
>> <...>-548 [009] d..1 210.299778: : bpf count: CPU-9 605570433
>> <...>-548 [009] d..1 210.299795: : bpf count: CPU-9 605592743
>> ...
>>
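For post-processing output like the above, a small parser can pull the CPU
number and counter value out of each line. The sketch below assumes the line
layout is exactly the "bpf count: CPU-<n> <value>" form shown in the sample;
parse_bpf_count() is a name invented here, not something provided by the
patches.

```c
#include <stdio.h>
#include <string.h>

/*
 * Parse one trace line of the form shown above, e.g.
 *   "cat-677 [002] d..1 210.299270: : bpf count: CPU-2 5316659"
 * Returns 1 and fills *cpu and *count on success, 0 otherwise.
 */
int parse_bpf_count(const char *line, int *cpu, unsigned long long *count)
{
	const char *p = strstr(line, "bpf count: CPU-");

	if (!p)
		return 0;
	return sscanf(p, "bpf count: CPU-%d %llu", cpu, count) == 2;
}
```

Subtracting consecutive values for the same CPU gives the cycles counted
between two sys_write hits; for the first two CPU-2 lines in the sample that
difference is 61980.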
>> The details of the patches are as follows:
>>
>> Patch 1/5 introduces a new bpf map type. This map only stores
>> pointers to struct perf_event;
>>
>> Patch 2/5 introduces a map_traverse_elem() function for further use;
>>
>> Patch 3/5 converts event file descriptors into perf_event structures when
>> adding a new element to the map;
>
> So far all the map backends are of generic nature, knowing absolutely nothing
> about a particular consumer/subsystem of eBPF (tc, socket filters, etc). The
> tail call is a bit special, but nevertheless generic for each user and [very]
> useful, so it makes sense to inherit from the array map and move the code there.
>
> I don't really like that we start adding new _special_-cased maps here into the
> eBPF core code, it seems quite hacky. :( From your rather terse commit description
> where you introduce the maps, I failed to see a detailed elaboration on this, i.e.
> why can't it be abstracted any differently?

Giving eBPF programs the ability to access hardware PMU counters will be very
useful, as I mentioned in the V1 commit message.
Of course, there is some special-case code for creating the perf_event type map
in V2, but you will find less of it in the next version (V3), where I have
reused most of the prog_array map implementation. We can make the perf_event
array map more generic in the future.

BR.
>
>> Patch 4/5 implements the function bpf_perf_event_read(), which gets the
>> selected hardware PMU counter;
>>
>> Patch 5/5 gives a simple example.
>>
>> Kaixu Xia (5):
>> bpf: Add new bpf map type to store the pointer to struct perf_event
>> bpf: Add function map->ops->map_traverse_elem() to traverse map elems
>> bpf: Save the pointer to struct perf_event to map
>> bpf: Implement function bpf_perf_event_read() that get the selected
>> hardware PMU conuter
>> samples/bpf: example of get selected PMU counter value
>>
>> include/linux/bpf.h | 6 +++
>> include/linux/perf_event.h | 5 ++-
>> include/uapi/linux/bpf.h | 3 ++
>> kernel/bpf/arraymap.c | 110 +++++++++++++++++++++++++++++++++++++++++++++
>> kernel/bpf/helpers.c | 42 +++++++++++++++++
>> kernel/bpf/syscall.c | 26 +++++++++++
>> kernel/events/core.c | 30 ++++++++++++-
>> kernel/trace/bpf_trace.c | 2 +
>> samples/bpf/Makefile | 4 ++
>> samples/bpf/bpf_helpers.h | 2 +
>> samples/bpf/tracex6_kern.c | 27 +++++++++++
>> samples/bpf/tracex6_user.c | 67 +++++++++++++++++++++++++++
>> 12 files changed, 321 insertions(+), 3 deletions(-)
>> create mode 100644 samples/bpf/tracex6_kern.c
>> create mode 100644 samples/bpf/tracex6_user.c
>>
>