Re: [RFC PATCH v4 10/29] bpf tools: Collect map definitions from 'maps' section

From: Wangnan (F)
Date: Thu May 28 2015 - 03:15:50 EST




On 2015/5/28 14:09, Alexei Starovoitov wrote:
> On Thu, May 28, 2015 at 11:09:50AM +0800, Wangnan (F) wrote:
>> However this breaks a rule in the current design: the opening phase
>> doesn't talk to the kernel through sys_bpf() at all. All related work
>> is done in the loading phase. This principle ensures that on every
>> system, whether it supports sys_bpf() or not, the eBPF object can be
>> read without failure.
> I see, so you want 'parse elf' and 'create maps + load programs'
> to be separate phases?
> Fair enough. Then please add a call to release the information
> collected from elf after program loading is done.
> Relocations and other things are not needed at that point.

What about adding a flag to bpf_object__load() to let it know
whether to clean up the resources it has taken? For example:

int bpf_object__load(struct bpf_object *obj, bool clean);

then we can further wrap it with a macro:

#define bpf_object__load_clean(o) bpf_object__load(o, true)

If 'clean' is true, resources will be freed after loading, and the same
object will be unable to be reloaded after unloading. By doing this we
can avoid adding a new function.
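To make the idea concrete, here is a minimal sketch of what the flag could do. The struct layout and the bpf_object__free_elf_data() helper are hypothetical illustrations, not the actual libbpf implementation:

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical object layout: elf_data stands in for relocations
 * and other ELF-derived state only needed during loading. */
struct bpf_object {
	void *elf_data;
	bool loaded;
};

/* Hypothetical helper: drop ELF-derived state after loading. */
static void bpf_object__free_elf_data(struct bpf_object *obj)
{
	free(obj->elf_data);
	obj->elf_data = NULL;
}

int bpf_object__load(struct bpf_object *obj, bool clean)
{
	/* ... create maps and load programs via sys_bpf() here ... */
	obj->loaded = true;
	if (clean)
		/* after this, the object cannot be reloaded */
		bpf_object__free_elf_data(obj);
	return 0;
}

#define bpf_object__load_clean(o) bpf_object__load(o, true)
```

A caller that will never reload the object uses bpf_object__load_clean(); a caller that unloads and reloads passes clean == false and keeps the ELF data alive.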

>> Moreover, we are planning to introduce hardware PMU to eBPF in a way
>> like maps, to give eBPF programs the ability to access hardware PMU
>> counters. I haven't
> that's very interesting. Please share more info when you can :)
> If I understood it right, you want in-kernel bpf to do aggregation
> and filtering of pmu counters?
> And computing a number of cache misses between two kprobe events?
> I can see how I can use that to measure not only the time
> taken by a syscall, but the number of cache misses occurring due
> to the syscall. Sounds very useful!

I'm glad to see you are also interested in it.

Of course, filtering and aggregation based on PMU counters will be useful,
but that is only our first goal.

You know there are many useful PMUs provided by x86 and ARM64. Many people
ask me whether there is a way to record absolute PMU counter values when
sampling, so they can measure IPC changes, cache miss rates, page faults
and so on. Currently 'perf stat' is able to read PMU counters, but the
cost is relatively high.

For me, enabling eBPF programs to read PMU counters is the first thing that
needs to be done. The second is enabling eBPF programs to attach extra
information to perf samples.

Here is an example to show my idea.

I have a program which:

int main()
{
	while (1) {
		read(...);
		/* do A */
		write(...);
		/* do B */
	}
}

Then by using following script:

SEC("enter=sys_write $outdata:u64")
int enter_sys_write(...)
{
	u64 cycles_cnt = bpf_read_pmu(&cycles_pmu);

	bpf_store_value(cycles_cnt);
	return 1;
}

SEC("enter=sys_read $outdata:u64")
int enter_sys_read(...)
{
	u64 cycles_cnt = bpf_read_pmu(&cycles_pmu);

	bpf_store_value(cycles_cnt);
	return 1;
}

Then by 'perf script' we can check the cycle counter at each sampling
point, which allows us to compute the number of cycles between any two
sampling points. This way we can compute how many cycles are taken by A
and B. If the instruction counter is also recorded, we will know the IPC
of A and B.

The above is still a rough idea. Currently I am focusing on bringing eBPF
to perf, which should be the base for all the other interesting stuff.
However, I'm glad to see people discussing it.

Thank you.
