Re: [RFC PATCH v2] bpf: Using binary search to improve the performance of btf_find_by_name_kind

From: pengdonglin
Date: Sun Sep 17 2023 - 05:09:26 EST


On 2023/9/15 1:14, Alan Maguire wrote:
On 14/09/2023 14:05, pengdonglin wrote:
On 2023/9/14 20:46, Alan Maguire wrote:
On 14/09/2023 11:13, pengdonglin wrote:
On 2023/9/13 21:34, Alan Maguire wrote:
On 13/09/2023 11:32, pengdonglin wrote:
On 2023/9/13 2:46, Alexei Starovoitov wrote:
On Tue, Sep 12, 2023 at 10:03 AM Eduard Zingerman <eddyz87@xxxxxxxxx>
wrote:

On Tue, 2023-09-12 at 09:40 -0700, Alexei Starovoitov wrote:
On Tue, Sep 12, 2023 at 7:19 AM Eduard Zingerman
<eddyz87@xxxxxxxxx>
wrote:

On Tue, 2023-09-12 at 16:51 +0300, Eduard Zingerman wrote:
On Sat, 2023-09-09 at 02:16 -0700, Donglin Peng wrote:
Currently, we are only using the linear search method to find
the
type id
by the name, which has a time complexity of O(n). This change
involves
sorting the names of btf types in ascending order and using
binary search,
which has a time complexity of O(log(n)). This idea was inspired
by the
following patch:

60443c88f3a8 ("kallsyms: Improve the performance of
kallsyms_lookup_name()").

At present, this improvement is only for searching in vmlinux's
and
module's BTFs, and the kind should only be BTF_KIND_FUNC or
BTF_KIND_STRUCT.

Another change is the search direction, where we search the BTF
first and
then its base, the type id of the first matched btf_type will be
returned.

Here is a time-consuming result that finding all the type ids of
67,819 kernel
functions in vmlinux's BTF by their names:

Before: 17000 ms
After:     10 ms

The average lookup performance has improved about 1700x at the
above scenario.

However, this change will consume more memory, for example,
67,819 kernel
functions will allocate about 530KB memory.

Hi Donglin,

I think this is a good improvement. However, I wonder, why did
you
choose to have a separate name map for each BTF kind?

I did some analysis for my local testing kernel config and got
such numbers:
- total number of BTF objects: 97350
- number of FUNC and STRUCT objects: 51597
- number of FUNC, STRUCT, UNION, ENUM, ENUM64, TYPEDEF, DATASEC
objects: 56817
     (these are all kinds for which lookup by name might make
sense)
- number of named objects: 54246
- number of name collisions:
     - unique names: 53985 counts
     - 2 objects with the same name: 129 counts
     - 3 objects with the same name: 3 counts

So, it appears that having a single map for all named objects
makes
sense and would also simplify the implementation, what do you
think?

Some more numbers for my config:
- 13241 types (struct, union, typedef, enum), log2 13241 = 13.7
- 43575 funcs, log2 43575 = 15.4
Thus, having separate map for types vs functions might save ~1.7
search iterations. Is this a significant slowdown in practice?

What do you propose to do in case of duplicates ?
func and struct can have the same name, but they will have two
different
btf_ids. How do we store them ?
Also we might add global vars to BTF. Such request came up several
times.
So we need to make sure our search approach scales to
func, struct, vars. I don't recall whether we search any other
kinds.
Separate arrays for different kinds seems ok.
It's a bit of code complexity, but it's not an increase in memory.

Binary search gives, say, lowest index of a thing with name A, then
increment index while name remains A looking for correct kind.
Given the name conflicts info from above, 99% of times there
would be
no need to iterate and in very few cases there would a couple of
iterations.

Same logic would be necessary with current approach if different BTF
kinds would be allowed in BTF_ID_NAME_* cohorts. I figured that
these
cohorts are mainly a way to split the tree for faster lookups, but
maybe that is not the main intent.

With 13k structs and 43k funcs it's 56k * (4 + 4) that's 0.5 Mbyte
extra memory. That's quite a bit. Anything we can do to compress
it?

That's an interesting question, from the top of my head:
pre-sort in pahole (re-assign IDs so that increasing ID also would
mean "increasing" name), shouldn't be that difficult.

That sounds great. kallsyms are pre-sorted at build time.
We should do the same with BTF.
I think GCC can emit BTF directly now and LLVM emits it for bpf progs
too,
but since vmlinux and kernel module BTFs will keep being processed
through pahole we don't have to make gcc/llvm sort things right away.
pahole will be enough. The kernel might do 'is it sorted' check
during BTF validation and then use binary search or fall back to
linear
when not-sorted == old pahole.


Yeah, I agree and will attempt to modify the pahole and perform a
test.
Do we need
to introduce a new macro to control the behavior when the BTF is not
sorted? If
it is not sorted, we can use the method mentioned in this patch or use
linear
search.



One challenge with pahole is that it often runs in parallel mode, so I
suspect any sorting would have to be done after merging across threads.
Perhaps BTF deduplication time might be a useful time to re-sort by
name? BTF dedup happens after BTF has been merged, and a new "sorted"
btf_dedup_opts option could be added and controlled by a pahole
option. However dedup is pretty complicated already..

One thing we should weigh up though is if there are benefits to the
way BTF is currently laid out. It tends to start with base types,
and often-encountered types end up being located towards the start
of the BTF data. For example


[1] INT 'long unsigned int' size=8 bits_offset=0 nr_bits=64
encoding=(none)
[2] CONST '(anon)' type_id=1
[3] VOLATILE '(anon)' type_id=1
[4] ARRAY '(anon)' type_id=1 index_type_id=21 nr_elems=2
[5] PTR '(anon)' type_id=8
[6] CONST '(anon)' type_id=5
[7] INT 'char' size=1 bits_offset=0 nr_bits=8 encoding=SIGNED
[8] CONST '(anon)' type_id=7
[9] INT 'unsigned int' size=4 bits_offset=0 nr_bits=32 encoding=(none)
[10] CONST '(anon)' type_id=9
[11] TYPEDEF '__s8' type_id=12
[12] INT 'signed char' size=1 bits_offset=0 nr_bits=8 encoding=SIGNED
[13] TYPEDEF '__u8' type_id=14

So often-used types will be found quickly, even under linear search
conditions.

I found that there seems to be no code in the kernel that get the ID
of the
basic data type by calling btf_find_by_name_kind directly. The general
usage
of this function is to obtain the ID of a structure or function. After
we got
the ID of a structure or function, it is O(1) to get the IDs of its
members
or parameters.

./kernel/trace/trace_probe.c:383:       id = btf_find_by_name_kind(btf,
funcname, BTF_KIND_FUNC);
./kernel/bpf/btf.c:3523:        id = btf_find_by_name_kind(btf,
value_type, BTF_KIND_STRUCT);
./kernel/bpf/btf.c:5504:                id = btf_find_by_name_kind(btf,
alloc_obj_fields[i], BTF_KIND_STRUCT);
./kernel/bpf/bpf_struct_ops.c:128:      module_id =
btf_find_by_name_kind(btf, "module", BTF_KIND_STRUCT);
./net/ipv4/bpf_tcp_ca.c:28:     type_id = btf_find_by_name_kind(btf,
"sock", BTF_KIND_STRUCT);
./net/ipv4/bpf_tcp_ca.c:33:     type_id = btf_find_by_name_kind(btf,
"tcp_sock", BTF_KIND_STRUCT);
./net/netfilter/nf_bpf_link.c:181:      type_id =
btf_find_by_name_kind(btf, name, BTF_KIND_STRUCT);


When we look at how many lookups by id (which are O(1), since they are
done via the btf->types[] array) versus by name, we see:

$ grep btf_type_by_id kernel/bpf/*.c|wc -l
120
$ grep btf_find_by_nam kernel/bpf/*.c|wc -l
15

I don't see a huge number of name-based lookups, and I think most are
outside of the hotter codepaths, unless I'm missing some. All of which
is to say it would be a good idea to have a clear sense of what will
get
faster with sorted-by-name BTF. Thanks!

The story goes like this.

I have added a new feature to the function graph called
"funcgraph_retval",
here is the link:

https://lore.kernel.org/all/1fc502712c981e0e6742185ba242992170ac9da8.1680954589.git.pengdonglin@xxxxxxxxxxxxxx/

We can obtain the return values of almost every function during the
execution
of kernel through this feature, it can help us analyze problems.


It's a great feature!

Thanks.


However, this feature has two main drawbacks.

1. Even if a function's return type is void,  a return value will still
be printed.

2. The return value printed may be incorrect when the width of the
return type is
smaller than the generic register.

I think if we can get this return type of the function, then the
drawbacks mentioned
above can be eliminated. The function btf_find_by_name_kind can be used
to get the ID of
the kernel function, then we can get its return type easily. If the
return type is
void, the return value recorded will not be printed. If the width of the
return type
is smaller than the generic register, then the value stored in the upper
bits will be
trimmed. I have written a demo and these drawbacks were resolved.

However, during my test, I found that it took too much time when read
the trace log
with this feature enabled, because the trace log consists of 200,000
lines. The
majority of the time was consumed by the btf_find_by_name_kind, which is
called
200,000 times.

So I think the performance of btf_find_by_name_kind  may need to be
improved.


If I recall, Masami's work uses BTF ids, but can cache them since the
user explicitly asks for specific fields in the trace output. I'm
presuming that's not an option for you due to the fact funcgraph tracing
enables everything (or at least everything under a filter predicate) and
you have limited context to work with, is that right?

Yes, right.


Looking at print_graph_entry_leaf() which I _think_ is where you'd need
to print the retval from, you have access to the function address via
call->func, and I presume you get the name from snprinting the symbol to
a string or similar. So you're stuck in a context where you have the
function address, and from that you can derive the function name. Is
that correct? Thanks!

Yes, both print_graph_return and print_graph_entry_leaf will call
print_graph_retval
to print the return value. Then call sprint_symbol_no_offset with
call->func to get
the function name, then call btf_find_by_name_kind to get the return type.


So what you ultimately need is a way to map from that information
available to be able to figure out the size of the return value
associated with a function.

On the BPF side we've been thinking a bit about the relationship between
kernel function addresses and their BTF representations; sometimes
knowing BTF->address mapping is needed for cases where the same function
name has multiple inconsistent function signatures in BTF. We don't
represent function addresses yet in BTF but may end up having to.
The reason I mention this is in an ideal world, it would benefit to
populate kallsyms entries with their associated BTF ids; then a
function would need to be looked up once in kallsyms; that lookup would
benefit from recent speedups, and if it contained the associated BTF id
we'd have an O(1) lookup from the BTF id -> function. Not sure if that
would be tenable from the kallsyms side, but I just wanted to mention
it, as from the above it seems an address-based lookup is a possibility
to solve the return value type lookup problem that you're facing.

Cc'ed Jiri who had to wrestle with kallsyms for the kprobe multi stuff.
Would the above help do you think?

Thank you, but I tend to agree with Alexei. Using kallsyms will consume extra
memory too, but it can only be used for functions. I have done a test using
kallsyms_lookup + optimized btf_find_by_name_kind, and the time consumed
was not excessive, it took about 38ms to find the IDs of 67823 kernel
functions. If using a new map for IDs and function addresses, it took
about 5ms.


Thanks!

Alan