Re: [RFC PATCH v3 0/3] Introduce BPF map tracing capability

From: Yonghong Song
Date: Thu Nov 04 2021 - 00:24:11 EST




On 11/3/21 10:49 AM, Alexei Starovoitov wrote:
> On Wed, Nov 3, 2021 at 10:45 AM Joe Burton <jevburton.kernel@xxxxxxxxx> wrote:
>>
>> Sort of - I hit issues when defining the function in the same
>> compilation unit as the call site. For example:
>>
>>   static noinline int bpf_array_map_trace_update(struct bpf_map *map,
>>                 void *key, void *value, u64 map_flags)
>
> Not quite :)
> You've had this issue because of 'static noinline'.
> Just 'noinline' would not have such issues even in the same file.

This does not seem to be true. With the latest trunk clang:

[$ ~/tmp2] cat t.c
int __attribute__((noinline)) foo() { return 1; }
int bar() { return foo() + foo(); }
[$ ~/tmp2] clang -O2 -c t.c
[$ ~/tmp2] llvm-objdump -d t.o

t.o: file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <foo>:
0: b8 01 00 00 00 movl $1, %eax
5: c3 retq
6: 66 2e 0f 1f 84 00 00 00 00 00 nopw %cs:(%rax,%rax)

0000000000000010 <bar>:
10: b8 02 00 00 00 movl $2, %eax
15: c3 retq
[$ ~/tmp2]

The compiler still did the optimization: bar() was folded to a constant,
while the original noinline function remains in the binary.

A single foo() call in bar() has the same effect.

asm("") indeed helped preserve the call.

[$ ~/tmp2] cat t.c
int __attribute__((noinline)) foo() { asm(""); return 1; }
int bar() { return foo() + foo(); }
[$ ~/tmp2] clang -O2 -c t.c
[$ ~/tmp2] llvm-objdump -d t.o

t.o: file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <foo>:
0: b8 01 00 00 00 movl $1, %eax
5: c3 retq
6: 66 2e 0f 1f 84 00 00 00 00 00 nopw %cs:(%rax,%rax)

0000000000000010 <bar>:
10: 50 pushq %rax
11: e8 00 00 00 00 callq 0x16 <bar+0x6>
16: e8 00 00 00 00 callq 0x1b <bar+0xb>
1b: b8 02 00 00 00 movl $2, %eax
20: 59 popq %rcx
21: c3 retq
[$ ~/tmp2]

Note that with asm(""), foo() is called twice, but the compiler still
knows that foo() returns 1, so it computes the sum at compile time,
assigns 2 to %eax, and returns.

Having a single foo() in bar(), without asm(""), has the same effect as
the first example: the call is folded away.

[$ ~/tmp2] cat t.c
int __attribute__((noinline)) foo() { return 1; }
int bar() { return foo(); }
[$ ~/tmp2] clang -O2 -c t.c
[$ ~/tmp2] llvm-objdump -d t.o

t.o: file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <foo>:
0: b8 01 00 00 00 movl $1, %eax
5: c3 retq
6: 66 2e 0f 1f 84 00 00 00 00 00 nopw %cs:(%rax,%rax)

0000000000000010 <bar>:
10: b8 01 00 00 00 movl $1, %eax
15: c3 retq
[$ ~/tmp2]

I checked with a few LLVM compiler engineers at Facebook.
They said that nothing prevents the compiler from looking inside
a noinline function and optimizing its callers based on what
it finds there.
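
One way to rule out this kind of cross-function optimization, sketched
here as an assumption rather than as the conclusion of this thread, is to
also mark the function weak. A weak definition may be replaced by another
definition at link time, so the compiler cannot rely on the body it sees
in this file. foo_weak() and bar_weak() are hypothetical names.

```c
/* Hypothetical variant of foo(): because the symbol is weak, a
 * different definition could be chosen at link time, so the compiler
 * should not assume this body when optimizing callers. */
int __attribute__((weak, noinline)) foo_weak(void)
{
	return 1;
}

/* With the callee's body treated as unknown, both calls and the
 * addition are expected to remain in the generated code. */
int bar_weak(void)
{
	return foo_weak() + foo_weak();
}
```

As with the examples above, comparing llvm-objdump -d output is the way
to confirm what a given compiler version actually emits.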


> Reminder: please don't top post and trim your replies.