Re: [PATCH bpf-next v5 4/4] selftests/bpf: Add a selftest for the tracing bpf_get_socket_cookie

From: Yonghong Song
Date: Sat Jan 23 2021 - 15:47:17 EST




On 1/22/21 7:34 AM, Florent Revest wrote:
On Wed, Jan 20, 2021 at 8:06 PM Florent Revest <revest@xxxxxxxxxxxx> wrote:

On Wed, Jan 20, 2021 at 8:04 PM Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:

On Wed, Jan 20, 2021 at 9:08 AM KP Singh <kpsingh@xxxxxxxxxx> wrote:

On Tue, Jan 19, 2021 at 5:00 PM Florent Revest <revest@xxxxxxxxxxxx> wrote:

This builds up on the existing socket cookie test which checks whether
the bpf_get_socket_cookie helpers provide the same value in
cgroup/connect6 and sockops programs for a socket created by the
userspace part of the test.

Adding a tracing program to the existing objects requires a different
attachment strategy and different headers.

Signed-off-by: Florent Revest <revest@xxxxxxxxxxxx>

Acked-by: KP Singh <kpsingh@xxxxxxxxxx>

(one minor note, doesn't really need fixing as a part of this though)

---
.../selftests/bpf/prog_tests/socket_cookie.c | 24 +++++++----
.../selftests/bpf/progs/socket_cookie_prog.c | 41 ++++++++++++++++---
2 files changed, 52 insertions(+), 13 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/socket_cookie.c b/tools/testing/selftests/bpf/prog_tests/socket_cookie.c
index 53d0c44e7907..e5c5e2ea1deb 100644
--- a/tools/testing/selftests/bpf/prog_tests/socket_cookie.c
+++ b/tools/testing/selftests/bpf/prog_tests/socket_cookie.c
@@ -15,8 +15,8 @@ struct socket_cookie {

void test_socket_cookie(void)
{
+ struct bpf_link *set_link, *update_sockops_link, *update_tracing_link;
socklen_t addr_len = sizeof(struct sockaddr_in6);
- struct bpf_link *set_link, *update_link;
int server_fd, client_fd, cgroup_fd;
struct socket_cookie_prog *skel;
__u32 cookie_expected_value;
@@ -39,15 +39,21 @@ void test_socket_cookie(void)
PTR_ERR(set_link)))
goto close_cgroup_fd;

- update_link = bpf_program__attach_cgroup(skel->progs.update_cookie,
- cgroup_fd);
- if (CHECK(IS_ERR(update_link), "update-link-cg-attach", "err %ld\n",
- PTR_ERR(update_link)))
+ update_sockops_link = bpf_program__attach_cgroup(
+ skel->progs.update_cookie_sockops, cgroup_fd);
+ if (CHECK(IS_ERR(update_sockops_link), "update-sockops-link-cg-attach",
+ "err %ld\n", PTR_ERR(update_sockops_link)))
goto free_set_link;

+ update_tracing_link = bpf_program__attach(
+ skel->progs.update_cookie_tracing);
+ if (CHECK(IS_ERR(update_tracing_link), "update-tracing-link-attach",
+ "err %ld\n", PTR_ERR(update_tracing_link)))
+ goto free_update_sockops_link;
+
server_fd = start_server(AF_INET6, SOCK_STREAM, "::1", 0, 0);
if (CHECK(server_fd < 0, "start_server", "errno %d\n", errno))
- goto free_update_link;
+ goto free_update_tracing_link;

client_fd = connect_to_fd(server_fd, 0);
if (CHECK(client_fd < 0, "connect_to_fd", "errno %d\n", errno))
@@ -71,8 +77,10 @@ void test_socket_cookie(void)
close(client_fd);
close_server_fd:
close(server_fd);
-free_update_link:
- bpf_link__destroy(update_link);
+free_update_tracing_link:
+ bpf_link__destroy(update_tracing_link);

I don't think this need to block submission unless there are other
issues but the
bpf_link__destroy can just be called in a single cleanup label because
it handles null or
erroneous inputs:

int bpf_link__destroy(struct bpf_link *link)
{
int err = 0;

if (IS_ERR_OR_NULL(link))
return 0;
[...]

+1 to KP's point.

Also Florent, how did you test it?
This test fails in CI and in my manual run:
./test_progs -t cook
libbpf: load bpf program failed: Permission denied
libbpf: -- BEGIN DUMP LOG ---
libbpf:
; int update_cookie_sockops(struct bpf_sock_ops *ctx)
0: (bf) r6 = r1
; if (ctx->family != AF_INET6)
1: (61) r1 = *(u32 *)(r6 +20)
; if (ctx->family != AF_INET6)
2: (56) if w1 != 0xa goto pc+21
R1_w=inv10 R6_w=ctx(id=0,off=0,imm=0) R10=fp0
; if (ctx->op != BPF_SOCK_OPS_TCP_CONNECT_CB)
3: (61) r1 = *(u32 *)(r6 +0)
; if (ctx->op != BPF_SOCK_OPS_TCP_CONNECT_CB)
4: (56) if w1 != 0x3 goto pc+19
R1_w=inv3 R6_w=ctx(id=0,off=0,imm=0) R10=fp0
; if (!ctx->sk)
5: (79) r1 = *(u64 *)(r6 +184)
; if (!ctx->sk)
6: (15) if r1 == 0x0 goto pc+17
R1_w=sock(id=0,ref_obj_id=0,off=0,imm=0) R6_w=ctx(id=0,off=0,imm=0) R10=fp0
; p = bpf_sk_storage_get(&socket_cookies, ctx->sk, 0, 0);
7: (79) r2 = *(u64 *)(r6 +184)
; p = bpf_sk_storage_get(&socket_cookies, ctx->sk, 0, 0);
8: (18) r1 = 0xffff888106e41400
10: (b7) r3 = 0
11: (b7) r4 = 0
12: (85) call bpf_sk_storage_get#107
R2 type=sock_or_null expected=sock_common, sock, tcp_sock, xdp_sock, ptr_
processed 12 insns (limit 1000000) max_states_per_insn 0 total_states
0 peak_states 0 mark_read 0

libbpf: -- END LOG --
libbpf: failed to load program 'update_cookie_sockops'
libbpf: failed to load object 'socket_cookie_prog'
libbpf: failed to load BPF skeleton 'socket_cookie_prog': -4007
test_socket_cookie:FAIL:socket_cookie_prog__open_and_load skeleton
open_and_load failed
#95 socket_cookie:FAIL
Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED

Oh :| I must have missed something in the rebase, I will fix this and
address KP's comment then. Thanks for the review and sorry for the
waste of time :)

So this is actually an interesting one I think. :) The failure was
triggered by the combination of an LLVM update and this change:

-#include <linux/bpf.h>
+#include "vmlinux.h"

With an older LLVM, this used to work.
With a recent LLVM, the change of header causes those 3 lines to get
compiled differently:

if (!ctx->sk)
return 1;
p = bpf_sk_storage_get(&socket_cookies, ctx->sk, 0, 0);

When including linux/bpf.h
; if (!ctx->sk)
5: 79 62 b8 00 00 00 00 00 r2 = *(u64 *)(r6 + 184)
6: 15 02 10 00 00 00 00 00 if r2 == 0 goto +16 <LBB1_6>
; p = bpf_sk_storage_get(&socket_cookies, ctx->sk, 0, 0);
7: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll
9: b7 03 00 00 00 00 00 00 r3 = 0
10: b7 04 00 00 00 00 00 00 r4 = 0
11: 85 00 00 00 6b 00 00 00 call 107
12: bf 07 00 00 00 00 00 00 r7 = r0

When including vmlinux.h
; if (!ctx->sk)
5: 79 61 b8 00 00 00 00 00 r1 = *(u64 *)(r6 + 184)
6: 15 01 11 00 00 00 00 00 if r1 == 0 goto +17 <LBB1_6>
; p = bpf_sk_storage_get(&socket_cookies, ctx->sk, 0, 0);
7: 79 62 b8 00 00 00 00 00 r2 = *(u64 *)(r6 + 184)
8: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll
10: b7 03 00 00 00 00 00 00 r3 = 0
11: b7 04 00 00 00 00 00 00 r4 = 0
12: 85 00 00 00 6b 00 00 00 call 107
13: bf 07 00 00 00 00 00 00 r7 = r0

Note that ctx->sk gets fetched once in the first case (l5), and twice
in the second case (l5 and l7).
I'm assuming that struct bpf_sock_ops gets defined with different
attributes in vmlinux.h and causes LLVM to assume that ctx->sk could
have changed between the time of check and the time of use so it
yields two fetches and the verifier can't track that r2 is non null.

If I save ctx->sk in a temporary variable first:

struct bpf_sock *sk = ctx->sk;
if (!sk)
return 1;
p = bpf_sk_storage_get(&socket_cookies, sk, 0, 0);

this loads correctly. However, Brendan pointed out that this is also a
weak guarantee and that LLVM could still decide to compile this into
two fetches, Brendan suggested that we save sk out of an access to a
volatile pointer to ctx, what do you think ?

Your above workaround should be good. Compiler should not do *bad* optimizations to have two fetches if the source code just has one
in the above case. If this happens, it will be a llvm bug.

The latest llvm trunk can reproduce the above issue. It is due to
(1). llvm's partial (not complete) CSE and (2). this partial CSE
resulted in llvm BPF backend generating two CORE_MEM operations (for CORE relocations) instead of one. If llvm will do complete cse like the above your code, we will be fine.

Although fixing llvm CSE is desired, it may take
some time. At the same time, please use your above source workaround.
Thanks.