Re: [PATCH bpf-next v2] selftests/bpf: Run test_perf_branches more than 1 time

From: bot+bpf-ci

Date: Fri Jun 05 2026 - 10:06:03 EST

> diff --git a/tools/testing/selftests/bpf/prog_tests/perf_branches.c b/tools/testing/selftests/bpf/prog_tests/perf_branches.c
> index 0a7ef770c487..153afcdc082e 100644
> --- a/tools/testing/selftests/bpf/prog_tests/perf_branches.c
> +++ b/tools/testing/selftests/bpf/prog_tests/perf_branches.c

[ ... ]

> @@ -85,19 +76,35 @@ static void test_perf_branches_common(int perf_fd,
> + /* Make up a bunch of branch events until we time out or bpf prog sets a flag
> + * that it caught a valid event. To be completely sure that required_size_out,
> + * written_stack_out or written_global_out writes and reads won't get
> + * reordered by anyone in a way that would fail this test, use atomic
> + * operations
> + */

This isn't a bug, but a prior automated review on v2 noted these new
multi-line comments keep text on the same line as the opening /*, while
the BPF subsystem style usually wants the opening /* on its own line.

The same applies to the in-loop comment and the cpu0 pinning comment in
test_perf_branches().

> + for (i = 0; i < 100000000; ++i) {
> + /* This technically should be an atomic load with acquire
> + * ordering, but we don't want to clog up memory bus while
> + * spinning
> + */
> + if (READ_ONCE(skel->bss->valid) > 0)
> + break;
> ++j;
> + }

[ ... ]

> diff --git a/tools/testing/selftests/bpf/progs/test_perf_branches.c b/tools/testing/selftests/bpf/progs/test_perf_branches.c
> index 05ac9410cd68..9806414cebc4 100644
> --- a/tools/testing/selftests/bpf/progs/test_perf_branches.c
> +++ b/tools/testing/selftests/bpf/progs/test_perf_branches.c

[ ... ]

> @@ -40,14 +37,14 @@ int perf_branches(void *ctx)
> required_size_out = required_size;
> written_stack_out = written_stack;
> written_global_out = written_global;
> - valid = 1;
> + __atomic_add_fetch(&valid, 1, __ATOMIC_RELEASE);

Does this still emit the intended release barrier in the BPF program?

A prior automated review on v2 raised this and it does not look like it was
answered. Because the return value of __atomic_add_fetch() is discarded,
the Clang BPF backend can lower this to a relaxed BPF atomic add (without
BPF_FETCH), which on weakly-ordered architectures would carry no release
ordering.

If that happens, could the stores to required_size_out, written_stack_out
and written_global_out be observed by the waiter after the store to valid,
so that the check_good_sample()/check_bad_sample() readers see valid > 0
but read stale outputs?

In this specific test the waiter pins itself to cpu0 and the perf event
follows the task, so the BPF producer and the userspace consumer run on the
same CPU and the reordering cannot actually be observed here. Is relying on
that co-location intended, or would an explicit fetching/relaxed form make
the ordering contract clearer?

---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

CI run summary: https://github.com/kernel-patches/bpf/actions/runs/27017486169