[bpf?] [net-next ?] [RESEND] possible bpf overflow/output bug introduced in 6.10rc1 ?
From: Joe Damato
Date: Fri Jul 12 2024 - 12:54:09 EST
Greetings:
(I am reposting this question after 2 days, and to a wider audience,
because I did not hear back [1]. My apologies; it seemed like a
possible bug slipped into 6.10-rc1, and I wanted to bring attention
to it before 6.10 is released.)
While testing some unrelated networking code with Martin Karsten (cc'd on
this email) we discovered what appears to be some sort of overflow bug in
bpf.
git bisect suggests that commit f11f10bfa1ca ("perf/bpf: Call BPF handler
directly, not through overflow machinery") is the first commit where the
(I assume) buggy behavior appears.
Running the following on my machine as of the commit mentioned above:
  bpftrace -e 'tracepoint:napi:napi_poll { @[args->work] = count(); }'
while simultaneously transferring data to the target machine (in my case, I
scp'd a 100MiB file of zeros in a loop) results in very strange output
(snipped):
  @[11]: 5
  @[18]: 5
  @[-30590]: 6
  @[10]: 7
  @[14]: 9
Neither the driver I am using on my test system (mlx5) nor the driver
Martin is using (mlx4) appears to ever return a negative value from its
napi poll function.
As such, I don't think it is possible for args->work to ever be a large
negative number, but perhaps I am misunderstanding something?
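As a sanity check on that reasoning, the following is a minimal Python sketch that scans bpftrace map output for args->work keys outside the range a napi poll function should report, namely 0 up to the poll budget. The helper name suspicious_keys and the budget of 64 are assumptions made for illustration: 64 is a common default NAPI weight, but the actual budget is passed to the tracepoint and may differ on a given system.

```python
import re

# Assumption: 64 is a common default NAPI weight; the real budget is
# reported by the tracepoint and may differ per driver or configuration.
ASSUMED_BUDGET = 64

# Matches bpftrace map lines such as "@[11]: 5".
LINE_RE = re.compile(r"^@\[(-?\d+)\]:\s+(\d+)$")


def suspicious_keys(output, budget=ASSUMED_BUDGET):
    """Return (work, count) pairs whose work value lies outside [0, budget]."""
    flagged = []
    for line in output.splitlines():
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # skip blank lines and anything that is not a map entry
        work, count = int(m.group(1)), int(m.group(2))
        if not 0 <= work <= budget:
            flagged.append((work, count))
    return flagged


# The snipped output quoted in this report.
SAMPLE = """\
@[11]: 5
@[18]: 5
@[-30590]: 6
@[10]: 7
@[14]: 9
"""

if __name__ == "__main__":
    for work, count in suspicious_keys(SAMPLE):
        # Also print the value as an unsigned 32-bit pattern so the raw
        # bits are easier to inspect.
        print(f"work={work} count={count} as u32={work & 0xFFFFFFFF:#010x}")
```

Run against the snippet above, only the -30590 entry is flagged; the remaining keys fall within the assumed budget.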
I would like to note that commit 14e40a9578b7 ("perf/bpf: Remove #ifdef
CONFIG_BPF_SYSCALL from struct perf_event members") does not exhibit this
behavior and the output seems reasonable on my test system. Martin confirms
the same for both commits on his test system, which uses different hardware
than mine.
Is this an expected side effect of this change? I would expect that it is
not and that the output indicates a bug of some sort. My apologies: I am
not particularly familiar with the bpf code and cannot suggest what the
root cause might be.
If it is not a bug:
1. Sorry for the noise :(
2. Can anyone suggest what this output might mean or how the
script run above should be modified? AFAIK this is a fairly
common bpftrace one-liner that many folks run for
profiling/debugging purposes.
Thanks,
Joe
[1]: https://lore.kernel.org/bpf/Zo64cpho2cFQiOeE@LQ3V64L9R2/T/#u