Re: [REGRESSION] osnoise: "eventpoll: Replace rwlock with spinlock" causes ~50us noise spikes on isolated PREEMPT_RT cores

From: Tomas Glozar

Date: Thu Apr 02 2026 - 06:01:51 EST


On Wed, Apr 1, 2026 at 19:08, Ionut Nechita (Wind River)
<ionut.nechita@xxxxxxxxxxxxx> wrote:
>
> Separate question: could eosnoise itself be improved to avoid this
> contention? For example, using one epoll instance per CPU instead of
> a single shared one, or using BPF ring buffer (BPF_MAP_TYPE_RINGBUF)
> instead of the per-cpu perf buffer which requires epoll.

Neither BPF ring buffers nor perf event buffers strictly require you
to use epoll. Just as a BPF ring buffer can be read using libbpf's
ring_buffer__consume() [1] without polling, perf_buffer__consume() [2]
can be used the same way for the perf event ring buffer; neither
function blocks. If you do need to poll, polling a BPF ring buffer
also goes through epoll_wait() [3], so that won't make a difference
(or is there another way to poll it?)

[1] https://docs.ebpf.io/ebpf-library/libbpf/userspace/ring_buffer__consume/
[2] https://docs.ebpf.io/ebpf-library/libbpf/userspace/perf_buffer__consume/
[3] https://github.com/libbpf/libbpf/blob/master/src/ringbuf.c#L341

That being said, a BPF ring buffer is not per-CPU, so it should allow
collecting data from all CPUs into one buffer.

> If the consensus is that the kernel side is working as intended and the tool
> should adapt, I'd like to understand what the recommended pattern is
> for BPF observability tools on PREEMPT_RT.

The ideal solution is to aggregate the data directly in BPF rather
than in userspace, and to collect it at the end of the measurement
when possible. This is what rtla-timerlat does for collecting samples
[4]; it was implemented that way to keep the collecting userspace
thread from being overloaded with samples on systems with a large
number of CPUs. Polling on a ring buffer is used only to signal the
end of tracing when the latency threshold is hit, and no issues have
been reported with that. To collect data about system noise, timerlat
records the events in an ftrace ring buffer and then analyzes only the
tail of the buffer in userspace [5], i.e. what is relevant to the
spike rather than all data from the entire measurement. The same could
be replicated in eosnoise, i.e. collecting the data into a ring buffer
and reading only the tail in userspace, if that suffices for your use
case.

[4] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/tools/tracing/rtla/src/timerlat.bpf.c
[5] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/tracing/rtla/src/timerlat_aa.c
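
To illustrate the aggregation idea, here is a rough sketch in plain
userspace C. It only mimics what a per-CPU BPF array map would hold and
is not the timerlat code; NR_CPUS, NR_BUCKETS, the 1 us bucket width and
the names record_sample() and collect() are all made up for the example.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS    4	/* hypothetical CPU count */
#define NR_BUCKETS 8	/* hypothetical: 1 us buckets, last one is overflow */

/* Stand-in for a per-CPU array map: one row of counters per CPU. */
static uint64_t hist[NR_CPUS][NR_BUCKETS];

/*
 * What the BPF side would do for every sample: bump one counter in the
 * row of the current CPU. Nothing is pushed to userspace, so nobody
 * gets woken up on the hot path.
 */
static void record_sample(int cpu, uint64_t latency_ns)
{
	uint64_t bucket = latency_ns / 1000;

	assert(cpu >= 0 && cpu < NR_CPUS);
	if (bucket >= NR_BUCKETS)
		bucket = NR_BUCKETS - 1;	/* clamp into the overflow bucket */
	hist[cpu][bucket]++;
}

/*
 * What userspace would do once, at the end of the measurement: sum the
 * per-CPU rows into one histogram.
 */
static void collect(uint64_t total[NR_BUCKETS])
{
	for (int b = 0; b < NR_BUCKETS; b++) {
		total[b] = 0;
		for (int cpu = 0; cpu < NR_CPUS; cpu++)
			total[b] += hist[cpu][b];
	}
}
```

With this shape, the amount of data userspace has to read depends on
the number of buckets and CPUs, not on the number of samples taken.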


Tomas