Re: [RFC PATCH 1/2] tracing/osnoise: Sample IPI counts

From: Valentin Schneider

Date: Thu Jun 11 2026 - 06:31:38 EST

On 11/06/26 10:59, Tomas Glozar wrote:
> [just replying to comments, will do a full review later]
>
> st 10. 6. 2026 v 21:51 odesílatel Crystal Wood <crwood@xxxxxxxxxx> napsal:
>>
>> On Wed, 2026-06-10 at 15:04 +0200, Valentin Schneider wrote:
>> > Osnoise already implictly accounts IPIs via its IRQ tracking,
>>
>> Does it? It seems that IPIs bypass the kernel/irq subsystem on some
>> arches (including x86, but not ARM).
>>
>> It would be nice to solve this properly by adding generic ipi
>> entry/exit tracing (similar to what ARM already has).
>>
>
> Isn't that precisely what the ipi tracepoints used by this
> implementation (ipi:ipi_send_cpu) are for?
>

Well, these catch the emission of the IPI, which is great for investigation
- slap a stacktrace trigger and you (most of the time) get the source of
your interference.

However Crystal's point is that on x86 (and I assume other archs) receiving
& handling these IPIs is "special" and doesn't go through the generic irq
subsystem and thus has to be tracked separately, which is why osnoise has
this fairly lengthy osnoise_arch_register() thing.

>> > however it
>> > can be interesting to distiguish between the two: undesired IPIs usually
>> > imply a software configuration issue (e.g. wrong/incomplete CPU isolation)
>> > whereas undesired (non-IPI) IRQs usually imply a hardware configuration
>> > issue.
>> >
>> > Signed-off-by: Valentin Schneider <vschneid@xxxxxxxxxx>
>> > ---
>> > Note that this is modifying the osnoise:osnoise_entry Ftrace entry; I know
>> > trace events are sort of supposed to be stable, but I'm not sure about
>> > ftrace entries.
>>
>> I think old rtla will be OK with this since it looks up fields by name
>> rather than assuming a fixed layout.
>>
>
> Yeah, the fields are either looked up with tep_get_field_val() [2], or
> with name-based BPF CO-RE relocations against the tracepoint structure
> [3]. So this shouldn't be an issue, as long as the old counts stay the
> same.
>
> [2] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/tools/tracing/rtla/src/timerlat_hist.c#n191
> [3] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/tools/tracing/rtla/src/timerlat.bpf.c#n12
>
>> > Alternatively I can have this be purely supported in userspace osnoise by
>> > hooking into the IPI events and counting IPIs separately from the osnoise
>> > events.
>>
>> One benefit I could see of doing this in kernel osnoise would be if you
>> could atomically correlate the count with the particular noise
>> interval, but this patch doesn't do that.
>>
>
> The count is already reported by cycle on the kernel side in the
> patchset, right? It's only missing in the current RTLA (userspace)
> part, as there is no statistic using the information. But it can still
> be collected through custom histogram triggers.
>
>> > ...
>> >
>> > +static void trace_ipi_send_cpumask_callback(void *data, const struct cpumask *cpumask,
>> > + unsigned long callsite, void *callback)
>> > +{
>> > + struct osnoise_variables *osn_var;
>> > + int cpu;
>> > +
>> > + for_each_cpu_and(cpu, cpumask, &osnoise_cpumask) {
>> > + osn_var = per_cpu_ptr(&per_cpu_osnoise_var, cpu);
>> > + ipi_emission(osn_var, cpu);
>> > + }
>> > +}
>>
>> Isn't this racy to do from a different CPU? Both in terms of the
>> counter, and the timing of the increment relative to when the IPI is
>> actually received. Not necessarily a huge deal if you only care about
>> zero versus bignum, but still. At least worth a comment, if we go with
>> this approach.
>>
>
> I also think it's a bit confusing, especially as the other accesses to
> osn_var are cpu-local, but here, "cpu" is the *target* CPU, not the
> current CPU. Not sure how expensive it would be to do atomic_add for
> that, at least it's something to consider.
>

I suppose that could be an argument for doing that stat aggregation in
userspace osnoise - event handlers are run after the fact via
tracefs_iterate_raw_events(), it's all inherently slower since it's just
increments of one (one per handled event) but it's also all done in
userspace on a control thread and doesn't bog down the kernelspace.

> Tomas