Re: [PATCH] tracing/osnoise: Force quiescent states while tracing
From: Daniel Bristot de Oliveira
Date: Tue Mar 01 2022 - 14:31:32 EST
On 3/1/22 19:58, Paul E. McKenney wrote:
> On Tue, Mar 01, 2022 at 07:44:38PM +0100, Daniel Bristot de Oliveira wrote:
>> On 3/1/22 19:05, Paul E. McKenney wrote:
>>>> I see, as long as it costs < 1 us, I am ok. If it gets > 1us in a reasonably
>>>> fast machine, we start see HW noise where it does not exist, and that would
>>>> reduce the resolution of osnoise. AFAICS, it is not causing that problem, but we
>>>> need to make it as lightweight as possible.
>>> In the common case, it is atomically incrementing a local per-CPU counter
>>> and doing a store. This should be quite cheap.
>>>
>>> The uncommon case is when the osnoise process was preempted or otherwise
>>> interfered with during a recent RCU read-side critical section and
>>> preemption was disabled around that critical section's outermost
>>> rcu_read_unlock(). This can be quite expensive. But I would expect
>>> you to just not do this. ;-)
>>
>> Getting the expensive call after a preemption is not a problem, it is a side
>> effect of the most costly preemption.
>>
>> It this case, we should "ping rcu" before reading the time to account the
>> overhead for the previous preemption which caused it.
>>
>> like (using the current code as example):
>>
>> ------------------------- %< -------------------------------
>> static u64
>> set_int_safe_time(struct osnoise_variables *osn_var, u64 *time)
>> {
>> u64 int_counter;
>>
>> do {
>> int_counter = local_read(&osn_var->int_counter);
>>
>> ------------> HERE <-------------------------------------
>>
>> /* synchronize with interrupts */
>> barrier();
>>
>> *time = time_get();
>>
>> /* synchronize with interrupts */
>> barrier();
>> } while (int_counter != local_read(&osn_var->int_counter));
>>
>> return int_counter;
>> }
>> ------------------------- >% -------------------------------
>>
>> In this way anything that happens before this *time is accounted before it is
>> get. If anything happens while this loop is running, it will run again, so it is
>> safe to point to the previous case.
>>
>> We would have to make a copy of this function, and only use the copy for the
>> run_osnoise() case. A good name would be something in the lines of
>> set_int_safe_time_rcu().
>>
>> (Unless the expensive is < than 1us.)
>
> The outermost rcu_read_unlock() could involve a call into the scheduler
> to do an RCU priority deboost, which I would imagine could be a bit
> expensive. But I would expect you to configure the test in such a way
> that there was no need for RCU priority boosting. For example, by making
> sure that the osnoise process's RCU readers were never preempted.
So, the noise will not be seeing in the call that Nicolas is doing. but in the
rcu_read_unlock() inside osnoise processes?
If that is the case, then the "noise" would already be accounted to the
previously preempted thread... and we should be fine then.
>
> Just out of curiosity, why is this running in the kernel rather than in
> userspace? To focus only on kernel-centric noise sources? Or are there
> people implementing real-time applications within the kernel?
It is in kernel because it allows me to sync the workload and the trace, getting
more (and more precise) information.
For example, I can read the "noise in time" and how many interrupts happened in
between two reads of the time, so I can look back in the trace to figure out
which sources of noise were the cause of the noise I am seeing - without false
positives. If no "interference" happened, I can safely say that it was a
hardware noise (this saves us time in the debug, no need to run hwlat - I run
two tools in one).
This all with a more cheap access to the data. I also use such information to
parse trace in kernel in a cheaper way, printing less info to the trace buffer.
But the idea is to see the noise for an user-space application as much as
possible (and no, I am not doing application in kernel... but I know people
doing it using a unikernel, but that is another story... a longer one... :-)).
-- Daniel
>
> Thanx, Paul