Re: [PATCH V3 9/9] tracing: Add timerlat tracer

From: Daniel Bristot de Oliveira
Date: Fri Jun 11 2021 - 08:59:23 EST

Next message: Chris Mason: "Re: [PATCH] btrfs: Disable BTRFS on platforms having 256K pages"
Previous message: Jonathan Cameron: "Re: [PATCH 5/5] cxl/pmem: Register 'pmem' / cxl_nvdimm devices"
Next in thread: Steven Rostedt: "Re: [PATCH V3 9/9] tracing: Add timerlat tracer"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On 6/8/21 3:36 AM, Steven Rostedt wrote:
> On Fri, 14 May 2021 22:51:18 +0200
> Daniel Bristot de Oliveira <bristot@xxxxxxxxxx> wrote:

[...]
>> It is possible to follow the trace by reading the trace trace file::
>
> Do not need rst markup in commit logs ;-)
>
>>
>> [root@f32 tracing]# cat trace
>> # tracer: timerlat
>> #
>> # _-----=> irqs-off
>> # / _----=> need-resched
>> # | / _---=> hardirq/softirq
>> # || / _--=> preempt-depth
>> # || /
>> # |||| ACTIVATION
>> # TASK-PID CPU# |||| TIMESTAMP ID CONTEXT LATENCY
>> # | | | |||| | | | |
>> <idle>-0 [000] d.h1 54.029328: #1 context irq timer_latency 932 ns
>> <...>-867 [000] .... 54.029339: #1 context thread timer_latency 11700 ns
>> <idle>-0 [001] dNh1 54.029346: #1 context irq timer_latency 2833 ns
>> <...>-868 [001] .... 54.029353: #1 context thread timer_latency 9820 ns
>> <idle>-0 [000] d.h1 54.030328: #2 context irq timer_latency 769 ns
>> <...>-867 [000] .... 54.030330: #2 context thread timer_latency 3070 ns
>> <idle>-0 [001] d.h1 54.030344: #2 context irq timer_latency 935 ns
>> <...>-868 [001] .... 54.030347: #2 context thread timer_latency 4351 ns
>>
>> The tracer creates a per-cpu kernel thread with real-time priority that
>> prints two lines at every activation. The first is the *timer latency*
>> observed at the *hardirq* context before the activation of the thread.
>> The second is the *timer latency* observed by the thread, which is the
>> same level that cyclictest reports. The ACTIVATION ID field
>
> The above is misleading. Below, I see that you state that the values are
> "net values" where the thread latency does not include the irq latency.
> This is not the same as cyclictest. (I had to update my ASCII art below
> after reading the below statement).

Replying here all the comments about the timerlat/cyclictest
timeline, and gross/net values.

So, yeah, my description was not clear enough. The values that are
net are those reported by the *osnoise: events* only. The *timerlat
tracer* values are not discounted, and that is why they are similar
to the value reported by cyclictest (ok, cyclictest still captures
the exit to user overhead and friends).

>> serves to relate the *irq* execution to its respective *thread* execution.
>>
>> The irq/thread splitting is important to clarify at which context
>> the unexpected high value is coming from. The *irq* context can be
>> delayed by hardware related actions, such as SMIs, NMIs, IRQs
>> or by a thread masking interrupts. Once the timer happens, the delay
>> can also be influenced by blocking caused by threads. For example, by
>> postponing the scheduler execution via preempt_disable(), by the
>> scheduler execution, or by masking interrupts. Threads can
>> also be delayed by the interference from other threads and IRQs.
>
>
> I wonder if ASCII art would help clarify the above. At least for the
> document (not the change log here).
>
>
> time ==>
> expected actual thread
> wakeup wakeup scheduled
> | | |
> v v v
> |---------|-------|------|----------|
> ^
> |
> interrupt
>
> |--------------|
> irq latency
>
> |-----------|
> thread latency
>

A liked the idea of adding a timeline!

[...]

>> + [root@f32 tracing]# cat trace
>> + # tracer: timerlat
>> + #
>> + # _-----=> irqs-off
>> + # / _----=> need-resched
>> + # | / _---=> hardirq/softirq
>> + # || / _--=> preempt-depth
>> + # || /
>> + # |||| ACTIVATION
>> + # TASK-PID CPU# |||| TIMESTAMP ID CONTEXT LATENCY
>> + # | | | |||| | | | |
>> + <idle>-0 [000] d.h1 54.029328: #1 context irq timer_latency 932 ns
>> + <...>-867 [000] .... 54.029339: #1 context thread timer_latency 11700 ns
>> + <idle>-0 [001] dNh1 54.029346: #1 context irq timer_latency 2833 ns
>> + <...>-868 [001] .... 54.029353: #1 context thread timer_latency 9820 ns
>> + <idle>-0 [000] d.h1 54.030328: #2 context irq timer_latency 769 ns
>> + <...>-867 [000] .... 54.030330: #2 context thread timer_latency 3070 ns
>> + <idle>-0 [001] d.h1 54.030344: #2 context irq timer_latency 935 ns
>> + <...>-868 [001] .... 54.030347: #2 context thread timer_latency 4351 ns
>> +
>> +
>> +The tracer creates a per-cpu kernel thread with real-time priority that
>> +prints two lines at every activation. The first is the *timer latency*
>> +observed at the *hardirq* context before the activation of the thread.
>> +The second is the *timer latency* observed by the thread, which is the
>> +same level that cyclictest reports. The ACTIVATION ID field
>> +serves to relate the *irq* execution to its respective *thread* execution.
>> +
>> +The *irq*/*thread* splitting is important to clarify at which context
>> +the unexpected high value is coming from. The *irq* context can be
>> +delayed by hardware related actions, such as SMIs, NMIs, IRQs
>> +or by a thread masking interrupts. Once the timer happens, the delay
>> +can also be influenced by blocking caused by threads. For example, by
>> +postponing the scheduler execution via preempt_disable(), by the
>> +scheduler execution, or by masking interrupts. Threads can
>> +also be delayed by the interference from other threads and IRQs.
>
> This is where I would add that ASCII art.

I am proposing a ASCII art on another point.... see bellow.

>
>> +
>> +Tracer options
>> +---------------------
>> +
>> +The timerlat tracer is built on top of osnoise tracer.
>> +So its configuration is also done in the osnoise/ config
>> +directory. The timerlat configs are:
>> +
>> + - cpus: CPUs at which a timerlat thread will execute.
>> + - timerlat_period_us: the period of the timerlat thread.
>> + - osnoise/stop_tracing_in_us: stop the system tracing if a
>> + timer latency at the *irq* context higher than the configured
>> + value happens. Writing 0 disables this option.
>> + - stop_tracing_out_us: stop the system tracing if a
>> + timer latency at the *thread* context higher than the configured
>> + value happens. Writing 0 disables this option.
>> + - print_stack: save the stack of the IRQ ocurrence, and print
>> + it after the *thread* read the latency.
>
> "thread read the latency" doesn't make sense.
>
> "and print it after the *thread context* event". ?
>
>
>> +
>> +timerlat and osnoise
>> +----------------------------
>> +
>> +The timerlat can also take advantage of the osnoise: traceevents.
>> +For example::
>> +
>> + [root@f32 ~]# cd /sys/kernel/tracing/
>> + [root@f32 tracing]# echo timerlat > current_tracer
>> + [root@f32 tracing]# echo osnoise > set_event
>
> Note, set_event should be deprecated. Use:
>
> echo 1 > events/osnoise/enable
>
> instead.
>
>
>> + [root@f32 tracing]# echo 25 > osnoise/stop_tracing_out_us
>> + [root@f32 tracing]# tail -10 trace
>> + cc1-87882 [005] d..h... 548.771078: #402268 context irq timer_latency 1585 ns
>> + cc1-87882 [005] dNLh1.. 548.771082: irq_noise: local_timer:236 start 548.771077442 duration 4597 ns
>> + cc1-87882 [005] dNLh2.. 548.771083: irq_noise: reschedule:253 start 548.771083017 duration 56 ns
>> + cc1-87882 [005] dNLh2.. 548.771086: irq_noise: call_function_single:251 start 548.771083811 duration 2048 ns
>> + cc1-87882 [005] dNLh2.. 548.771088: irq_noise: call_function_single:251 start 548.771086814 duration 1495 ns
>> + cc1-87882 [005] dNLh2.. 548.771091: irq_noise: call_function_single:251 start 548.771089194 duration 1558 ns
>> + cc1-87882 [005] dNLh2.. 548.771094: irq_noise: call_function_single:251 start 548.771091719 duration 1932 ns
>> + cc1-87882 [005] dNLh2.. 548.771096: irq_noise: call_function_single:251 start 548.771094696 duration 1050 ns
>> + cc1-87882 [005] d...3.. 548.771101: thread_noise: cc1:87882 start 548.771078243 duration 10909 ns
>> + timerlat/5-1035 [005] ....... 548.771103: #402268 context thread timer_latency 25960 ns
>> +
>> +In this case, the root cause of the timer latency does not point for a
>> +single, but to a series of call_function_single IPIs, followed by a 10
>
> "not point to a single"
>
>> +*us* delay from a cc1 thread noise, along with the regular timer
>> +activation. It is worth mentioning that the *duration* values reported
>> +by the osnoise events are *net* values. For example, the
>> +thread_noise does not include the duration of the overhead caused
>> +by the IRQ execution (which indeed accounted for 12736 ns).
>
> As stated above, I updated my view of the ASCII art after reading this. You
> should not compare what cyclictest reports as the thread latency. But what
> cyclictest reports is the sum of the two (irq latency plus thread latency).

Here is the point where I mention net values... so I am changing this part of
documentation to:

------------------- %< -----------------------------
It is worth mentioning that the *duration* values reported
by the osnoise: events are *net* values. For example, the
thread_noise does not include the duration of the overhead caused
by the IRQ execution (which indeed accounted for 12736 ns). But
the values reported by the timerlat tracer (timerlat_latency)
are *gross* values.

The art below illustrates a CPU timeline and how the timerlat tracer
observes it at the top and the osnoise: events at the bottom. Each "-"
in the timelines means 1 us, and the time moves ==>:

External context irq context thread
clock timer_latency timer_latency
event 18 us 48 us
| ^ ^
v | |
|------------------| | <-- timerlat irq timeline
|------------------+-----------------------------| <-- timerlat thread timeline
^ ^
===================== CPU timeline ======================================
[timerlat/ irq] [ dev irq ]
[another thread...^ v..^ v........][timerlat/ thread]
===================== CPU timeline ======================================
|-------------| |---------| <-- irq_noise timeline
|--^ v--------| <-- thread_noise timeline
| | |
| | + thread_noise: 10 us
| +-> irq_noise: 9 us
+-> irq_noise: 13 us

--------------- >% --------------------------------

thoughts?

-- Daniel

Next message: Chris Mason: "Re: [PATCH] btrfs: Disable BTRFS on platforms having 256K pages"
Previous message: Jonathan Cameron: "Re: [PATCH 5/5] cxl/pmem: Register 'pmem' / cxl_nvdimm devices"
Next in thread: Steven Rostedt: "Re: [PATCH V3 9/9] tracing: Add timerlat tracer"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]