Re: [PATCH v2 21/22] rv: Add documentation for rtapp monitor
From: Gabriele Monaco
Date: Tue Apr 15 2025 - 09:13:19 EST
On Fri, 2025-04-11 at 09:37 +0200, Nam Cao wrote:
> Add documentation describing the rtapp monitor.
>
> Signed-off-by: Nam Cao <namcao@xxxxxxxxxxxxx>
> ---
> Documentation/trace/rv/monitor_rtapp.rst | 105 +++++++++++++++++++++++
> 1 file changed, 105 insertions(+)
> create mode 100644 Documentation/trace/rv/monitor_rtapp.rst
>
> diff --git a/Documentation/trace/rv/monitor_rtapp.rst
> b/Documentation/trace/rv/monitor_rtapp.rst
> new file mode 100644
> index 000000000000..1cd188039a7e
> --- /dev/null
> +++ b/Documentation/trace/rv/monitor_rtapp.rst
> @@ -0,0 +1,105 @@
> +Scheduler monitors
> +==================
> +
> +- Name: rtapp
> +- Type: container for multiple monitors
> +- Author: Nam Cao <namcao@xxxxxxxxxxxxx>
> +
> +Description
> +-----------
> +
> +Real-time applications may have design flaws such that they
> experience unexpected latency and fail
> +to meet their time requirements. Often, these flaws follow a few
> patterns:
> +
> + - Page faults: A real-time thread may access memory that does not
> have a mapped physical backing
> + or must first be copied (such as for copy-on-write). Thus a page
> fault is raised and the kernel
> + must first perform the expensive action. This causes significant
> delays to the real-time thread.
> + - Priority inversion: A real-time thread blocks waiting for a
> lower-priority thread. This causes
> + the real-time thread to effectively take on the scheduling
> priority of the lower-priority
> + thread. For example, the real-time thread needs to access a
> shared resource that is protected by
> + a non-pi-mutex, but the mutex is currently owned by a non-real-
> time thread.
> +
> +The `rtapp` monitor detects these patterns. It helps developers
> identify reasons for unexpected
> +latency with real-time applications. It is a container of multiple
> sub-monitors described in the
> +following sections.
> +
> +Monitor pagefault
> ++++++++++++++++++
> +
> +The `pagefault` monitor reports real-time tasks raising page faults.
> Its specification is::
> +
> + RULE = always (RT imply not PAGEFAULT)
> +
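To make the rule concrete, here is a minimal C sketch of how
`always (RT imply not PAGEFAULT)` could be checked over a recorded
trace. This is only an illustration, not how the monitor is
implemented: `struct pf_sample`, `pf_step_ok()` and
`pf_first_violation()` are hypothetical names, and the two booleans
stand in for the RT and PAGEFAULT atomic propositions.

```c
#include <stdbool.h>
#include <stddef.h>

/* One observation of a task (hypothetical stand-in for the rule's
 * atomic propositions). */
struct pf_sample {
	bool rt;        /* RT: task runs under a real-time policy */
	bool pagefault; /* PAGEFAULT: task raised a page fault at this step */
};

/* "RT imply not PAGEFAULT", evaluated at a single step. */
bool pf_step_ok(const struct pf_sample *s)
{
	return !s->rt || !s->pagefault;
}

/*
 * "always": the implication must hold at every step of the trace.
 * Returns the index of the first violating step, or -1 if the whole
 * trace satisfies the rule.
 */
long pf_first_violation(const struct pf_sample *trace, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (!pf_step_ok(&trace[i]))
			return (long)i;
	}
	return -1;
}
```

A page fault in a non-real-time task never counts as a violation here;
only a step where both RT and PAGEFAULT hold does.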
> +To fix warnings reported by this monitor, `mlockall()` or `mlock()`
> can be used to ensure physical
> +backing for memory.
> +
> +This monitor may have false negatives because the pages used by the
> real-time threads may just
> +happen to be directly available during testing. To minimize this,
> the system can be put under memory
> +pressure (e.g. invoking the OOM killer using a program that does
> `ptr = malloc(SIZE_OF_RAM);
> +memset(ptr, 0, SIZE_OF_RAM);`) so that the kernel executes
> aggressive strategies to recycle as much
> +physical memory as possible.
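For readers of the `mlockall()` advice in this section, a minimal
sketch of what an application might do at startup, before entering its
time-critical phase. The helper names `rt_lock_memory()` and
`rt_prefault()` are hypothetical, and whether the lock succeeds depends
on privileges and RLIMIT_MEMLOCK, so the error path matters.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Lock all current and future mappings of the process into RAM.
 * Returns 0 on success or a negative errno value (for example -EPERM
 * or -ENOMEM when the memlock limit is too low).
 */
int rt_lock_memory(void)
{
	if (mlockall(MCL_CURRENT | MCL_FUTURE) == -1)
		return -errno;
	return 0;
}

/*
 * Touch one byte in every page of buf so that each page is faulted in
 * (and any copy-on-write is resolved) ahead of time. The byte is
 * written back unchanged, so the buffer contents are preserved.
 * Returns the number of pages touched.
 */
size_t rt_prefault(volatile unsigned char *buf, size_t len)
{
	long page = sysconf(_SC_PAGESIZE);
	size_t step = page > 0 ? (size_t)page : 4096; /* fallback is an assumption */
	size_t touched = 0;

	for (size_t off = 0; off < len; off += step) {
		buf[off] = buf[off];
		touched++;
	}
	return touched;
}
```

Calling `rt_lock_memory()` once during initialization and
`rt_prefault()` on buffers allocated afterwards is one possible order;
the patch itself does not prescribe one.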
> +
> +Monitor sleep
> ++++++++++++++
> +
> +The `sleep` monitor reports real-time threads sleeping in a manner
> that may cause undesirable
> +latency. Real-time applications should only put a real-time thread
> to sleep for one of the following
> +reasons:
> +
> + - Cyclic work: real-time thread sleeps waiting for the next cycle.
> For this case, only the
> + `nanosleep` syscall should be used. No other method is safe for
> real-time. For example, threads
> + waiting for timerfd can be woken by softirq which provides no
> real-time guarantee.
> + - Real-time thread waiting for something to happen (e.g. another
> thread releasing shared
> + resources, or a completion signal from another thread). In this
> case, only futexes with priority
> + inheritance (PI) should be used. Applications usually do not use
> futexes directly, but use PI
> + mutexes and PI condition variables which are built on top of
> futexes. Be aware that the C
> + library might not implement conditional variables as safe for
> real-time. As an alternative, the
> + librtpi library exists to provide a conditional variable
> implementation that is correct for
> + real-time applications in Linux.
> +
> +Besides the reason for sleeping, the eventual waker should also be
> real-time-safe, namely one of:
> +
> + - An equal-or-higher-priority thread
> + - Hard interrupt handler
> + - Non-maskable interrupt handler
> +
> +This monitor's warning usually means one of the following:
> +
> + - Real-time thread is blocked by a non-real-time thread (e.g. due
> to contention on a mutex without
> + priority inheritance). This is priority inversion.
> + - Time-critical work waits for something which is not safe for
> real-time (e.g. timerfd).
> + - The work executed by the real-time thread does not need to run
> at real-time priority at all.
> + This is not a problem for the real-time thread itself, but it is
> potentially taking the CPU away
> + from other important real-time work.
> +
> +Application developers may purposely choose to have their real-time
> application sleep in a way that
> +is not safe for real-time. It is debatable whether that is a
> problem. Application developers must
> +analyze the warnings to make a proper assessment.
> +
> +The monitor's specification is::
> +
> + RULE = always (RT imply (SLEEP imply (RT_FRIENDLY_SLEEP or ALLOWLIST)))
> +
> + RT_FRIENDLY_SLEEP = (RT_VALID_SLEEP_REASON or KERNEL_THREAD)
> + and ((not WAKE) until RT_FRIENDLY_WAKE)
> +
> + RT_VALID_SLEEP_REASON = PI_FUTEX or NANOSLEEP
> +
> + RT_FRIENDLY_WAKE = WOKEN_BY_EQUAL_OR_HIGHER_PRIO
> + or WOKEN_BY_HARDIRQ
> + or WOKEN_BY_NMI
> +
> + ALLOWLIST = BLOCK_ON_RT_MUTEX
> + or TASK_IS_RCU
> + or TASK_IS_MIGRATION
> + or KTHREAD_SHOULD_STOP
> +
> +Besides the scenarios described above, this specification also handles
> some special cases:
> +
> + - `KERNEL_THREAD`: kernel tasks do not have any pattern that can
> be recognized as valid real-time
> + sleeping reasons. Therefore sleeping reason is not checked for
> kernel tasks.
> + - `RT_SLEEP_WHITELIST`: to handle known false positives with
> kernel tasks.
Is this what the specification above calls ALLOWLIST? The name
RT_SLEEP_WHITELIST does not appear in it.
Just out of curiosity: normal kernel threads are exempt from
RT_VALID_SLEEP_REASON but still need an RT_FRIENDLY_WAKE. Why do tasks
like RCU and migration not follow this?
The monitors are not designed for deadline tasks. Is there any plan to
extend them to those too?
Other than this, nice explanation and monitors, thanks.
Reviewed-by: Gabriele Monaco <gmonaco@xxxxxxxxxx>
> + - `BLOCK_ON_RT_MUTEX` is included in the allowlist due to its
> implementation. In the release path
> + of rt_mutex, a boosted task is de-boosted before waking the
> rt_mutex's waiter. Consequently, the
> + monitor may see a real-time-unsafe wakeup (e.g. non-real-time
> task waking real-time task). This
> + is actually real-time-safe because preemption is disable for the
> duration.
Typo:
+ is actually real-time-safe because preemption is disable**d** for
the duration.