Re: [PATCH v12 0/3] sched: Restructure task_mm_cid_work for predictability

From: Mathieu Desnoyers
Date: Wed Mar 26 2025 - 10:37:23 EST


On 2025-03-26 03:31, Gabriele Monaco wrote:
On Tue, 2025-03-11 at 07:28 +0100, Gabriele Monaco wrote:
This patchset moves the task_mm_cid_work to a preemptible and migratable
context. This reduces the impact of this work on the scheduling latency
of real-time tasks.
The change also makes the recurrence of the work a bit more predictable.


The series was reviewed and, in my opinion, is ready for inclusion.
Peter, Ingo, can we merge it?

I agree. I reviewed the entire series a few weeks ago and it
looks good to me.

Thanks,

Mathieu


Thanks,
Gabriele

The behaviour causing latency was introduced in commit 223baf9d17f2
("sched: Fix performance regression introduced by mm_cid"), which
introduced a task work tied to the scheduler tick.
That approach presents two possible issues:
* the task work runs before returning to user space and therefore adds
  scheduling latency (of a significant order of magnitude on PREEMPT_RT)
* periodic tasks with short runtime are less likely to run during the
  tick, hence they might not run the task work at all

Patch 1 adds support for prev_sum_exec_runtime to the RT, deadline and
sched_ext classes, as it is already supported by fair. This is required
to avoid calling rseq_preempt on tick if the runtime is below a threshold.
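
As a rough userspace illustration of the threshold check described above
(not the code from the patch): the helper name and the macro below are
hypothetical, the 100ms value is taken from the V8 changelog, and the
sketch assumes prev_sum_exec_runtime is a snapshot of sum_exec_runtime
taken when the task was last scheduled in. Whether the actual patch uses
a strict comparison is not stated in this cover letter.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed threshold: 100ms (from the V8 changelog), in nanoseconds. */
#define RUNTIME_THRESHOLD_NS (100ULL * 1000 * 1000)

/*
 * Hypothetical helper, for illustration only.
 * sum_exec_runtime:      total runtime accumulated by the task (ns)
 * prev_sum_exec_runtime: snapshot taken when the task last started
 *                        running (ns), assumed <= sum_exec_runtime
 * Returns true if the task has been running for longer than the
 * threshold since it was scheduled in.
 */
bool ran_longer_than_threshold(uint64_t sum_exec_runtime,
			       uint64_t prev_sum_exec_runtime)
{
	uint64_t delta = sum_exec_runtime - prev_sum_exec_runtime;

	return delta > RUNTIME_THRESHOLD_NS;
}
```

With such a check, a task that has run for only 50ms in its current
slice would skip the extra work on tick, while one that has run for
150ms would not.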

Patch 2 contains the main changes, removing the task_work on the
scheduler tick and using a work_struct scheduled more reliably during
__rseq_handle_notify_resume.

Patch 3 adds a selftest to validate the functionality of the
task_mm_cid_work (i.e. to compact the mm_cids).
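
To make "compact" concrete, here is a minimal sketch of the kind of
property such a test could check. It is not taken from the selftest: the
helper name is hypothetical, and it assumes that a compact allocation
means n running threads hold distinct mm_cid values in the range 0..n-1.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper, for illustration only.
 * Returns true when the n concurrency IDs in cids[] are all distinct
 * and all lie in [0, n), i.e. they form a compact range.
 */
bool cids_are_compact(const int *cids, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		/* An ID outside [0, n) leaves a hole in the range. */
		if (cids[i] < 0 || (size_t)cids[i] >= n)
			return false;
		/* Two threads must not share the same ID. */
		for (size_t j = 0; j < i; j++)
			if (cids[j] == cids[i])
				return false;
	}
	return true;
}
```

Under that assumption, the IDs {0, 2, 1} held by three threads are
compact, while {0, 3, 1} are not, since ID 2 is unused and ID 3 is out
of range.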

Changes since V11:
* Remove variable to make mm_cid_needs_scan more compact
* All patches reviewed

Changes since V10:
* Fix compilation errors with RSEQ and/or MM_CID disabled

Changes since V9:
* Simplify and move checks from task_queue_mm_cid to its call site

Changes since V8 [1]:
* Add support for prev_sum_exec_runtime to RT, deadline and sched_ext
* Avoid rseq_preempt on ticks unless executing for more than 100ms
* Queue the work on the unbound workqueue

Changes since V7:
* Schedule mm_cid compaction and update at every tick too
* mmgrab before scheduling the work

Changes since V6 [2]:
* Switch to a simple work_struct instead of a delayed work
* Schedule the work_struct in __rseq_handle_notify_resume
* Asynchronously disable the work but make sure mm is there while we
  run
* Remove first patch as merged independently
* Fix commit tag for test

Changes since V5:
* Punctuation

Changes since V4 [3]:
* Fixes on the selftest
    * Polished memory allocation and cleanup
    * Handle the test failure in main

Changes since V3 [4]:
* Fixes on the selftest
    * Minor style issues in comments and indentation
    * Use of perror where possible
    * Add a barrier to align threads execution
    * Improve test failure and error handling

Changes since V2 [5]:
* Change the order of the patches
* Merge patches changing the main delayed_work logic
* Improved self-test to spawn 1 less thread and use the main one
  instead

Changes since V1 [6]:
* Re-arm the delayed_work at each invocation
* Cancel the work synchronously at mmdrop
* Remove next scan fields and completely rely on the delayed_work
* Shrink mm_cid allocation with nr thread/affinity (Mathieu Desnoyers)
* Add self test

[1] - https://lore.kernel.org/lkml/20250220102639.141314-1-gmonaco@xxxxxxxxxx
[2] - https://lore.kernel.org/lkml/20250210153253.460471-1-gmonaco@xxxxxxxxxx
[3] - https://lore.kernel.org/lkml/20250113074231.61638-4-gmonaco@xxxxxxxxxx
[4] - https://lore.kernel.org/lkml/20241216130909.240042-1-gmonaco@xxxxxxxxxx
[5] - https://lore.kernel.org/lkml/20241213095407.271357-1-gmonaco@xxxxxxxxxx
[6] - https://lore.kernel.org/lkml/20241205083110.180134-2-gmonaco@xxxxxxxxxx

To: Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx>
To: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
To: Ingo Molnar <mingo@xxxxxxxxxx>
To: Paul E. McKenney <paulmck@xxxxxxxxxx>
To: Shuah Khan <shuah@xxxxxxxxxx>

Gabriele Monaco (3):
  sched: Add prev_sum_exec_runtime support for RT, DL and SCX classes
  sched: Move task_mm_cid_work to mm work_struct
  selftests/rseq: Add test for mm_cid compaction

 include/linux/mm_types.h                      |  17 ++
 include/linux/rseq.h                          |  13 ++
 include/linux/sched.h                         |   7 +-
 kernel/rseq.c                                 |   2 +
 kernel/sched/core.c                           |  43 ++--
 kernel/sched/deadline.c                       |   1 +
 kernel/sched/ext.c                            |   1 +
 kernel/sched/rt.c                             |   1 +
 kernel/sched/sched.h                          |   2 -
 tools/testing/selftests/rseq/.gitignore       |   1 +
 tools/testing/selftests/rseq/Makefile         |   2 +-
 .../selftests/rseq/mm_cid_compaction_test.c   | 200 ++++++++++++++++++
 12 files changed, 258 insertions(+), 32 deletions(-)
 create mode 100644 tools/testing/selftests/rseq/mm_cid_compaction_test.c


base-commit: 80e54e84911a923c40d7bee33a34c1b4be148d7a



--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com