Re: [RFC PATCH 1/4] rseq: Add sched_state field to struct rseq

From: Mathieu Desnoyers
Date: Tue May 23 2023 - 08:49:31 EST


On 2023-05-19 16:51, Noah Goldstein wrote:
On Wed, May 17, 2023 at 10:28 AM Mathieu Desnoyers via Libc-alpha
<libc-alpha@xxxxxxxxxxxxxx> wrote:

Expose the "on-cpu" state for each thread through struct rseq to allow
adaptative mutexes to decide more accurately between busy-waiting and
calling sys_futex() to release the CPU, based on the on-cpu state of the
mutex owner.

It is only provided as an optimization hint, because there is no
guarantee that the page containing this field is in the page cache, and
therefore the scheduler may very well fail to clear the on-cpu state on
preemption. This is expected to be rare though, and is resolved as soon
as the task returns to user-space.

The goal is to improve use-cases where the duration of the critical
sections for a given lock follows a multi-modal distribution, preventing
statistical guesses from doing a good job at choosing between busy-wait
and futex wait behavior.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx>
Cc: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
Cc: Jonathan Corbet <corbet@xxxxxxx>
Cc: Steven Rostedt (Google) <rostedt@xxxxxxxxxxx>
Cc: Carlos O'Donell <carlos@xxxxxxxxxx>
Cc: Florian Weimer <fweimer@xxxxxxxxxx>
Cc: libc-alpha@xxxxxxxxxxxxxx
---
include/linux/sched.h | 12 ++++++++++++
include/uapi/linux/rseq.h | 17 +++++++++++++++++
kernel/rseq.c | 14 ++++++++++++++
3 files changed, 43 insertions(+)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index eed5d65b8d1f..c7e9248134c1 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2351,11 +2351,20 @@ static inline void rseq_signal_deliver(struct ksignal *ksig,
rseq_handle_notify_resume(ksig, regs);
}

+void __rseq_set_sched_state(struct task_struct *t, unsigned int state);
+
+static inline void rseq_set_sched_state(struct task_struct *t, unsigned int state)
+{
+ if (t->rseq)
+ __rseq_set_sched_state(t, state);
+}
+
/* rseq_preempt() requires preemption to be disabled. */
static inline void rseq_preempt(struct task_struct *t)
{
__set_bit(RSEQ_EVENT_PREEMPT_BIT, &t->rseq_event_mask);
rseq_set_notify_resume(t);
+ rseq_set_sched_state(t, 0);

Should rseq_migrate also be made to update the cpu_id of the new core?
I imagine the usage of this will be something along the lines of:

if(!on_cpu(mutex->owner_rseq_struct) &&
cpu(mutex->owner_rseq_struct) == this_threads_cpu)
// goto futex

So I would think updating on migrate would be useful as well.

I don't think we want to act differently based on the cpu on which the owner is queued.

If the mutex owner is not on-cpu, and queued on the same cpu as the current thread, we indeed want to call sys_futex WAIT.

If the mutex owner is not on-cpu, but queued on a different cpu than the current thread, we *still* want to call sys_futex WAIT, because busy-waiting for a thread which is queued but not currently running is wasteful.

Or am I missing something ?

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com