Re: [RFC PATCH 0/4] Scheduler time slice extension
From: Mathieu Desnoyers
Date: Fri Nov 15 2024 - 09:31:55 EST
On 2024-11-14 14:42, Prakash Sangappa wrote:
On Nov 14, 2024, at 2:28 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
[...]
See:
https://lkml.kernel.org/r/20220113233940.3608440-4-posk@xxxxxxxxxx
for a more elaborate scheme.
Peter, was there anything fundamentally wrong with your approach based
on rseq ? https://lore.kernel.org/lkml/20231030132949.GA38123@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
The main thing I wonder is whether loading the rseq delay resched flag
on return to userspace is too late in your patch. Also, I'm not sure it is
realistic to require that no system calls be made within the time-slice
extension. If we have this scenario:
I am also not sure if we need to prevent system calls in this scenario.
Was that restriction mainly because the restartable sequences API implements it?
No, the whole premise of delaying resched was because people think that
syscalls are too slow. If you do not think this, then you shouldn't be
using this.
Agree.
I only partially agree with Peter here. I agree that we don't want to
add system calls on the delay-resched critical section fast path,
because this would have a significant performance hit.
But there are scenarios where issuing system calls from within that
critical section would be needed, even though those would not belong
to the fast path:
1) If a signal handler nests over a delay-resched critical section.
That signal handler is allowed to issue system calls.
2) If the critical section fast path calls a GNU C library API and/or
the vDSO, which is typically fast but can end up issuing a system call
as a fallback, e.g. clock_gettime, sched_getcpu. Preventing use of a
system call by killing the application punches a hole in the
abstractions meant to be provided by GNU libc and vDSO.
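
To make scenario 2 concrete, here is a minimal user-space sketch. The
delay_resched_hint variable is a hypothetical stand-in for the proposed
flag and is not an existing ABI. clock_gettime() and sched_getcpu() are
the real glibc calls; they are normally served without entering the
kernel, but either may fall back to a real system call depending on the
clock source, architecture and libc version.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* needed for sched_getcpu() */
#endif
#include <sched.h>
#include <time.h>

/* HYPOTHETICAL stand-in for the proposed delay-resched flag; not a real ABI. */
static _Thread_local volatile int delay_resched_hint;

struct sample {
    struct timespec ts;
    int cpu;
};

/*
 * Short critical section that only calls "fast" library APIs. Both calls
 * are usually served without entering the kernel, but either may fall back
 * to a real system call, and the caller has no portable way to tell.
 */
static int sample_in_critical_section(struct sample *out)
{
    int rc;

    delay_resched_hint = 1;                         /* request an extension */
    rc = clock_gettime(CLOCK_MONOTONIC, &out->ts);  /* vDSO or syscall */
    out->cpu = sched_getcpu();                      /* may also be a syscall */
    delay_resched_hint = 0;                         /* leave critical section */

    return rc;
}
```

If the kernel killed the task on any system call issued with the hint
set, whether this function is safe would depend on which path the
library happens to take on a given machine.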
I would recommend that we allow issuing system calls while the
delay-resched bit is set. However, we may not strictly need to honor
the delay-resched hint from a system call context, as those would
be expected to be either infrequent or a portability fallback,
which means the enhanced performance provided by delay-resched
really won't matter.
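
The recommendation above can be modelled as a small decision function.
This is a user-space sketch for discussion only, not kernel code; the
enum names and on_resched_pending() are hypothetical and do not appear
in any of the patches.

```c
#include <stdbool.h>

/* HYPOTHETICAL: how the kernel was entered before returning to user space. */
enum entry_context {
    ENTRY_INTERRUPT,    /* e.g. a timer tick preempting user code */
    ENTRY_SYSCALL,      /* the thread issued a system call itself */
};

enum resched_action {
    RESCHED_NOW,        /* reschedule immediately */
    RESCHED_DELAY,      /* grant a short extension first */
};

/*
 * Called when a reschedule is pending. A system call issued with the hint
 * set is allowed (the task is not killed), but the hint is not honored
 * from that context.
 */
static enum resched_action on_resched_pending(enum entry_context ctx,
                                              bool hint_set)
{
    if (!hint_set)
        return RESCHED_NOW;
    if (ctx == ENTRY_SYSCALL)
        return RESCHED_NOW;     /* slow path: extension would not matter */
    return RESCHED_DELAY;       /* fast path preempted by an interrupt */
}
```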
Another scenario to keep in mind is page faults happening within a
delay-resched critical section. This is a scenario where page fault
handling can explicitly reschedule. If this happens, I suspect we
really don't care about the delay-resched hint, but we should consider
whether this hint should be left as-is or cleared.
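
The two options for the hint after an explicit reschedule can be written
down as follows. Again a sketch under assumptions: DELAY_RESCHED_FLAG,
its bit position, and the function name are hypothetical.

```c
#include <stdint.h>

#define DELAY_RESCHED_FLAG (1u << 0)    /* HYPOTHETICAL bit position */

enum hint_policy {
    HINT_LEAVE_AS_IS,   /* user space still sees its own request */
    HINT_CLEAR,         /* kernel drops the request on reschedule */
};

/* Flags word as user space would observe it after the task ran again. */
static uint32_t flags_after_explicit_resched(uint32_t flags,
                                             enum hint_policy policy)
{
    if (policy == HINT_CLEAR)
        flags &= ~DELAY_RESCHED_FLAG;
    return flags;
}
```

With HINT_LEAVE_AS_IS, a later preemption inside the same critical
section would still find the request set. With HINT_CLEAR, it would not
unless user space sets the bit again.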
Thoughts ?
Thanks,
Mathieu
--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com