Re: [PATCH RFC] sched: deferred set priority (dprio)

From: Sergey Oboguev
Date: Mon Jul 28 2014 - 00:16:16 EST

On Sun, Jul 27, 2014 at 6:19 PM, Andi Kleen <andi@xxxxxxxxxxxxxx> wrote:
>> [This is a repost of the message from a few days ago, with the patch file
>> inline instead of being pointed to by a URL.]
> Have you checked out the preemption control that was posted some time
> ago? It did essentially the same thing, but somewhat simpler than your
> patch.

Yes, I have seen this discussion. The patch suggested by Khalid implements a
solution very much resembling Solaris/AIX schedctl. Schedctl is less generic
and less powerful than dprio. I compared dprio vs. schedctl in the write-up.

To quote from it:

[--- Quote ---]

The Solaris schedctl [...]
does not provide a way to associate a priority with the resource
whose lock is being held (or, more generally, with a thread's application-specific
logical state; see the footnote below). An application is likely to have a
range of locks with different criticality levels and different needs for
holder protection [*]. For some locks, holder preemption may be tolerated
to some extent, while other locks are highly critical. Furthermore, for some
lock holders preemption by a high-priority thread is acceptable, but preemption
by a low-priority thread is not. The Solaris/AIX schedctl does not provide a
capability for ranging priorities relative to the context of the whole
application and other processes in the system.

[*] We refer just to locks here for simplicity, but a thread's need
for preemption control is not limited to the locks it holds, and may
result from other intra-application state conditions, such as executing
a time-urgent fragment of code in response to a high-priority event
(which may block other threads) or other code paths that can lead to
wait chains unless completed promptly.

Second, in some cases an application may need to perform time-urgent processing
without knowing in advance how long it will take. In the majority of cases the
processing may be very short (a fraction of a scheduling timeslice), but
occasionally it may take much longer (such as a fraction of a second). Since
schedctl would not be effective in the latter case, an application would have
to resort to system calls for thread priority control in all cases [*], even
in the majority of "short processing" cases, with all the overhead of this
approach.

[*] Or introduce extra, most likely very cumbersome, complexity by trying
to measure and monitor the accumulated duration of the processing, with
the intention of switching from schedctl to thread priority elevation
once a threshold has been reached.

[--- End of quote ---]
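To make the first point concrete, here is a minimal Python model (not kernel code, and not the dprio API) of an application policy in which each lock carries its own priority level. The names Lock, Thread, acquire(), release(), effective_priority() and may_preempt() are hypothetical, as is the policy that a holder requests the maximum of its base priority and the priorities of the locks it holds. The point it shows is the one schedctl's single on/off hint cannot express: the same holder can be preemptible by one thread and protected from another.

```python
from dataclasses import dataclass, field


@dataclass
class Lock:
    name: str
    priority: int  # hypothetical criticality level: higher means more critical


@dataclass
class Thread:
    name: str
    base_priority: int
    held: list = field(default_factory=list)

    def acquire(self, lock: Lock) -> None:
        # Hypothetical: with a dprio-like mechanism, recording this request
        # would be a cheap store to memory, not a system call.
        # Here we simply track the held lock in a list.
        self.held.append(lock)

    def release(self, lock: Lock) -> None:
        self.held.remove(lock)

    def effective_priority(self) -> int:
        # Assumed application policy: request the highest priority among
        # the thread's own base level and the locks it currently holds.
        return max([self.base_priority] + [lk.priority for lk in self.held])


def may_preempt(preemptor: Thread, holder: Thread) -> bool:
    # Preemption is permitted only by a thread whose priority is strictly
    # higher than the holder's current effective priority.
    return preemptor.effective_priority() > holder.effective_priority()


# Usage: a low-criticality statistics lock and a high-criticality journal lock.
stats, journal = Lock("stats", 2), Lock("journal", 8)
worker, background, urgent = Thread("worker", 1), Thread("bg", 3), Thread("urgent", 10)

worker.acquire(stats)
print(may_preempt(background, worker))  # True: 3 > 2
worker.acquire(journal)
print(may_preempt(background, worker))  # False: 3 < 8
print(may_preempt(urgent, worker))      # True: 10 > 8
worker.release(journal)
print(may_preempt(background, worker))  # True again
```

The numeric levels are arbitrary; what matters is that protection is graded per resource rather than being a single "don't preempt me" flag.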
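The workaround described in the second footnote (start with the cheap mechanism and switch to a real priority change only once the processing runs long) can be sketched roughly as follows. This is an illustrative Python model under stated assumptions: run_urgent(), elevate() and restore() are hypothetical names, the schedctl-style hint itself is not modeled, and elevate()/restore() stand in for whatever priority-changing system calls the application would actually make. The clock is injectable so the behavior can be shown deterministically.

```python
import time
from typing import Callable, Iterable


def run_urgent(steps: Iterable[Callable[[], None]],
               threshold_s: float,
               elevate: Callable[[], None],
               restore: Callable[[], None],
               clock: Callable[[], float] = time.monotonic) -> bool:
    """Run time-urgent work cheaply, escalating priority only if it runs long.

    Returns True if the threshold was crossed and elevate() was called.
    """
    # Hypothetical: a schedctl-style "please don't preempt me" hint would be
    # set here; it is not modeled.
    start = clock()
    escalated = False
    try:
        for step in steps:
            step()
            # The bookkeeping the footnote calls cumbersome: after every step,
            # check elapsed time and decide whether to switch mechanisms.
            if not escalated and clock() - start >= threshold_s:
                elevate()  # stands in for a real priority-change system call
                escalated = True
    finally:
        if escalated:
            restore()  # undo the elevation, typically another system call
    return escalated


# Usage with a fake clock: the third step crosses a 10 ms threshold.
ticks = iter([0.000, 0.002, 0.004, 0.500])  # fake clock readings, in seconds
log = []
result = run_urgent([lambda: None] * 5, 0.010,
                    elevate=lambda: log.append("elevate"),
                    restore=lambda: log.append("restore"),
                    clock=lambda: next(ticks))
print(result, log)  # True ['elevate', 'restore']
```

Note that the elapsed time is checked only between steps, so a single step that itself runs long overshoots the threshold before escalation happens; deciding how finely to instrument the code is part of what makes this approach awkward in practice.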

Even so, I was somewhat puzzled by the response to Khalid's delay-preempt
patch. While some of the arguments put forth against it were certainly valid
in their own right, their focus somehow seemed to be that the solution would
not interoperate well with every conceivable setup and application mix, would
not solve every concurrency issue, and, worst of all, would not slice bread
either. My perception (perhaps incorrect) was that the patch was not meant to
solve a whole range of problems or to be a feature enabled by default on a
general-purpose system. Rather, it was a specialized feature to be configured
on special-purpose systems dedicated to running a primary-importance
application that uses the mechanism (e.g. database servers: Khalid was doing
it for Oracle, and I believe his JVM use case is also in this context), and it
was meant to solve a very particular problem of this specific category of
deployments. It appeared to me that the participants in the delay-preempt
patch discussion may have had different ideas of the feature's intended scope,
and that this may have influenced the direction of the discussion.

- Sergey