Re: OSDL Bug 3770
From: Nick Piggin
Date: Sat Dec 18 2004 - 04:44:54 EST
Loic Domaigne wrote:
> Hello NPTL Mailing List!
Hello Loic! Thanks for the interesting mail.
I'm CCing lkml and Ingo with this, because I wouldn't feel comfortable
changing this behaviour without a wider discussion.
lkml: We're discussing the fact that on SMP machines, our realtime
policies are per-CPU only. This caused a problem where a high priority
task spinning on one CPU caused all lower priority tasks on that CPU to
be starved, while tasks on another CPU with the same low priority were
able to run.
Ah, the problem is that the driver thread has a higher priority than
the worker threads, so when the driver goes into an infinite loop
waiting for the workers, it starves the worker on its own CPU. The
worker on the other CPU is still able to be scheduled, however.
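For the lkml audience, here is a minimal sketch of the failure mode as
I understand it (thread names, priorities and the busy-wait are my own
illustration, not the actual OSDL test case):

/* Hypothetical reconstruction of the starvation scenario, not the
 * actual OSDL 3770 test case. Build: gcc -O2 -pthread repro.c
 * Needs root (or CAP_SYS_NICE) to set SCHED_FIFO. */
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

static volatile int done[2];        /* set by each worker when it finishes */

static void *worker(void *arg)
{
    /* ... some lower-priority work ... */
    done[(long)arg] = 1;
    return NULL;
}

static void *driver(void *arg)
{
    /* High-priority busy-wait: with per-CPU priorities this starves
     * whichever worker shares the driver's CPU, so we can spin forever. */
    while (!done[0] || !done[1])
        ;
    return NULL;
}

int main(void)
{
    pthread_t d, w[2];
    pthread_attr_t attr;
    struct sched_param sp;

    pthread_attr_init(&attr);
    pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
    pthread_attr_setschedpolicy(&attr, SCHED_FIFO);

    sp.sched_priority = 10;         /* workers: lower RT priority */
    pthread_attr_setschedparam(&attr, &sp);
    pthread_create(&w[0], &attr, worker, (void *)0L);
    pthread_create(&w[1], &attr, worker, (void *)1L);

    sp.sched_priority = 20;         /* driver: higher RT priority */
    pthread_attr_setschedparam(&attr, &sp);
    pthread_create(&d, &attr, driver, NULL);

    pthread_join(d, NULL);          /* hangs if a worker never got to run */
    printf("all workers completed\n");
    return 0;
}

Under global-priority semantics the workers would all eventually run on
the other CPU and the driver would terminate; with per-CPU queues, a
worker stuck behind the driver never runs.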
> Although POSIX legally permits such an implementation of the realtime
> policies on SMP machines, this implementation is clearly *NOT*
> REASONABLE.
Well I haven't done much in the realtime area... but nobody has
complained till now.
> The reason is extremely simple: the application *CANNOT* necessarily
> know that it gets stuck behind a higher-priority thread (though it
> could have run on another CPU if the scheduler had decided otherwise).
> It is *NOT* doable to program in a deterministic fashion in such an
> environment.
You could use CPU binding. I'd argue that this may be nearly a
requirement for any realtime system of significant complexity on an
SMP system.
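To make that concrete, here is roughly what I mean (a sketch; CPU
numbers are arbitrary and error handling is omitted). Giving the
spinning driver a CPU of its own means it can never sit in front of a
worker:

/* Sketch of the CPU-binding workaround. On Linux, affinity is
 * per-thread and pid 0 means "the calling thread". */
#define _GNU_SOURCE
#include <sched.h>

static int bind_current_thread_to_cpu(int cpu)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set);
}

/* e.g. in the driver thread:   bind_current_thread_to_cpu(0);
 *      in each worker thread:  bind_current_thread_to_cpu(1);  */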
> *But*, notice that the program in question did not run fine on UP
> and break on SMP; rather, it would not work on a single processor
> AT ALL.
>
> "Realtime" is put in quotes: I am speaking here of soft realtime,
> that is, an environment whose task scheduling follows a specific
> deterministic order. I am not speaking about hard realtime, which has
> additional timing constraints. Following that definition, we can say
> that Linux offers (soft) "Realtime".
>> The driver really needs to sleep, use a mutex, use a lower priority,
>> or something in order for it to work.
> NO! It is not the responsibility of the application to fix that
> behavior! We can in our case because 'we know', but some
> applications cannot.
That's a bit hand-wavy ;) but I don't dismiss it out of hand, because
I'm not so familiar with this area. I would be interested in an example
application where this matters, and which absolutely can't use any of
the above workarounds.
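And for what it's worth, the sort of fix I have in mind for the driver
is tiny (illustrative names, not the OSDL code): wait on a condition
variable instead of spinning, and the relative priorities stop
mattering:

/* The driver sleeps instead of busy-waiting, so it cannot starve
 * anything. Names here are made up for illustration. */
#include <pthread.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  all_done = PTHREAD_COND_INITIALIZER;
static int workers_left = 2;

/* Each worker calls this when it finishes. */
static void worker_done(void)
{
    pthread_mutex_lock(&lock);
    if (--workers_left == 0)
        pthread_cond_signal(&all_done);
    pthread_mutex_unlock(&lock);
}

/* The driver blocks here instead of spinning. */
static void driver_wait(void)
{
    pthread_mutex_lock(&lock);
    while (workers_left > 0)
        pthread_cond_wait(&all_done, &lock);
    pthread_mutex_unlock(&lock);
}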
> The mistake made here is interesting. When you have a pool of
> servers, you can proceed in two ways to serve the clients:
>
>  (1) make a FIFO queue for each server. When a client arrives, it
>      chooses the queue that is the shortest.
>
>  (2) make a single FIFO queue for all servers. All clients are
>      queued, and when a server is done it takes the first client
>      waiting in that big queue.
>
> Queuing theory proves that (2) is better, exactly for the reason we
> have here. With (1), the clients in a queue may get stuck if the
> corresponding server is blocked by a client. With (2), when a server
> is blocked by a client, it doesn't prevent the other clients from
> being served by the other servers.
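For concreteness, the textbook result behind that claim, in my own
notation (Poisson arrivals at total rate \lambda, two servers of rate
\mu each, utilisation \rho = \lambda/2\mu, W the mean time in system):

    W_{(1)} = \frac{1}{\mu - \lambda/2} = \frac{1}{\mu(1-\rho)}
              \quad \text{(two separate M/M/1 queues)}

    W_{(2)} = \frac{1}{\mu(1-\rho^2)} = \frac{W_{(1)}}{1+\rho}
              \quad \text{(one shared M/M/2 queue)}

Since 1+\rho > 1, the shared queue always wins on average, approaching
a factor of two as \rho \to 1.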
But that model is flawed for SMP scheduling. If it were that easy, we
might have a single queue for _all_ tasks.
The main problem is the cost of synchronisation and cacheline sharing.
A second problem is that of CPU affinities: moving a task to another
CPU nearly always incurs some non-zero cost in terms of cache (and, in
the case of NUMA, memory locality).
Our global queue scheduler was basically crap for more than 4 CPUs. We
could probably give RT tasks a global queue with little impact on
non-RT workloads (in fact, early iterations of the 2.6 scheduler
trialed this)... but let's not penalise RT apps that do the right thing
(and need scalability).
Another problem is that scheduling may not be O(1) anymore, if you have
bindings in place.
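To illustrate the trade-off with a toy userspace model (nothing like
the real scheduler code, and all names are made up): per-CPU queues let
each CPU schedule under its own lock, while a single global queue
funnels every CPU through one lock whose cachelines then bounce between
them.

#include <pthread.h>

#define NR_TOY_CPUS 4

/* Toy model only: a real runqueue holds priority arrays, not a counter. */
struct toy_rq {
    pthread_mutex_t lock;
    int nr_running;
};

/* One lock per CPU: scales. Initialise each lock with
 * pthread_mutex_init() at startup. */
static struct toy_rq per_cpu_rq[NR_TOY_CPUS];

/* One lock for everyone: every CPU contends here. */
static struct toy_rq global_rq = { PTHREAD_MUTEX_INITIALIZER, 0 };

/* Per-CPU design: schedule() touches only local state. */
static void toy_schedule_percpu(int cpu)
{
    pthread_mutex_lock(&per_cpu_rq[cpu].lock);
    /* ... pick the next task from this CPU's own queue ... */
    pthread_mutex_unlock(&per_cpu_rq[cpu].lock);
}

/* Global-queue design: all CPUs serialise on one lock. */
static void toy_schedule_global(void)
{
    pthread_mutex_lock(&global_rq.lock);
    /* ... pick the globally highest-priority task ... */
    pthread_mutex_unlock(&global_rq.lock);
}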
To summarise: I believe that if per-CPU RT queues are allowed within
POSIX, then we want to go with the sanest possible implementation, and
force any broken applications to fix themselves... let's not cave in
now :)
> A historical note: the USA implemented (2) in offices, supermarkets
> and such long before Europe, because in Europe customers were
> convinced that model (2) took more time, since the queue was longer.