Re: Priority Inheritance Test (Real-Time Preemption)

From: Esben Nielsen
Date: Mon Nov 29 2004 - 10:08:47 EST


On Mon, 29 Nov 2004, Ingo Molnar wrote:

>
> * Esben Nielsen <simlo@xxxxxxxxxx> wrote:
>
> [...]
> To give you an extreme example: you cannot run OpenOffice.org with
> SCHED_FIFO prio 99 and expect it to have any sane deterministic latency
> bounds. The simpler the app, the easier it is to control latencies.
>

No, but I want to have my RT-subsystem to be not affected even if the
users choose to run 1000 soffice.bin at SCHED_NORMAL!! Not that I suspect
a cracker would be able to lock on to my RT-system and do it on perpose,
but I know developers make errors. Very often I have seen servers
becomming over-burdened because some application keeps spawning new
threads/processors - often as a solution to being overloaded!!!

That would indeed demand that the kernel would be O(1) in a lot of places
where it isn't right now. 1) The first job is to identify those places, 2)
make the core system be pure O(1) such that RT applications are posible at
all, 3) stop using spinlocks for all none-O(1) places such that RT
applications are not disturbed by none-O(1) behaviour even though
none-real-time threads hits them all the time and 4) chance the most
usefull, but none-critical places to be O(1), too, such that these
subsystems can be used in real-time applications as well.

I think you have come very far with 1)-3). There is a lot of work in 4)
but I think that can be left to those people who actually need to use
those subsystems in a real-time application:

The filesystems will most likely never be O(1) but simple device drivers
can easily be made O(1). I.e. you can make real-time applications using a
character device once you have opened the device but not the file-system
in general. The TCP/IP stack probably wont be O(1) for a long time, i.e.
no real-time application can use TCP/IP directly for a long time :-(

Now I started to look at priority inheritance because I saw that as the
way to fullfill 3) with the twist that real-time applications can share
resources with non-real-time applications. I think the current
implementation is good, if the current mechanism is keeped inside the
kernel. The problem is that a raw spinlock is used to lock for a iteration
over a list which can be O(number of waiters * locking depth) long. As
long as we are in the kernel both is "controlled", i.e. one can see the
worst-case number in stress test and know it can't get worse. *

However, if the same mechanism is used for a userspace mutex there will be
a problem since very long list of waiters can be build up on a mutex by a
faulty application. (That could be one where starvation is solved by
spawning new the threads to handle the work, which then will block on the
critical region, such the lists gets longer and longer...) The solution I
see, is to uplift the level of abstraction: Have the same mechanism but
the lock around the wait queues is this time around the kernel mutex. I.e.
the kernel mutex is implemented with a raw spinlock, the user space mutex
is implemented by a kernel mutex! That will be more expensive, ofcourse,
but avoid that non-real-time userspace threads having degrading the
system latency by bad usage of muteces.

But there are probably still a lot of other places where non-real-time
userspace threads can increase the overall system latency in a
unpredictable way, but I think you got a good grab on those now :-)


> [...]
> Ingo
>

Esben

*If somebody is afraid that the number of waiters on a mutex can be made
unlimited long by forinstance having a lot of non-real-time processes
calling into the same kernel region it can be fixed by moving and
keeping processes at priority 100 and SCHED_FIFO when they own a mutex
such they will run to completion before other non-real-time task can be
scheduled on the same CPU. In fact that will make the system behave as
the old non-preemptive kernel when there is only non-real-time threads! --
which also saves the system from rescheduling and thus make it have a
better through-put performance.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/