Real-time rw-locks (Re: [patch] Real-Time Preemption,-RT-2.6.10-rc2-mm3-V0.7.32-15)

From: Esben Nielsen
Date: Mon Dec 27 2004 - 09:40:35 EST


I haven't seen much traffic on real-time preemption lately. Is it due
to Christmas or lost interest?

I noticed that you changed rw-locks to behave quite diferently under
real-time preemption: They basicly works like normal locks now. I.e. there
can only be one reader task within each region. This can can however lock
the region recursively. I wanted to start looking at fixing that because
it ought to hurt scalability quite a bit - and even on UP create a few
unneeded task-switchs. However, the more I think about it the bigger the
problem:

First, let me describe how I see a read-write lock. It has 3 states:
a) unlocked
b) locked by n readers
c) locked by 1 writer
There can either be 1 writer within the protected region or n>=0
readers within the region. When a writer wants to take the lock,
calling down_write(), it has to wait until the read count is 0. When a
reader wants to take the lock, calling down_read(), he has only to wait
until the the writer is done - there is no need to wait for the other
readers.

Now in a real-time system down_X() ought to have a deterministic
blocking time. It should be easy to make down_read() deterministic: If
there is a writer let it inherit the calling readers priority.
However, down_write() is hard to make deterministic. Even if we assume
that the lock not only keeps track of the number of readers but keeps a
list of all the reader threads within the region it can traverse the list
and boost the priority of all those threads. If there is n readers when
down_write() is called the blocking time would be
O(ceil(n/#cpus))
time - which is unbounded as n is not known!

Having a rw-lock with deterministic down_read() but non-deterministic
down_write() would be very usefull in a lot of cases. The characteritic is
that the data structure being protected is relative static, is going
to be used by a lot of RT readers and the updates doesn't have to be done
with any real-time requirements.
However, there is no way to know in general which locks in the kernel can
be allowed to work like that and which can't. A good compromise would be
limit the number of readers in a lock by the number of cpu's on the
system. That would make the system scale over several CPUs without hitting
unneeded congestions on read-locks and still have a determnistic
down_write().

down_write() shall then do the following: Boost the priority of all the
active readers to the priority of the caller. This will in turn distribute
the readers over the cpu's of the system assuming no higher priority RT
tasks are running. All the reader tasks will then run to up_read() in
time O(1) as they can all run in parellel - assuming there is no ugly
nested locking ofcourse!
down_read() should first check if there is a writer. If there is
boost it and wait. If there isn't but there isn't room for another reader
boost one of the readers such it will run to up_read().

An extra bonus of not having the number of readers bounded: The various
structures needed for making the list of readers can be allocated once.
There is no need to call kmalloc() from within down_read() to get a list
element for the lock's list of readers.

I don't know wether I have time for coding this soon. Under all
circumstances I do not have a SMP system so I can't really test it if I
get time to code it :-(

Esben



On Tue, 14 Dec 2004, Ingo Molnar wrote:

>
> * Rui Nuno Capela <rncbc@xxxxxxxxx> wrote:
>
> > Isn't this tightly related to mkinitrd sometimes hanging while on
> > mount -o loop, that I've been reporting a couple of times before? It
> > used to hang on any other time I do a new kernel install, but latetly
> > it seems to be OK (RT-V0.9.32-19 and -20).
>
> yeah, i've added Thomas Gleixner's earlier semaphore->completion
> conversion to the loop device, to -19 or -18.
>
> Ingo
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/