Re: [Patch] shm bug introduced with pagecache in 2.3.11

Manfred Spraul (manfreds@colorfullife.com)
Sun, 21 Nov 1999 10:15:55 +0100

Messages sorted by: [ date ][ thread ][ subject ][ author ]
Next message: Pavel Machek: "Re: [UPDATE] Linux Trace Toolkit"
Previous message: Keith Owens: "Stack trace without forcing oops (was 2.2.14pre5: kmem_grow: Called nonatomically)"

Linus Torvalds wrote:
> - if a writer is waiting for another writer (contention_ww case), it will
> have to increment the "reader" part of the semaphore value, in order to
> get the other writer to wake it up on "write_up()".
>
> - if a reader is waiting for a writer, then the reader will have
> incremented the semaphore, and the writer will know to wake it up
> becasue the semaphore value won't be zero after the "write_up()".

Only one thread must do that, otherwise you couldn't distinguish between
"multiple writers are waiting, the lock is free" and "one writer is
waiting, one reader owns the lock".

> All other races should be trivially handled by just having the spinlock,
> so the only really hard cases are the fast-path stuff where we cannot get
> the semaphore because it is too expensive.

I think there is a problem: neither "write_lock_trylock()" nor
"read_lock_trylock()" [ie the inline part of your rw-semaphore
operations] are atomic, both change a value, and if a certain flag is
set, then they undo their change.
I think there is a race with 2 CPUs [1*write,1*read] where noone owns
the semaphore, but both fail to acquire the semaphore:

CPU1: down_write() CPU2: down_read()
CPU1: CPU2:
lock bts RWL
lock inc RWL
test RWL ,0x7ff
jne contention_wr[taken]
js [taken]
<hardware interrupt>
contention_wr:
spinlock RWL_LOCK

if <we are the first sleeper>
inc RWL <executed>
if RWL == 0x8000 0001 <ie lock is free>
goto got_semaphore; <not taken, value is 0x8000 0002>
spin_unlock RWL_LOCK
schedule()

lock dec RWL;
spinlock RWL_LOCK

if <we are the first sleeper>
inc RWL <not executed>

if RWL >0 <no writer>
goto got_semaphore <not taken>

spin_unlock RWL_LOCK
schedule()

--> both threads called schedule(), and noone is going to wake them up.

A dirty hack would be that contention_wr stores the information
"highest bit set because a writer is waiting, the semaphore is not
acquired by a writer" in "struct rw_semaphore.ugly_hack", and
__down_shared_failed() always succeeds if ugly_hack is set.

But now we would starve writers.

-- Manfred

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/

Next message: Pavel Machek: "Re: [UPDATE] Linux Trace Toolkit"
Previous message: Keith Owens: "Stack trace without forcing oops (was 2.2.14pre5: kmem_grow: Called nonatomically)"