Re: [PATCH] x86_64 : support atomic ops with 64 bits integer values

From: Linus Torvalds
Date: Sat Aug 16 2008 - 13:32:03 EST




On Sat, 16 Aug 2008, Mathieu Desnoyers wrote:
>
> I have hit this problem when trying to implement a better rwlock design
> than is currently in the mainline kernel (I know the RT kernel has a
> hard time with rwlocks)

Have you looked at my sleeping rwlock trial thing?

It's very different from a spinning one, but I think the fast path should
be identical, and that's the one I tried to make fairly optimal.

See

http://git.kernel.org/?p=linux/kernel/git/torvalds/rwlock.git;a=summary

for a git tree. The sleeping version has two extra words for the sleep
events, but those would be irrelevant for the spinning version.

The fastpath is

        movl $4,%eax
        lock ; xaddl %eax,(%rdi)
        testl $3,%eax
        jne __my_rwlock_rdlock

for the read-lock (the two low bits are contention bits, so you can make
contention have any behaviour you want - including fairish, prefer-reads,
or prefer-writes).
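
For readers who prefer C to assembly, the read fastpath above can be sketched roughly as follows with C11 atomics. This is only an illustration, not the code from the git tree: the function name, the macro names and the "back out and return false" handling are assumptions made for the example. The real slow path (__my_rwlock_rdlock) is free to do whatever the contention policy wants.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; the layout follows the description in the text. */
#define RW_CONTENTION_MASK 3u   /* the two low bits are contention bits */
#define RW_READER_UNIT     4u   /* each reader adds 4, so the count sits above them */

/*
 * Read fastpath: one locked 32-bit add, then a test of the low bits of
 * the OLD value (this is what xaddl leaves in %eax).
 * Returns true if the read lock was taken on the fast path.
 */
bool sketch_read_trylock(_Atomic uint32_t *word)
{
    uint32_t old = atomic_fetch_add_explicit(word, RW_READER_UNIT,
                                             memory_order_acquire);
    if ((old & RW_CONTENTION_MASK) == 0)
        return true;

    /*
     * Assumption for this sketch only: instead of entering a slow path,
     * undo the increment and report failure, like a trylock would.
     */
    atomic_fetch_sub_explicit(word, RW_READER_UNIT, memory_order_relaxed);
    return false;
}
```

Testing the old value rather than the new one is what lets the whole fast path be a single locked instruction plus a branch.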

The write fastpath is

        xorl %eax,%eax
        movl $1,%edx
        lock ; cmpxchgl %edx,(%rdi)
        jne __my_rwlock_wrlock
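
The same write fastpath, again as a rough C11 sketch rather than the actual implementation: the lock word must be exactly 0 (no readers, no contention bits) and is swapped to 1. The function name is made up for the example, and a failed compare-exchange simply returns false here where the assembly would jump to __my_rwlock_wrlock.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Write fastpath: a single locked 32-bit compare-and-exchange 0 -> 1.
 * Returns true if the write lock was taken; on failure the lock word
 * is left untouched (cmpxchg does not modify memory when it fails).
 */
bool sketch_write_trylock(_Atomic uint32_t *word)
{
    uint32_t expected = 0;  /* %eax after "xorl %eax,%eax" */

    return atomic_compare_exchange_strong_explicit(
        word, &expected, 1u,            /* %edx after "movl $1,%edx" */
        memory_order_acquire, memory_order_relaxed);
}
```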

and the "unlock" case is actually unnecessarily complex in my
implementation, because it needs to

 - wake things up in case of a conflict (not true of a spinning version,
   of course)
 - be pthreads-compatible, so the same function needs to handle both a
   read-unlock and a write-unlock.

but a spinning version should be much simpler.
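
As a guess at what "much simpler" could look like for a spinning variant, here is a minimal C11 sketch with separate read and write unlock functions and no wakeups. It is not taken from the git tree. It assumes the encoding implied by the fastpaths (readers count in units of 4, a writer holds the lock by having set the word to 1), and all names are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical names; values follow the fastpaths in the text. */
#define RW_WRITER_BIT  1u   /* value installed by the write fastpath */
#define RW_READER_UNIT 4u   /* increment used by the read fastpath */

/* Drop one reader: the exact inverse of the read fastpath's add. */
void sketch_read_unlock(_Atomic uint32_t *word)
{
    atomic_fetch_sub_explicit(word, RW_READER_UNIT, memory_order_release);
}

/*
 * Drop the writer. This subtracts rather than storing 0, because a
 * reader that lost the race may have transiently added its 4 to the
 * word and a plain store would wipe that out.
 */
void sketch_write_unlock(_Atomic uint32_t *word)
{
    atomic_fetch_sub_explicit(word, RW_WRITER_BIT, memory_order_release);
}
```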

Anyway, I haven't tried turning it into a spinning version, but it was
very much designed to

 - work with both 32-bit and 64-bit x86 by making the fastpath only do
   32-bit locked accesses
 - have any number of pending readers/writers (which is not a big deal for
   a spinning one, but at least there are no CPU count overflows).
 - and because it is designed for sleeping, I'm pretty sure that you can
   easily drop interrupts in the contention path, to make
   write_lock_irq[save]() be reasonable.

In particular, the third bullet is the important one: because it's
designed to have a "contention" path that has _extra_ information for the
contended case, you could literally make the extra information have things
like a list of pending writers, so that you can drop interrupts on one
CPU, while adding information to let the reader side know that if the
read-lock happens on that CPU, it needs to be able to continue in order to
not deadlock.

Linus