Re: [PATCH 3/3] 64-bit futexes: x86 support

From: Ulrich Drepper
Date: Sat May 31 2008 - 00:25:45 EST


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Linus Torvalds wrote:
> I'm suggesting that there are better approaches than forcing yet another
> new interface onto this thing. As mentioned, the kernel has had 32-bit
> rwsemaphores for a long time. Yes, there is "extra information" outside
> the 32-bit field, but I suspect you simply haven't looked hard enough at
> just using the existing futex code.

I guess I have to expect to be insulted by you. What else is
new? I think it's safe to say that nobody has looked more deeply at the
implementation possibilities with futexes. I don't know how many
different primitives I've written over the years. If you think you can
do it better, well, that's then a challenge to you. Come up with
something better.

The rwlocks cannot be handled as efficiently with 32-bit futexes.

I stand by that. Prove me wrong. You don't have to write code, just
describe it. But don't publish it if

- - the uncontended reader *unconditionally* uses more than one memory
access (atomic of course)

- - the contended reader uses the one access above plus one or two
compare-and-exchange operations, depending on whether the rwlock
prefers readers or writers. The last compare-and-exchange has to be
repeated for spurious wakeups, which are highly unlikely since we
only wake what is ready to use.

- - the uncontended writer needs more than a single compare-and-exchange
plus one memory read plus one memory write (neither of them atomic)

- - the contended writer needs more than the above plus one additional
compare-and-exchange for every time we delay in futex_wait (again,
highly unlikely to have spurious wakeups)

- - the unlock operation needs more than one read plus either one
(possibly repeated) compare-and-exchange or another atomic operation,
plus one memory write for writer locks

- - you cannot handle timedlock variants which don't manage to get the
lock efficiently, i.e., without waking up anyone else

That's it.
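To make the cost model concrete, here is a minimal sketch of the fast paths using C11 atomics. The 32-bit word layout (a writer bit plus a reader count) and all function names are my illustration, not glibc's or the kernel's actual implementation, and the futex slow paths are elided:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical 32-bit rwlock word: bit 31 marks a writer,
 * the low 31 bits count readers.  Illustration only. */
#define WRITER 0x80000000u

/* Uncontended reader: exactly one atomic operation. */
static bool rdlock_fast(_Atomic unsigned *l)
{
    if (atomic_fetch_add(l, 1) & WRITER) {
        /* Contended: back out; a real lock would futex-wait here,
         * paying the extra compare-and-exchange(s) noted above. */
        atomic_fetch_sub(l, 1);
        return false;
    }
    return true;
}

/* Uncontended writer: a single compare-and-exchange. */
static bool wrlock_fast(_Atomic unsigned *l)
{
    unsigned expected = 0;
    return atomic_compare_exchange_strong(l, &expected, WRITER);
}

/* Unlock: one atomic op each; a real unlock would additionally read
 * a waiter count and call futex_wake when someone is blocked. */
static void rdunlock_fast(_Atomic unsigned *l) { atomic_fetch_sub(l, 1); }
static void wrunlock_fast(_Atomic unsigned *l) { atomic_fetch_and(l, ~WRITER); }
```

The point of the budget is visible here: the reader's hot path is one `fetch_add`, the writer's is one compare-and-exchange, and anything beyond that only happens on contention.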

And just to point out some more fun: in a highly contended case this is
a performance killer (MESI protocol states):

core #1                        core #2

instr     cache state          instr     cache state

load      -> S
store     -> M
          -> S                 load      -> S
                               store     -> M

I.e., if the cache line with the rwlock is in one core's cache and you
first load a value from that cache line just to modify it right after
that, you're paying a terrible price.
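The same point in code (the function names are mine; the transitions in the comments follow the MESI diagram above): a separate load and store pays for two coherence transitions where a single atomic read-modify-write pays for one.

```c
#include <stdatomic.h>

/* Separate load and store: the line goes to S on the load and then
 * must be upgraded to M for the store -- two coherence transitions,
 * and the other core can steal the line in between (lost updates). */
static void bump_load_then_store(_Atomic unsigned *w)
{
    unsigned v = atomic_load(w);   /* line -> S */
    atomic_store(w, v + 1);        /* line -> M */
}

/* Single atomic RMW: the line goes straight to M, one transition,
 * and the update cannot be torn apart by the other core. */
static void bump_rmw(_Atomic unsigned *w)
{
    atomic_fetch_add(w, 1);        /* line -> M in one step */
}
```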


Have fun coming up with something which is as good or better. But shall
we say there is a deadline? Otherwise we'd wait too long.

- --
• Ulrich Drepper • Red Hat, Inc. • 444 Castro St • Mountain View, CA •
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)

iEYEARECAAYFAkhA0zUACgkQ2ijCOnn/RHRI2QCbBGCcjES6qMJYay1WRVLdSAAk
lHEAnA754SGDsjuzvCYg2ZBrEUIktxYV
=NYYc
-----END PGP SIGNATURE-----
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/