Re: [PATCH 1/2] lockref: speculatively spin waiting for the lock to be released

From: Linus Torvalds
Date: Wed Jun 12 2024 - 21:23:52 EST


On Wed, 12 Jun 2024 at 17:12, Mateusz Guzik <mjguzik@xxxxxxxxx> wrote:
>
> While I did not try to figure out who transiently took the lock (it was
> something outside of the benchmark), I devised a trivial reproducer
> which triggers the problem almost every time: merely issue "ls" of the
> directory containing the tested file (in this case: "ls /tmp").

So I have no problem with your patch 2/2 - moving the lockref data
structure away from everything else that can be shared read-only makes
a ton of sense independently of anything else.
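
To illustrate the layout idea, here is a minimal user-space sketch, not the actual struct dentry change from patch 2/2. The struct, its field names and the 64-byte line size are all made up for the example. The only point is that the frequently written lock+count word gets a cache line of its own, so writes to it do not keep invalidating the line that holds the read-mostly fields.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines. Real kernels take this from the arch. */
#define TOY_CACHELINE 64

/* Hypothetical object, loosely shaped like something with an embedded lockref. */
struct toy_object {
	/* Read-mostly fields that lookups share without writing. */
	const char *name;
	uint32_t hash;
	void *parent;

	/*
	 * Frequently written lock+count word, pushed onto its own cache
	 * line so that updates do not bounce the line holding the fields above.
	 */
	alignas(TOY_CACHELINE) uint64_t lock_count;
};

/* The hot word starts on a cache line boundary ... */
static_assert(offsetof(struct toy_object, lock_count) % TOY_CACHELINE == 0,
	      "lock_count must start a cache line");

/* ... and the read-mostly fields all end before that line begins. */
static_assert(offsetof(struct toy_object, parent) + sizeof(void *)
	      <= offsetof(struct toy_object, lock_count),
	      "read-mostly fields must not share the hot line");
```

The cost is padding: the struct grows to a multiple of the line size, which is why reordering existing fields is usually preferred over plain alignment in a structure as common as a dentry.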

Except you also randomly increased a retry count in there, which makes no sense.

But this patch 1/2 makes me go "Eww, hacky hacky".

We already *have* the retry loop, it's just that currently it only
covers the cmpxchg failures.

The natural thing to do is to just make the "wait for unlocked" be
part of the same loop.

In fact, I have this memory of trying this originally, and it not
mattering and just making the code uglier, but that may be me
confusing myself. It's a *loong* time ago.

With the attached patch, lockref_get() (to pick one random case) ends
up looking like this:

	mov    (%rdi),%rax
	mov    $0x64,%ecx
  loop:
	test   %eax,%eax
	jne    locked
	mov    %rax,%rdx
	sar    $0x20,%rdx
	add    $0x1,%edx
	shl    $0x20,%rdx
	lock cmpxchg %rdx,(%rdi)
	jne    fail
	// SUCCESS
	ret
  locked:
	pause
	mov    (%rdi),%rax
  fail:
	sub    $0x1,%ecx
	jne    loop

(with the rest being the "take lock and go slow" case).

It seems much better to me to have *one* retry loop that handles both
the causes of failures.
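
For anyone who wants to play with the shape of that loop outside the kernel, here is a minimal user-space sketch using C11 atomics. It is not the kernel code: the toy_* names are hypothetical, the lock is modeled as "low 32 bits non-zero" to match the assembly above, and toy_cpu_relax() is an empty placeholder where the kernel would use cpu_relax().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for struct lockref: the low 32 bits are the lock
 * word (0 means unlocked), the high 32 bits are the reference count.
 */
typedef struct {
	_Atomic uint64_t lock_count;
} toy_lockref;

static_assert(sizeof(uint64_t) == 8, "lock and count must pack into 64 bits");

#define TOY_RETRIES 100

static inline bool toy_locked(uint64_t v)
{
	return (uint32_t)v != 0;
}

/* Placeholder: a real implementation would issue a pause/yield hint here. */
static inline void toy_cpu_relax(void)
{
}

/*
 * Try to bump the count without taking the lock. Returns true on success,
 * false when the retry budget is exhausted and the caller has to fall
 * back to the "take lock and go slow" path.
 */
static bool toy_lockref_get_fast(toy_lockref *ref)
{
	uint64_t old = atomic_load_explicit(&ref->lock_count,
					    memory_order_relaxed);
	int retry = TOY_RETRIES;

	do {
		if (toy_locked(old)) {
			/* Lock held: wait a bit, reload, burn one retry. */
			toy_cpu_relax();
			old = atomic_load_explicit(&ref->lock_count,
						   memory_order_relaxed);
			continue;
		}
		uint64_t new = old + (UINT64_C(1) << 32);
		/* On failure the compare-exchange reloads 'old' for us. */
		if (atomic_compare_exchange_strong_explicit(&ref->lock_count,
							    &old, new,
							    memory_order_relaxed,
							    memory_order_relaxed))
			return true;
	} while (--retry);

	return false;
}
```

Note that the "continue" in a do-while jumps to the "--retry" check, so a held lock and a lost cmpxchg race both consume the same retry budget, which is the whole point of having one loop.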

Entirely untested, I only looked at the generated code and it looked
reasonable. The patch may be entirely broken for some random reason I
didn't think of.

And in case you wonder, that 'lockref_locked()' macro I introduce is
purely to make the code more readable. Without it, that one
conditional line ends up being insanely long; the macro is there just
to break things up into slightly more manageable chunks.

Mind testing this approach instead?

Linus
lib/lockref.c | 14 ++++++++++----
1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/lib/lockref.c b/lib/lockref.c
index 2afe4c5d8919..70f38621901b 100644
--- a/lib/lockref.c
+++ b/lib/lockref.c
@@ -4,6 +4,9 @@
 
 #if USE_CMPXCHG_LOCKREF
 
+#define lockref_locked(l) \
+	unlikely(!arch_spin_value_unlocked((l).lock.rlock.raw_lock))
+
 /*
  * Note that the "cmpxchg()" reloads the "old" value for the
  * failure case.
@@ -13,7 +16,12 @@
 	struct lockref old; \
 	BUILD_BUG_ON(sizeof(old) != 8); \
 	old.lock_count = READ_ONCE(lockref->lock_count); \
-	while (likely(arch_spin_value_unlocked(old.lock.rlock.raw_lock))) { \
+	do { \
+		if (lockref_locked(old)) { \
+			cpu_relax(); \
+			old.lock_count = READ_ONCE(lockref->lock_count); \
+			continue; \
+		} \
 		struct lockref new = old; \
 		CODE \
 		if (likely(try_cmpxchg64_relaxed(&lockref->lock_count, \
@@ -21,9 +29,7 @@
 						 new.lock_count))) { \
 			SUCCESS; \
 		} \
-		if (!--retry) \
-			break; \
-	} \
+	} while (--retry); \
 } while (0)
 
 #else