Re: [PATCH] futex: Support smaller futexes of one byte or two byte size.

From: Malte Skarupke
Date: Wed Dec 11 2019 - 14:51:18 EST

Am Mi, 11. Dez 2019, um 08:44, schrieb Peter Zijlstra:
> See, I disagree. The WAIT/WAKE pair is an implementation of the MONITOR
> pattern. Similar to x86's MONITOR/MWAIT or ARMv8's LDXR/WFE.

That is a good way of putting it. It's not the mental model I had of futexes,
but I can see how under that mental model a WAKE that doesn't wake an
overlapping futex is inconsistent. (my mental model was a memory comparison
followed by a hashtable store. The WAKE has to pass the same hash key to
find anything)

The question of whether to pass a size to WAKE or not is not something I care
about (as I said, my first version did pass a size to WAKE) so I will change
my code to require a size in WAKE. I have my opinions about which one I prefer
but if I can't convince you then that shouldn't hold back the change. But
a sized WAKE does open up a few questions:

1. It would be easy for me to verify that sizes match when the address that's
passed in is the same address. Meaning a WAIT with 8 bits but a WAKE with
16 bits on the same address would return -EINVAL. But it's harder to
validate a similar mismatch on overlapping futexes. Meaning a 32 bit WAIT
on @ptr and a 8 bit WAKE on @ptr+1 would just return 0.

This might theoretically lead to problems if we ever want to have the
semantics of MONITOR/MWAIT in the future, because I'm saying that that
last mismatch does not wake anything and is not an error. This will make
it hard to change the semantics in the future. Are we OK with that?

2. If we are OK with that, there are questions about what to do about weird
uses like a 8 bit WAIT and a 32 bit WAIT on the same address. An 8 bit
WAKE on the same address should now probably just wake the 8 bit WAIT, but
a second 8 bit WAKE would then return an error because there is a
mismatching size. This is confusing so my first thought is to just disallow
all mixing of sizes, even among waiters. It seems unlikely that this will
ever be needed.

3. Implementing the full semantics of MONITOR/MWAIT would give obvious answers
to both of these questions, and I think I could even do it without slowing
down futexes too much. (just use the address/4 as the hash key so that all
potentially overlapping futexes hash to the same bucket) But it would also
open up new questions. Like what happens if you call a 32 bit WAKE with
val=1, meaning wake up one sleeper, but it overlaps with two 16 bit futexes?
Does it wake one sleeper in both of them? That would be consistent with
MONITOR/MWAIT behavior, but it would contradict the current documentation.
(which might be OK because it would only do so for new behavior)

My current thinking for these is as follows, in reverse order:

3. I would not implement the full MONITOR/MWAIT behavior. It's not required
for my use case, it would definitely add a small amount of overhead, and it
bothers me that adjacent 8 bit futexes would hash to the same bucket. (at
least they would if I can't come up with another way of implementing it) All
I want is smaller mutexes and the behavior required by the C++ standard, so I
don't need new semantics for that.

2. I will disallow all operations that pass a different size for the same
address. That includes disallowing an 8 bit WAIT followed by a 32 bit WAIT on
the same address.

1. I will only verify that the size matches if the same address was passed.
Overlapping futexes are not detected and will not raise an error.

This will lead to an inconsistent implementation in the MONITOR/MWAIT mental
model, but it leads to a consistent implementation in the comparison+hashtable
mental model. It will have the same behavior as my initial patch except that
all operations will error out if a size mismatch is detected.

Please let me know if this is OK with you. I don't think that this gains us
anything over having an unsized WAKE, but if this is what others want, I can
write the code. If we instead want the full MONITOR/MWAIT semantics, I can
probably also make that happen with a very small slowdown compared to the
current code. I just need to know if that's what you want.



Malte Skarupke