Re: [PATCH] futex: Support smaller futexes of one byte or two byte size.

From: Malte Skarupke
Date: Sun Dec 08 2019 - 17:20:09 EST

In my first version WAKE also took a size parameter, however I chose to not send that version because there are two problems with that. You already noticed the first problem: we would have to store the size and verify that it matches on WAKE.

The second problem is with REQUEUE and CMP_REQUEUE: The usual case there will be that you have a mutex that's 8 bits in size, but the condition variable is harder to fit into 8 bits and will be larger. So you would need to pass in two different size flags. But REQUEUE doesn't actually care what the size is at all. It just moves waiters around. CMP_REQUEUE does care, but only for one of the arguments. So it works out perfectly that there is one flag for the size.

Those two cases really helped me clarify what the size argument actually is needed for. It's only needed for the "compare" part of CMP_REQUEUE. Similarly WAIT could have also been named CMP_WAIT and if it was named that, it would also be obvious that the size argument is only needed for the "compare" part of WAIT. All the other work that WAIT does doesn't actually care about the memory behind the pointer at all. It only cares about the address.

That also illustrates what should happen if we splice a futex and call WAIT on @ptr and WAKE on @ptr+1: If I understand you correctly, (and correct me if I'm wrong here) your point is that there would be an inconsistency in the API there. You say it would be inconsistent for a size-less WAKE to not wake a futex that it's pointing in the middle of.

But with the above thoughts, we realize that since those are two different addresses, they refer to two different futexes. It doesn't matter that the WAIT was for a four byte futex. That information was only relevant for the "compare" part of WAIT. It's not relevant for anything else, and therefore it's not relevant for the identity of the futex. So the fact that splicing a futex doesn't work (and can't easily be made to work) does not point at an inconsistency in the API.

Taking all that into account, I believe there are two possible consistent implementations for sized futexes. One of them is the one you're asking for, where the size is always passed and always verified to be correct. The other one is the one I'm proposing, where the size argument only applies to the "compare" part of WAIT and CMP_REQUEUE, and all the other work of futexes is size-less and only works on the address. (and I think similar reasoning will work for the operations that are not supported yet)

I believe that between those two consistent implementations, the one with size-less WAKE and REQUEUE is preferable. REQUEUE in particular makes clear how we really don't care about the size in these operations. There is no difference in behavior when moving between futexes of different sizes or the same size. It just doesn't matter. But if REQUEUE is size-less, it would be inconsistent for WAKE to require a size since REQUEUE is just a WAKE with extra features.

The other downside of the version that checks the size is that we, well, have to check the size. That's extra work we have to do and extra data we have to store, and I can't come up with any case where a user would actually benefit from us doing that extra work.

All that being said I agree with your other comments (renaming FUTEX_NO_READ_WRITE to FUTEX_NONE, and introducing a futex_size() function to simplify some of the code). I'll change the code and send a new patch.

Meanwhile let me know what you would like to do about passing a size to WAKE or not.

Malte Skarupke

Am Fr, 6. Dez 2019, um 12:37, schrieb Peter Zijlstra:
> On Fri, Dec 06, 2019 at 04:31:29PM +0100, Peter Zijlstra wrote:
> > > + case FUTEX_WAKE:
> > > + case FUTEX_REQUEUE:
> > > + /*
> > > + * these instructions work with sized mutexes, but you
> > > + * don't need to pass the size. we could silently
> > > + * ignore the size argument, but the code won't verify
> > > + * that the correct size is used, so it's preferable
> > > + * to make that clear to the caller.
> > > + *
> > > + * for requeue the meaning would also be ambiguous: do
> > > + * both of them have to be the same size or not? they
> > > + * don't, and that's clearer when you just don't pass
> > > + * a size argument.
> > > + */
> > > + return -EINVAL;
> >
> > Took me a while to figure out this relies on FUTEX_NONE to avoid the
> > alignment tests.
> And thikning more on that, I really _realy_ hate this.
> You're basically saying WAKE is size-less, but that means we must
> consider what it means to have a u32 WAIT on @ptr, and a u8 WAKE on
> @ptr+1. If the wake really is size-less that should match.
> I'd be much happier with requiring strict sizing. Because conversely,
> what happens if you have a u32-WAIT at @ptr paired with a u8-WAKE at
> @ptr? If we demand strict size we can say that should not match. This
> does however mean we should include the size in the hash-match function.
> Your Changelog did not consider these implications at all.