Patch #1 is nice by itself, as it lays the foundation for the
MCS-like code, and if Ingo decides he does not want this pending
byte-lock bit business, that part can easily be reverted or dropped.
The pending bit code is needed for performance parity with the ticket
spinlock under light load. My own measurements indicate that the queuing
overhead makes the queue spinlock slower than the ticket spinlock
with 2-4 contending tasks. The pending bit solves the performance
problem with 2 contending tasks, leaving only the 3-4 task cases a
bit slower than the ticket spinlock, which should be more than
compensated for by the queue spinlock's superior performance under
heavy contention and slightly better performance with no contention.
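
To make the 2-task case concrete, here is a minimal user-space sketch
of the pending bit idea. It is not the code from the patch: every name
(sk_lock, SK_LOCKED, SK_PENDING, sk_acquire, ...) is hypothetical, C11
atomics and pthreads stand in for the kernel primitives, and the MCS
queue that the real slow path uses is replaced by a plain retry loop.
The only point shown is that the first waiter sets a pending bit and
spins on the lock word itself, so it never pays the queuing overhead.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

#define SK_LOCKED  1u   /* hypothetical bit: lock is held */
#define SK_PENDING 2u   /* hypothetical bit: one waiter spins on the word */

struct sk_lock { atomic_uint val; };

/* Stand-in for the kernel's cpu_relax(); yields so the demo is tame. */
static void sk_relax(void) { sched_yield(); }

static void sk_acquire(struct sk_lock *l)
{
    for (;;) {
        unsigned v = atomic_load_explicit(&l->val, memory_order_relaxed);

        if (v == 0) {
            /* Uncontended fast path: 0 -> LOCKED. */
            if (atomic_compare_exchange_weak_explicit(&l->val, &v, SK_LOCKED,
                    memory_order_acquire, memory_order_relaxed))
                return;
        } else if (v == SK_LOCKED) {
            /* Held, nobody waiting: try to become the pending waiter. */
            if (atomic_compare_exchange_weak_explicit(&l->val, &v,
                    SK_LOCKED | SK_PENDING,
                    memory_order_relaxed, memory_order_relaxed)) {
                /* Spin on the lock word until the owner drops LOCKED. */
                while (atomic_load_explicit(&l->val, memory_order_acquire)
                       & SK_LOCKED)
                    sk_relax();
                /* Word is now exactly PENDING and only we may change it:
                 * clear PENDING and set LOCKED in one store. */
                atomic_store_explicit(&l->val, SK_LOCKED,
                                      memory_order_relaxed);
                return;
            }
        } else {
            /* PENDING already taken: a third or later contender.
             * ASSUMPTION: the real code queues on an MCS node here;
             * this sketch simply backs off and retries. */
            sk_relax();
        }
    }
}

static void sk_release(struct sk_lock *l)
{
    /* Clear only LOCKED so a pending waiter keeps its claim. */
    unsigned old = atomic_fetch_and_explicit(&l->val, ~SK_LOCKED,
                                             memory_order_release);
    assert(old & SK_LOCKED);
    (void)old;
}

struct sk_demo { struct sk_lock lock; long counter; long iters; };

static void *sk_worker(void *arg)
{
    struct sk_demo *d = arg;
    for (long i = 0; i < d->iters; i++) {
        sk_acquire(&d->lock);
        d->counter++;            /* protected by the lock */
        sk_release(&d->lock);
    }
    return NULL;
}

/* Runs up to 16 threads, each adding iters to a shared counter. */
static long sk_run_demo(int nthreads, long iters)
{
    struct sk_demo d = { .counter = 0, .iters = iters };
    pthread_t tid[16];
    int started = 0;

    atomic_init(&d.lock.val, 0);
    if (nthreads > 16)
        nthreads = 16;
    for (int i = 0; i < nthreads; i++) {
        if (pthread_create(&tid[i], NULL, sk_worker, &d) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);
    return d.counter;
}
```

Calling sk_run_demo(2, iters) exercises mostly the fast path and the
pending path, which is the case the pending bit is meant to fix. With
3 or more threads the extra contenders land in the retry branch, which
is where the real implementation would pay for queuing instead.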