One major issue with a combined count/owner is that we may have to use
cmpxchg for the reader lock instead of a plain atomic add, which will certainly impact reader-heavy
workloads. I have also thought about ways to compress the task pointer
address so that it can use fewer bits and leave the rest for the reader
count. It is probably doable on 64-bit systems, but likely not on 32-bit
systems given that there are fewer bits to play around with.
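A minimal user-space sketch in C11 of what such a combined word might look like, assuming x86-64-style 48-bit canonical virtual addresses and 64-byte aligned task structures. The layout, bit widths, and function names (owner_compress, read_trylock, write_trylock, etc.) are hypothetical and are not the actual rwsem code. It illustrates both points: the reader path becomes a cmpxchg loop because it has to validate the shared word (no writer, no count overflow into the owner bits) before bumping the count, and a task address can be squeezed into 42 bits, leaving 22 bits for the reader count.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical 64-bit combined owner/count word (not the real rwsem layout):
 *   bits 63..22  compressed owner task address (42 bits)
 *   bits 21..0   reader count (22 bits, max 4194303 readers)
 * Assumes 48-bit canonical virtual addresses, 64-byte aligned task structs,
 * and that a compressed owner address is never 0.
 */
#define ADDR_BITS          48
#define OWNER_ALIGN_SHIFT  6
#define OWNER_BITS         (ADDR_BITS - OWNER_ALIGN_SHIFT)   /* 42 */
#define READER_BITS        (64 - OWNER_BITS)                 /* 22 */
#define READER_MASK        ((UINT64_C(1) << READER_BITS) - 1)
#define ADDR_MASK          ((UINT64_C(1) << ADDR_BITS) - 1)

/* Drop the sign-extension bits (63..48) and the alignment bits (5..0). */
static inline uint64_t owner_compress(uint64_t addr)
{
    return (addr & ADDR_MASK) >> OWNER_ALIGN_SHIFT;
}

/* Restore the alignment bits and sign-extend bit 47 into bits 63..48. */
static inline uint64_t owner_decompress(uint64_t c)
{
    uint64_t addr = c << OWNER_ALIGN_SHIFT;
    if (addr & (UINT64_C(1) << (ADDR_BITS - 1)))
        addr |= ~ADDR_MASK;
    return addr;
}

/*
 * Reader lock: the count shares the word with the owner, so the reader must
 * check for a writer and for count saturation before bumping the count.
 * A blind atomic add could carry into the owner bits, hence the cmpxchg loop.
 */
static bool read_trylock(_Atomic uint64_t *word)
{
    uint64_t old = atomic_load_explicit(word, memory_order_relaxed);
    do {
        if (old >> READER_BITS)                  /* write-owned */
            return false;
        if ((old & READER_MASK) == READER_MASK)  /* count saturated */
            return false;
    } while (!atomic_compare_exchange_weak_explicit(word, &old, old + 1,
                                                    memory_order_acquire,
                                                    memory_order_relaxed));
    return true;
}

static void read_unlock(_Atomic uint64_t *word)
{
    uint64_t old = atomic_fetch_sub_explicit(word, 1, memory_order_release);
    assert((old & READER_MASK) != 0);            /* unbalanced unlock */
    (void)old;
}

/* Writer lock: succeeds only if there is neither an owner nor a reader. */
static bool write_trylock(_Atomic uint64_t *word, uint64_t task_addr)
{
    uint64_t expected = 0;
    uint64_t desired = owner_compress(task_addr) << READER_BITS;
    return atomic_compare_exchange_strong_explicit(word, &expected, desired,
                                                   memory_order_acquire,
                                                   memory_order_relaxed);
}

static void write_unlock(_Atomic uint64_t *word)
{
    atomic_store_explicit(word, 0, memory_order_release);
}
```

For comparison, applying the same 64-byte alignment assumption to a 32-bit address still leaves 26 bits for the owner and only 6 bits (at most 63 readers) for the count, unless more can be assumed about where task structures live, which is why the 32-bit case looks much harder.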