On 07/18/2013 07:49 PM, Waiman Long wrote:
> On 07/18/2013 06:22 AM, Thomas Gleixner wrote:
>> [...]
>> Waiman,
>>
>> On Mon, 15 Jul 2013, Waiman Long wrote:
>>> On 07/15/2013 06:31 PM, Thomas Gleixner wrote:
>>>> On Fri, 12 Jul 2013, Waiman Long wrote:
>>>> And please provide a proper argument why we can't use the same
>>>> technique for spinlocks.
>>> I will try to collect more data to justify the usefulness of qrwlock.
>>
>> + * an increase in lock size is not an issue.
>>
>> So is it faster in the general case or only for the high contention or
>> single thread operation cases?
>>
>> And you still haven't explained WHY it is faster. Can you please
>> explain properly WHY it is faster and WHY we can't apply that technique
>> you implemented for qrwlocks to writer-only locks (aka spinlocks) with
>> a smaller lock size?
> Of course, we can use the same technique for spinlocks. Since we only
> need 1 bit for the lock, we could combine the lock bit with the queue
> address, at the cost of a little more overhead in terms of coding and
> speed. That will make the new lock 4 bytes in size for 32-bit code and
> 8 bytes for 64-bit code. That could solve a lot of the performance
> problems that we have with spinlocks. However, I am aware that
> increasing the size of the spinlock (for 64-bit systems) may break a
> lot of inherent alignment in many of the data structures. That is why
> I am not proposing such a change right now. But if there is enough
> interest, we could certainly go ahead and see how things go.
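
To make sure I follow the combined lock word idea, is the rough shape
something like the sketch below? All names here are mine and the queued
slowpath is left out; it is only meant to pin down the encoding, not to
be a real implementation.

/*
 * Hypothetical sketch only -- not the actual patch.  The idea as I
 * understand it: keep the whole lock in one word (4 bytes on 32-bit,
 * 8 bytes on 64-bit) by packing the "locked" bit into bit 0 of the
 * MCS-style queue-tail pointer.  qnodes are pointer-aligned, so bit 0
 * of their address is always free.  The queued slowpath (local
 * spinning and handoff between qnodes) is omitted; that is where the
 * extra coding complexity would be.
 */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

struct qnode {
	_Atomic(struct qnode *) next;
	atomic_bool		 wait;
};

typedef struct {
	atomic_uintptr_t val;	/* bit 0 = locked, rest = queue tail */
} qspin_t;

#define QSPIN_LOCKED	((uintptr_t)1)

static inline bool qspin_trylock(qspin_t *lock)
{
	uintptr_t old = 0;

	/* Uncontended fastpath: lock free, no queue -> one cmpxchg. */
	return atomic_compare_exchange_strong(&lock->val, &old, QSPIN_LOCKED);
}

static inline void qspin_unlock(qspin_t *lock)
{
	uintptr_t old = QSPIN_LOCKED;

	/* Fastpath: nobody queued -> just clear the word. */
	if (atomic_compare_exchange_strong(&lock->val, &old, 0))
		return;

	/* Someone is queued: hand the lock to the queue head (omitted). */
}

If that is roughly it, the uncontended fast path is still a single
cmpxchg, which brings me to the question below.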
Leaving the lock size aside, for spinlocks is it the case that the
fastpath overhead of qlocks is less significant in low contention
scenarios?
Also, please let me know if you have a POC implementation for the
spinlocks that you can share. I would be happy to test it.
Sorry, different context:

> For the AIM7 test suite, the fserver & new_fserver workloads with ext4
> are the best ones for exercising the qrwlock series, but you do need
> to have a lot of cores to see the effect. I haven't tried to find
> other suitable benchmark tests yet.
Apart from AIM7 fserver, is there any other benchmark that exercises
this qrwlock series (to help with the testing)?