* Nick Piggin <nickpiggin@xxxxxxxxxxxx> wrote:
Ingo Molnar wrote:
curious, do you have any (relatively-) simple to run testcase that clearly shows the "scalability issues" you mention above, when going
from rwlocks to spinlocks? I'd like to give it a try on an 8-way box.
Arjan van de Ven wrote:
I'm curious what scalability advantage you see for rw spinlocks vs real
spinlocks ... since for any kind of moderate hold time the opposite is
expected ;)
It actually surprised me too, but Peter Chubb (who IIRC provided the motivation to merge the patch) showed some fairly significant improvement at 12-way.
https://www.gelato.unsw.edu.au/archives/scalability/2005-March/000069.html
i think that workload wasnt analyzed well enough (by us, not by Peter, who sent a reasonable analysis and suggested a reasonable change), and we went with whatever magic change appeared to make a difference, without fully understanding the underlying reasons. Quote:
"I'm not sure what's happening in the 4-processor case."
Now history appears to be repeating itself, just in the other direction ;) And we didnt get one inch closer to understanding the situation for real. I'd vote for putting a change-moratorium on tree-lock and only allow a patch that tweaks it that fully analyzes the workload :-)
one thing off the top of my mind: doesnt lockstat introduce significant overhead? Is this reproducable with lockstat turned off too? Is the same scalability problem visible if all read_lock()s are changed to write_lock()? [like i did in my patch] I.e. can other explanations (like unlucky alignment of certain rwlock data structures / functions) be excluded.
another thing: average hold times in the spinlock case on that workload are below 1 microsecond - probably on the range of cachemiss bounce costs on such a system.
I.e. it's the worst possible case for a spinlock->rwlock conversion! The only reason i can believe this to make a difference are cycle level races and small random micro-differences that cause heavier bouncing in the spinlock workload but happen to avoid it in the read-lock case. Not due to any fundamental advantage of rwlocks.