I got massive performance improvements from changing a driver
we have to use from the old semaphores to mutexes (the driver
was written a long time ago).
While these weren't 'rw' semaphores, the same issue will apply.
The problem was that the semaphore/mutex was typically only held over
a few instructions (e.g. to add an item to a list).
But with a semaphore, any contention always made the process sleep.
OTOH a mutex spins 'for a while' before sleeping, so the code rarely slept.
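A rough user-space sketch of the spin-then-sleep idea, assuming POSIX threads. This is not the kernel mutex code (the kernel decides how long to spin based on whether the lock owner is still running on another CPU); it only illustrates polling a lock a bounded number of times before falling back to a call that may block. The names `spin_then_lock`, `SPIN_LIMIT` and `run_demo` are hypothetical, and the critical section is a list push like the one described above.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical tuning knob: failed trylock attempts before we block. */
#define SPIN_LIMIT 100
#define MAX_THREADS 16

struct node {
    struct node *next;
    int owner;
};

static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static struct node *list_head;
static int items_per_thread;

/* Spin-then-sleep acquire: poll with trylock for a bounded number of
 * attempts, then fall back to pthread_mutex_lock(), which may block. */
static void spin_then_lock(pthread_mutex_t *m)
{
    for (int i = 0; i < SPIN_LIMIT; i++) {
        if (pthread_mutex_trylock(m) == 0)
            return;                 /* acquired without blocking */
    }
    pthread_mutex_lock(m);          /* still contended: block */
}

static void *worker(void *arg)
{
    int id = *(const int *)arg;
    for (int i = 0; i < items_per_thread; i++) {
        struct node *n = malloc(sizeof *n);   /* allocate outside the lock */
        if (n == NULL)
            abort();
        n->owner = id;
        spin_then_lock(&list_lock);
        n->next = list_head;                  /* tiny critical section: push */
        list_head = n;
        pthread_mutex_unlock(&list_lock);
    }
    return NULL;
}

/* Runs nthreads workers, then frees the list and returns its length. */
long run_demo(int nthreads, int per_thread)
{
    pthread_t threads[MAX_THREADS];
    int ids[MAX_THREADS];
    long count = 0;

    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    items_per_thread = per_thread;
    for (int i = 0; i < nthreads; i++) {
        ids[i] = i;
        if (pthread_create(&threads[i], NULL, worker, &ids[i]) != 0)
            abort();
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(threads[i], NULL);

    while (list_head != NULL) {
        struct node *next = list_head->next;
        free(list_head);
        list_head = next;
        count++;
    }
    return count;
}
```

Calling `run_demo(4, 10000)` should return 40000. The spin limit here is an arbitrary guess; a real lock would need to tune it, or, like the kernel, base the decision on what the lock owner is doing.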