On a 2-socket 36-core 72-thread x86-64 E5-2699 v3 system, a rwsem
microbenchmark was run with 36 locking threads (one per core), each
doing 100k reader and writer lock/unlock operations. The resulting
locking rates (average of 3 runs) on a 4.10 kernel were 561.4 Mop/s
without the patch and 588.8 Mop/s with it, an increase of about 5%.
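
The roughly 5% figure follows directly from the two quoted rates. A
minimal Python sketch of the arithmetic (the variable and function
names are illustrative and not part of the benchmark):

```python
# Locking rates quoted above, in Mop/s (average of 3 runs each).
rate_without_patch = 561.4
rate_with_patch = 588.8


def percent_increase(before, after):
    """Relative change from `before` to `after`, as a percentage."""
    return (after - before) / before * 100.0


increase = percent_increase(rate_without_patch, rate_with_patch)
print(f"{increase:.1f}%")  # prints 4.9%
```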