* Waiman Long <waiman.long@xxxxxx> wrote:
I had run some performance tests using the fserver and new_fserver
benchmarks (on ext4 filesystems) of the AIM7 test suite on an 80-core
DL980 with HT on. The following kernels were used:
1. Modified 3.10.1 kernel with mb_cache_spinlock in fs/mbcache.c
replaced by a rwlock
2. Modified 3.10.1 kernel + modified __read_lock_failed code as suggested
by Ingo
3. Modified 3.10.1 kernel + queue read/write lock
4. Modified 3.10.1 kernel + queue read/write lock in classic read/write
lock behavior
The last one has the read lock stealing flag set in the qrwlock
structure, which gives priority to readers and behaves more like the
classic read/write lock, with less fairness.
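A rough user-space sketch of that read-stealing idea is below. It is not
the kernel qrwlock: the names (toy_qrwlock, QW_LOCKED, QR_BIAS, rsteal)
are invented for illustration, and the real MCS wait queue and
writer-waiting state are replaced by a single flag. It only shows how a
stealing flag lets readers keep their count and jump ahead of queued
waiters, versus backing out and taking a fair turn in the queue:

/*
 * Toy user-space sketch, NOT the kernel qrwlock.  Compile-check with:
 * cc -std=c11 -c toy_qrwlock.c
 */
#include <stdatomic.h>
#include <stdbool.h>
#include <sched.h>

#define QW_LOCKED 0x0ffu        /* low byte: a writer holds the lock    */
#define QR_BIAS   0x100u        /* one reader, counted above that byte  */

struct toy_qrwlock {
	atomic_uint cnts;       /* reader count + writer byte           */
	atomic_flag waitq;      /* crude stand-in for the wait queue    */
	bool        rsteal;     /* true = classic reader-preference     */
};

static void toy_qrwlock_init(struct toy_qrwlock *l, bool rsteal)
{
	atomic_init(&l->cnts, 0u);
	atomic_flag_clear(&l->waitq);
	l->rsteal = rsteal;
}

static void toy_read_lock(struct toy_qrwlock *l)
{
	unsigned int c = atomic_fetch_add(&l->cnts, QR_BIAS);

	if (!(c & QW_LOCKED))
		return;                 /* fast path: no active writer  */

	if (l->rsteal) {
		/*
		 * Classic behaviour: keep the reader count and just wait
		 * for the current writer to finish, jumping ahead of any
		 * queued waiters.  Lower reader latency, less fairness.
		 */
		while (atomic_load(&l->cnts) & QW_LOCKED)
			sched_yield();
		return;
	}

	/*
	 * Fair behaviour: back out, take a turn in the queue, and only
	 * then re-add the reader count and wait for the writer to leave.
	 */
	atomic_fetch_sub(&l->cnts, QR_BIAS);
	while (atomic_flag_test_and_set(&l->waitq))
		sched_yield();
	atomic_fetch_add(&l->cnts, QR_BIAS);
	while (atomic_load(&l->cnts) & QW_LOCKED)
		sched_yield();
	atomic_flag_clear(&l->waitq);   /* pass the queue to the next waiter */
}

static void toy_read_unlock(struct toy_qrwlock *l)
{
	atomic_fetch_sub(&l->cnts, QR_BIAS);
}

static void toy_write_lock(struct toy_qrwlock *l)
{
	unsigned int none = 0;

	while (atomic_flag_test_and_set(&l->waitq))     /* queue up     */
		sched_yield();
	/* Wait until no readers and no writer, then grab the writer byte. */
	while (!atomic_compare_exchange_weak(&l->cnts, &none, QW_LOCKED)) {
		none = 0;
		sched_yield();
	}
	atomic_flag_clear(&l->waitq);
}

static void toy_write_unlock(struct toy_qrwlock *l)
{
	atomic_fetch_sub(&l->cnts, QW_LOCKED);
}

With rsteal set, readers never wait behind queued writers, which is
roughly the classic rwlock behaviour that kernel 4 approximates.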
The following table shows the averaged results, in jobs per minute
(JPM), over the 200-1000 user range:
+-----------------+--------+--------+--------+--------+
| Kernel | 1 | 2 | 3 | 4 |
+-----------------+--------+--------+--------+--------+
| fserver JPM | 245598 | 274457 | 403348 | 411941 |
| % change from 1 | 0% | +11.8% | +64.2% | +67.7% |
+-----------------+--------+--------+--------+--------+
| new-fserver JPM | 231549 | 269807 | 399093 | 399418 |
| % change from 1 | 0% | +16.5% | +72.4% | +72.5% |
+-----------------+--------+--------+--------+--------+
So it's not just herding that is a problem.
I'm wondering, how sensitive is this particular benchmark to fairness?
I.e. do the 200-1000 simulated users each perform the same number of ops,
so that any smearing of execution time via unfairness gets amplified?
I.e. does steady-state throughput go up by 60%+ too with your changes?
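For what it's worth, here is a toy model of that amplification effect.
It is not AIM7 itself and every number is invented purely for
illustration: if each simulated user must finish a fixed number of jobs,
the run only ends when the slowest user is done, so smearing per-user
completion times lowers the reported JPM even when the average per-user
completion time stays the same.

#include <stdio.h>

#define NUSERS        8
#define JOBS_PER_USER 1000.0

/* Reported JPM when user i finishes at minute t[i]: total jobs divided
 * by the time the last user finishes.                                  */
static double reported_jpm(const double t[NUSERS])
{
	double last = 0.0;

	for (int i = 0; i < NUSERS; i++)
		if (t[i] > last)
			last = t[i];
	return NUSERS * JOBS_PER_USER / last;
}

int main(void)
{
	/* Fair: every user finishes at the 10 minute mark.             */
	double fair[NUSERS]   = { 10, 10, 10, 10, 10, 10, 10, 10 };
	/* Unfair: same 10 minute average, but smeared from 6 to 14.    */
	double unfair[NUSERS] = {  6,  7,  8,  9, 11, 12, 13, 14 };

	printf("fair   JPM: %.0f\n", reported_jpm(fair));    /* 800     */
	printf("unfair JPM: %.0f\n", reported_jpm(unfair));  /* ~571    */
	return 0;
}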