On Wed, Mar 21, 2018 at 01:44:25PM -0400, Daniel Jordan wrote:
> On 03/20/2018 04:54 AM, Aaron Lu wrote:
>> ...snip...
>> reduced zone->lock contention on the free path from 35% to 1.1%. Also, it
>> shows good results on the parallel free(*) workload by reducing zone->lock
>> contention from 90% to almost zero (lru_lock contention increased from
>> almost 0 to 90% though).
>
> Hi Aaron, I'm looking through your series now. Just wanted to mention that I'm seeing the same interaction between zone->lock and lru_lock in my own testing. IOW, it's not enough to fix just one or the other: both need attention to get good performance on a big system, at least in this microbenchmark we've both been using.

Agree.

> There's anti-scaling at high core counts where overall system page faults per second actually decrease with more CPUs added to the test. This happens when either zone->lock or lru_lock contention is completely removed, but the anti-scaling goes away when both locks are fixed.
>
> Anyway, I'll post some actual data on this stuff soon.
Looking forward to that, thanks.
In the meantime, I'll also try your lru_lock optimization work on top of
this patchset to see if the lock contention shifts back to zone->lock.