[ Adding Waiman Long to the participants, because this seems to be a
very specific cacheline alignment behavior of rwsems, maybe Waiman has
some comments ]
On Sun, Jun 6, 2021 at 3:16 AM Feng Tang <feng.tang@xxxxxxxxx> wrote:
>
> * perf-c2c: The hotspots(HITM) for 2 kernels are different due to the
>   data structure change
>
>   - old kernel
>
>     - first cacheline
>         mmap_lock->count (75%)
>         mm->mapcount (14%)
>
>     - second cacheline
>         mmap_lock->owner (97%)
>
>   - new kernel
>
>     mainly in the cacheline of 'mmap_lock'
>
>         mmap_lock->count (~2%)
>         mmap_lock->owner (95%)

Oooh.
It looks like pretty much all the contention is on mmap_lock, and the
difference is that the old kernel just _happened_ to split the
mmap_lock rwsem at *exactly* the right place.
The rw_semaphore structure looks like this:
    struct rw_semaphore {
            atomic_long_t count;
            atomic_long_t owner;
            struct optimistic_spin_queue osq; /* spinner MCS lock */
            ...
and before the addition of the 'write_protect_seq' field, the mmap_sem
was at offset 120 in 'struct mm_struct'.
Which meant that count and owner were in two different cachelines, and
then when you have contention and spend time in
rwsem_down_write_slowpath(), this is probably *exactly* the kind of
layout you want.
Because first the rwsem_write_trylock() will do a cmpxchg on the first
cacheline (for the optimistic fast-path), and then in the case of
contention, rwsem_down_write_slowpath() will just access the second
cacheline.
Which is probably just optimal for a load that spends a lot of time
contended - new waiters touch that first cacheline, and then they
queue themselves up on the second cacheline. Waiman, does that sound
believable?