Re: [linus:master] [mm] c1753fd02a: stress-ng.madvise.ops_per_sec -6.5% regression

From: Mathieu Desnoyers
Date: Mon Sep 04 2023 - 06:03:24 EST

Next message: Ryan Roberts: "Re: [PATCH v5 3/5] mm: LARGE_ANON_FOLIO for improved performance"
Previous message: Hardik Gajjar: "Re: [PATCH] usb: hcd: xhci: Add set command timer delay API"
In reply to: Yin Fengwei: "Re: [linus:master] [mm] c1753fd02a: stress-ng.madvise.ops_per_sec -6.5% regression"
Next in thread: Yin Fengwei: "Re: [linus:master] [mm] c1753fd02a: stress-ng.madvise.ops_per_sec -6.5% regression"

On 9/4/23 01:32, Yin Fengwei wrote:

On 7/19/23 14:34, kernel test robot wrote:

Hi Mathieu Desnoyers,

We noticed that this commit addressed the issue
"[linus:master] [sched] af7f588d8f: will-it-scale.per_thread_ops -13.9% regression"
which we reported earlier at:
https://lore.kernel.org/oe-lkp/202305151017.27581d75-yujie.liu@xxxxxxxxx/

We did see a 92.2% improvement in will-it-scale.per_thread_ops with this commit
(details are below).
However, we also noticed a stress-ng regression.

The detailed report is below, FYI.

Hello,

kernel test robot noticed a -6.5% regression of stress-ng.madvise.ops_per_sec on:

commit: c1753fd02a0058ea43cbb31ab26d25be2f6cfe08 ("mm: move mm_count into its own cache line")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

I noticed that struct mm_struct has the following layout change after this patch.
Without the patch:
	spinlock_t                 page_table_lock;      /*   124     4 */
	/* --- cacheline 2 boundary (128 bytes) --- */
	struct rw_semaphore        mmap_lock;            /*   128    40 */  ----> in one cache line
	struct list_head           mmlist;               /*   168    16 */
	int                        mm_lock_seq;          /*   184     4 */

With the patch:
	spinlock_t                 page_table_lock;      /*   180     4 */
	struct rw_semaphore        mmap_lock;            /*   184    40 */  ----> cross to two cache lines
	/* --- cacheline 3 boundary (192 bytes) was 32 bytes ago --- */
	struct list_head           mmlist;               /*   224    16 */

If your intent is just to make sure that mmap_lock is entirely contained
within a cache line by forcing it to begin on a cache line boundary, you
can do:

struct mm_struct {
	[...]
	struct rw_semaphore mmap_lock ____cacheline_aligned_in_smp;
	struct list_head mmlist;
	[...]
};

The code above keeps mmlist on the same cache line as mmap_lock if
there happens to be enough room in the cache line after mmap_lock.

Otherwise, if your intent is to also eliminate false sharing by making
sure that mmap_lock sits alone in its cache line, you can do the following:

struct mm_struct {
	[...]
	struct {
		struct rw_semaphore mmap_lock;
	} ____cacheline_aligned_in_smp;
	struct list_head mmlist;
	[...]
};

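The code above keeps mmlist in a separate cache line from mmap_lock, because
the alignment attribute applies to the anonymous struct, whose size is then
padded up to a multiple of the cache line size.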

Depending on the usage, one or the other may be better. Comparative
benchmarks of both approaches would help choose the best way forward
here.

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
