On Fri, Apr 09, 2021 at 07:18:40PM -0400, Waiman Long wrote:
Before the new slab memory controller with per object byte charging,The idea makes total sense to me and also gives a hope to remove
charging and vmstat data update happen only when new slab pages are
allocated or freed. Now they are done with every kmem_cache_alloc()
and kmem_cache_free(). This causes additional overhead for workloads
that generate a lot of alloc and free calls.
The memcg_stock_pcp is used to cache byte charge for a specific
obj_cgroup to reduce that overhead. To further reducing it, this patch
makes the vmstat data cached in the memcg_stock_pcp structure as well
until it accumulates a page size worth of update or when other cached
data change.
byte-sized vmstats in the long-term.
On a 2-socket Cascade Lake server with instrumentation enabled and thisBecause vmstat_idx can realistically take only 3 values (slab_reclaimable,
patch applied, it was found that about 17% (946796 out of 5515184) of the
time when __mod_obj_stock_state() is called leads to an actual call to
mod_objcg_state() after initial boot. When doing parallel kernel build,
the figure was about 16% (21894614 out of 139780628). So caching the
vmstat data reduces the number of calls to mod_objcg_state() by more
than 80%.
Signed-off-by: Waiman Long <longman@xxxxxxxxxx>
---
mm/memcontrol.c | 78 +++++++++++++++++++++++++++++++++++++++++++------
mm/slab.h | 26 +++++++----------
2 files changed, 79 insertions(+), 25 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index b19100c68aa0..539c3b632e47 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2220,7 +2220,10 @@ struct memcg_stock_pcp {
#ifdef CONFIG_MEMCG_KMEM
struct obj_cgroup *cached_objcg;
+ struct pglist_data *cached_pgdat;
unsigned int nr_bytes;
+ int vmstat_idx;
+ int vmstat_bytes;
#endif
slab_unreclaimable and percpu), I wonder if it's better to have
vmstat_bytes[3] and save a bit more on the reduced number of flushes?
It must be an often case when a complex (reclaimable) kernel object has
non-reclaimable parts (e.g. kmallocs) or percpu counters.
If the difference will be too small, maybe the current form is better.