On Tue, 7 Apr 2009, Pekka Enberg wrote:
Hmm, partial lists are per-node, so wouldn't it be better to do the
adjustment for every struct kmem_cache_node separately? The
'min_partial_per_node' global seems just too ugly and confusing to live
with.

Btw, that requires moving ->min_partial to struct kmem_cache_node from
struct kmem_cache. But I think that makes a whole lot of sense if
some nodes may have more CPUs than others.
And while the improvement is kinda obvious, I would be interested to
know what kind of workload benefits from this patch (and see numbers
if there are any).
It doesn't really depend on the workload; it depends on the type of NUMA machine it's running on (and whether its nodes have asymmetric cpu counts).
Since min_partial_per_node is capped at MAX_PARTIAL, this is only really relevant for remote node defragmentation if it's allowed (and not just 2% of the time like the default). We want to avoid stealing partial slabs from remote nodes if there are fewer than the number of cpus on that node.
Otherwise, it's possible for each cpu on the victim node to try to allocate a single object and require nr_cpus_node(node) new slab allocations. In that case it's entirely possible for the majority of cpus to end up with cpu slabs from remote nodes. This change reduces the likelihood of that happening because we'll always have cpu slab replacements on our local partial list before allowing remote defragmentation.
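For illustration, the per-node minimum that the original min_partial_per_node approach boils down to is roughly the following (just a sketch, node_min_partial() isn't a real function in either patch):

/*
 * Sketch only: the minimum number of slabs a node's partial list should
 * keep -- one replacement per local cpu -- before remote defragmentation
 * may steal from it, clamped to the existing MIN_PARTIAL/MAX_PARTIAL
 * bounds in mm/slub.c.
 */
static unsigned long node_min_partial(int node)
{
	unsigned long min = nr_cpus_node(node);

	if (min < MIN_PARTIAL)
		min = MIN_PARTIAL;
	else if (min > MAX_PARTIAL)
		min = MAX_PARTIAL;
	return min;
}

The simpler nr_cpus_node() comparison in the diff below does the same thing, it just skips the MIN_PARTIAL/MAX_PARTIAL clamping.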
I'd be just as happy with the following, although for optimal performance it would require raising MIN_PARTIAL above its default of 5 when a node has more cpus than that (the old patch did that automatically, up to MAX_PARTIAL).
diff --git a/mm/slub.c b/mm/slub.c
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1326,11 +1326,13 @@ static struct page *get_any_partial(struct kmem_cache *s, gfp_t flags)
 	zonelist = node_zonelist(slab_node(current->mempolicy), flags);
 	for_each_zone_zonelist(zone, z, zonelist, high_zoneidx) {
 		struct kmem_cache_node *n;
+		int node;
-		n = get_node(s, zone_to_nid(zone));
+		node = zone_to_nid(zone);
+		n = get_node(s, node);
 		if (n && cpuset_zone_allowed_hardwall(zone, flags) &&
-				n->nr_partial > s->min_partial) {
+				n->nr_partial > nr_cpus_node(node)) {
 			page = get_partial_node(n);
 			if (page)
 				return page;