Re: [PATCH RFC 1/1] mm, page_alloc: fix incorrect zone_statistics data

From: hejianet
Date: Tue Dec 20 2016 - 22:01:37 EST




On 20/12/2016 5:18 PM, Michal Hocko wrote:
On Mon 12-12-16 13:59:07, Jia He wrote:
Commit b9f00e147f27 ("mm, page_alloc: reduce branches in
zone_statistics") restructured the code to reduce the branch miss rate.
Compared with the original logic, it assumes that if
!(flags & __GFP_OTHER_NODE), z->node would not be equal to
preferred_zone->node. That seems to be incorrect.
I am sorry but I have a hard time following the changelog. It is clear
that you are trying to fix a missed NUMA_{HIT,OTHER} accounting
but it is not really clear when such a thing happens. You are adding a
preferred_zone->node check. preferred_zone is the first zone in the
requested zonelist. So for most allocations it is a zone from the
local node. But if something requests an explicit NUMA node (without
__GFP_OTHER_NODE, which would be the majority I suspect) then we could
indeed end up accounting that as NUMA_MISS and NUMA_FOREIGN, so the
referenced patch indeed caused an unintended change of accounting AFAIU.
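
To make the case above concrete, here is a minimal userspace sketch of the logic removed by the '-' lines of the diff further down. It is a model under stated assumptions, not kernel code: struct zone is reduced to a node id, GFP_OTHER_NODE is a hypothetical stand-in for __GFP_OTHER_NODE, running_nid stands in for numa_node_id(), and the NUMA_FOREIGN update in the else branch follows the description in the paragraph above.

```c
#include <assert.h>

/* Userspace model only; names mirror the kernel but this is not kernel code. */
enum stat_item { NUMA_HIT, NUMA_MISS, NUMA_FOREIGN, NUMA_LOCAL, NUMA_OTHER, NR_STATS };

#define MAX_NODES 4
#define GFP_OTHER_NODE 0x1u	/* hypothetical stand-in for __GFP_OTHER_NODE */

struct zone { int node; };	/* a zone reduced to the node it belongs to */

static unsigned long counters[MAX_NODES][NR_STATS];

static void inc_zone_state(const struct zone *z, enum stat_item item)
{
	assert(z->node >= 0 && z->node < MAX_NODES);
	counters[z->node][item]++;
}

/*
 * Accounting as in the '-' lines of the diff (after b9f00e147f27).
 * running_nid is an assumption standing in for numa_node_id().
 */
static void zone_statistics_b9f00e(const struct zone *preferred_zone,
				   const struct zone *z,
				   unsigned int flags, int running_nid)
{
	int local_nid = running_nid;
	enum stat_item local_stat = NUMA_LOCAL;

	if (flags & GFP_OTHER_NODE) {
		local_stat = NUMA_OTHER;
		local_nid = preferred_zone->node;
	}

	if (z->node == local_nid) {
		inc_zone_state(z, NUMA_HIT);
		inc_zone_state(z, local_stat);
	} else {
		/* Reached even when z is the preferred zone, if it is remote. */
		inc_zone_state(z, NUMA_MISS);
		inc_zone_state(preferred_zone, NUMA_FOREIGN);
	}
}
```

With a task running on node 0 that explicitly asks for node 1 without the flag and gets its page from node 1, calling zone_statistics_b9f00e(&n1, &n1, 0, 0) in this model bumps NUMA_MISS and NUMA_FOREIGN on node 1 rather than NUMA_HIT and NUMA_OTHER.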

If this is correct then it should be a part of the changelog. I also
cannot say I would like the fix. First of all I am not sure
__GFP_OTHER_NODE is a good idea at all. How is an explicit usage of the
flag any different from an explicit __alloc_pages_node(non_local_nid)?
In both cases we ask for an allocation on a remote node, and a successful
allocation is a NUMA_HIT and NUMA_OTHER.

That being said, why cannot we simply do the following? As a bonus, we
can get rid of the barely used __GFP_OTHER_NODE. Also the number of
branches will stay the same.
Yes, I agree. Maybe we can get rid of __GFP_OTHER_NODE if there are no
objections. It seems that it is currently only used for hugepages and
statistics.
---
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 429855be6ec9..f035d5c8b864 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2583,25 +2583,17 @@ int __isolate_free_page(struct page *page, unsigned int order)
  * Update NUMA hit/miss statistics
  *
  * Must be called with interrupts disabled.
- *
- * When __GFP_OTHER_NODE is set assume the node of the preferred
- * zone is the local node. This is useful for daemons who allocate
- * memory on behalf of other processes.
  */
 static inline void zone_statistics(struct zone *preferred_zone, struct zone *z,
 								gfp_t flags)
 {
 #ifdef CONFIG_NUMA
-	int local_nid = numa_node_id();
-	enum zone_stat_item local_stat = NUMA_LOCAL;
-
-	if (unlikely(flags & __GFP_OTHER_NODE)) {
-		local_stat = NUMA_OTHER;
-		local_nid = preferred_zone->node;
-	}
+	if (z->node == preferred_zone->node) {
+		enum zone_stat_item local_stat = NUMA_LOCAL;
-	if (z->node == local_nid) {
 		__inc_zone_state(z, NUMA_HIT);
+		if (z->node != numa_node_id())
+			local_stat = NUMA_OTHER;
 		__inc_zone_state(z, local_stat);
 	} else {
 		__inc_zone_state(z, NUMA_MISS);
I thought the logic here was different.
Here is zone_statistics() as it was before __GFP_OTHER_NODE was introduced:

	if (z->zone_pgdat == preferred_zone->zone_pgdat) {
		__inc_zone_state(z, NUMA_HIT);
	} else {
		__inc_zone_state(z, NUMA_MISS);
		__inc_zone_state(preferred_zone, NUMA_FOREIGN);
	}
	if (z->node == numa_node_id())
		__inc_zone_state(z, NUMA_LOCAL);
	else
		__inc_zone_state(z, NUMA_OTHER);
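
As a rough illustration of the difference, here is a userspace sketch of the pre-__GFP_OTHER_NODE logic quoted above. It only models that older code: struct zone is reduced to a node id, one pgdat per node is assumed so node ids are compared instead of zone_pgdat, and struct numa_events and running_nid (standing in for numa_node_id()) are hypothetical names introduced for the example. The point it shows is that NUMA_LOCAL/NUMA_OTHER used to be recorded on both the hit and the miss path, whereas the '+' lines visible in the diff pick local_stat only inside the hit branch.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only: a zone is reduced to the node id it lives on. */
struct zone { int node; };

/*
 * Which counters a single allocation would bump. 'foreign' is charged
 * to preferred_zone, all others to the zone the page came from.
 */
struct numa_events {
	bool hit, miss, foreign, local, other;
};

/* Model of zone_statistics() before __GFP_OTHER_NODE existed. */
static struct numa_events zone_statistics_orig(const struct zone *preferred_zone,
					       const struct zone *z,
					       int running_nid)
{
	struct numa_events ev = { 0 };

	assert(preferred_zone && z);

	/* Assumption: one pgdat per node, so comparing node ids is enough. */
	if (z->node == preferred_zone->node) {
		ev.hit = true;
	} else {
		ev.miss = true;
		ev.foreign = true;
	}

	/* Decided independently of hit/miss. */
	if (z->node == running_nid)
		ev.local = true;
	else
		ev.other = true;

	return ev;
}
```

For example, with the preferred zone on node 1 and a task running on node 0, a fallback allocation from node 0 yields miss, foreign and local in this model, while an allocation that lands on node 1 yields hit and other.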

B.R.
Jia