> This patch looks good to me.
> Larry, does Hugh's patch survive your testing?
As I said earlier, no. However, I finally set up a reproducer that takes only a few seconds
on a large system, and the change below fixes the problem completely:
-------------------------------------------------------------------------------------------------------------------------
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index c36febb..cc023b8 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2151,7 +2151,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 			goto nomem;
 
 		/* If the pagetables are shared don't copy or take references */
-		if (dst_pte == src_pte)
+		if (*(unsigned long *)dst_pte == *(unsigned long *)src_pte)
 			continue;
 
 		spin_lock(&dst->page_table_lock);
---------------------------------------------------------------------------------------------------------------------------
When we compare what src_pte and dst_pte point to, instead of comparing their addresses,
everything works. I suspect there is a missing memory barrier somewhere.
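
To spell out the difference between the two tests, here is a minimal userspace sketch.
It is not kernel code: fake_pte_t, same_slot() and same_value() are names made up for
illustration, and a plain unsigned long stands in for a real page table entry.

```c
#include <stdio.h>

/* Hypothetical stand-in for a page table entry; NOT the kernel's pte_t. */
typedef unsigned long fake_pte_t;

/* Shape of the original test: do both pointers refer to the same slot? */
static int same_slot(const fake_pte_t *dst, const fake_pte_t *src)
{
	return dst == src;
}

/* Shape of the patched test: do the two slots currently hold the same value? */
static int same_value(const fake_pte_t *dst, const fake_pte_t *src)
{
	return *dst == *src;
}

/* Print both results for a pair of slots. */
static void report(const char *label, const fake_pte_t *dst, const fake_pte_t *src)
{
	printf("%s: same_slot=%d same_value=%d\n",
	       label, same_slot(dst, src), same_value(dst, src));
}
```

Calling report() on two separate variables that hold the same value gives same_slot=0 and
same_value=1, while passing the same variable twice gives 1 for both. So the two checks
answer different questions, and the patched one can be true in cases where the original
one is not.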
Larry