Re: [PATCH v1] mm: convert folio_estimated_sharers() to folio_likely_mapped_shared()

From: Vishal Moola
Date: Tue Feb 27 2024 - 18:46:41 EST


On Tue, Feb 27, 2024 at 12:15 PM David Hildenbrand <david@xxxxxxxxxx> wrote:
>
> Callers of folio_estimated_sharers() only care about "mapped shared vs.
> mapped exclusively", not the exact estimate of sharers. Let's consolidate
> and unify the condition users are checking. While at it clarify the
> semantics and extend the discussion on the fuzziness.
>
> Use the "likely mapped shared" terminology to better express what the
> (adjusted) function actually checks.
>
> Whether a partially-mappable folio is more likely to not be partially
> mapped than partially mapped is debatable. In the future, we might be able
> to improve our estimate for partially-mappable folios, though.
>
> Note that we will now consistently detect "mapped shared" only if the
> first subpage is actually mapped multiple times. When the first subpage
> is not mapped, we will consistently detect it as "mapped exclusively".
> This change should currently only affect the usage in
> madvise_free_pte_range() and queue_folios_pte_range() for large folios: if
> the first page was already unmapped, we would have skipped the folio.
>
> Cc: Barry Song <v-songbaohua@xxxxxxxx>
> Cc: Vishal Moola (Oracle) <vishal.moola@xxxxxxxxx>
> Cc: Ryan Roberts <ryan.roberts@xxxxxxx>
> Signed-off-by: David Hildenbrand <david@xxxxxxxxxx>

Reviewed-by: Vishal Moola (Oracle) <vishal.moola@xxxxxxxxx>

> ---
> include/linux/mm.h | 46 ++++++++++++++++++++++++++++++++++++----------
> mm/huge_memory.c | 2 +-
> mm/madvise.c | 6 +++---
> mm/memory.c | 2 +-
> mm/mempolicy.c | 14 ++++++--------
> mm/migrate.c | 8 ++++----
> 6 files changed, 51 insertions(+), 27 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 6f4825d829656..795c89632265f 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2147,21 +2147,47 @@ static inline size_t folio_size(struct folio *folio)
> }
>
> /**
> - * folio_estimated_sharers - Estimate the number of sharers of a folio.
> + * folio_likely_mapped_shared - Estimate if the folio is mapped into the page
> + * tables of more than one MM
> * @folio: The folio.
> *
> - * folio_estimated_sharers() aims to serve as a function to efficiently
> - * estimate the number of processes sharing a folio. This is done by
> - * looking at the precise mapcount of the first subpage in the folio, and
> - * assuming the other subpages are the same. This may not be true for large
> - * folios. If you want exact mapcounts for exact calculations, look at
> - * page_mapcount() or folio_total_mapcount().
> + * This function checks if the folio is currently mapped into more than one
> + * MM ("mapped shared"), or if the folio is only mapped into a single MM
> + * ("mapped exclusively").
> *
> - * Return: The estimated number of processes sharing a folio.
> + * As precise information is not easily available for all folios, this function
> + * estimates the number of MMs ("sharers") that are currently mapping a folio
> + * using the number of times the first page of the folio is currently mapped
> + * into page tables.
> + *
> + * For small anonymous folios (except KSM folios) and anonymous hugetlb folios,
> + * the return value will be exactly correct, because they can only be mapped
> + * at most once into an MM, and they cannot be partially mapped.
> + *
> + * For other folios, the result can be fuzzy:
> + * (a) For partially-mappable large folios (THP), the return value can wrongly
> + * indicate "mapped exclusively" (false negative) when the folio is
> + * only partially mapped into at least one MM.
> + * (b) For pagecache folios (including hugetlb), the return value can wrongly
> + * indicate "mapped shared" (false positive) when two VMAs in the same MM
> + * cover the same file range.
> + * (c) For (small) KSM folios, the return value can wrongly indicate "mapped
> + * shared" (false negative), when the folio is mapped multiple times into
> + * the same MM.
> + *
> + * Further, this function only considers current page table mappings that
> + * are tracked using the folio mapcount(s). It does not consider:
> + * (1) If the folio might get mapped in the (near) future (e.g., swapcache,
> + * pagecache, temporary unmapping for migration).
> + * (2) If the folio is mapped differently (VM_PFNMAP).
> + * (3) If hugetlb page table sharing applies. Callers might want to check
> + * hugetlb_pmd_shared().
> + *
> + * Return: Whether the folio is estimated to be mapped into more than one MM.
> */
> -static inline int folio_estimated_sharers(struct folio *folio)
> +static inline bool folio_likely_mapped_shared(struct folio *folio)
> {
> - return page_mapcount(folio_page(folio, 0));
> + return page_mapcount(folio_page(folio, 0)) > 1;
> }
>
> #ifndef HAVE_ARCH_MAKE_PAGE_ACCESSIBLE
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 50d146eb248ff..4d10904fef70c 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1829,7 +1829,7 @@ bool madvise_free_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
> * If other processes are mapping this folio, we couldn't discard
> * the folio unless they all do MADV_FREE so let's skip the folio.
> */
> - if (folio_estimated_sharers(folio) != 1)
> + if (folio_likely_mapped_shared(folio))
> goto out;
>
> if (!folio_trylock(folio))
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 44a498c94158c..32a534d200219 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -366,7 +366,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
> folio = pfn_folio(pmd_pfn(orig_pmd));
>
> /* Do not interfere with other mappings of this folio */
> - if (folio_estimated_sharers(folio) != 1)
> + if (folio_likely_mapped_shared(folio))
> goto huge_unlock;
>
> if (pageout_anon_only_filter && !folio_test_anon(folio))
> @@ -453,7 +453,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
> if (folio_test_large(folio)) {
> int err;
>
> - if (folio_estimated_sharers(folio) > 1)
> + if (folio_likely_mapped_shared(folio))
> break;
> if (pageout_anon_only_filter && !folio_test_anon(folio))
> break;
> @@ -677,7 +677,7 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
> if (folio_test_large(folio)) {
> int err;
>
> - if (folio_estimated_sharers(folio) != 1)
> + if (folio_likely_mapped_shared(folio))
> break;
> if (!folio_trylock(folio))
> break;
> diff --git a/mm/memory.c b/mm/memory.c
> index 1c45b6a42a1b9..8394a9843ca06 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -5173,7 +5173,7 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf)
> * Flag if the folio is shared between multiple address spaces. This
> * is later used when determining whether to group tasks together
> */
> - if (folio_estimated_sharers(folio) > 1 && (vma->vm_flags & VM_SHARED))
> + if (folio_likely_mapped_shared(folio) && (vma->vm_flags & VM_SHARED))
> flags |= TNF_SHARED;
>
> nid = folio_nid(folio);
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index f60b4c99f1302..0b92fde395182 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -642,12 +642,11 @@ static int queue_folios_hugetlb(pte_t *pte, unsigned long hmask,
> * Unless MPOL_MF_MOVE_ALL, we try to avoid migrating a shared folio.
> * Choosing not to migrate a shared folio is not counted as a failure.
> *
> - * To check if the folio is shared, ideally we want to make sure
> - * every page is mapped to the same process. Doing that is very
> - * expensive, so check the estimated sharers of the folio instead.
> + * See folio_likely_mapped_shared() on possible imprecision when we
> + * cannot easily detect if a folio is shared.
> */
> if ((flags & MPOL_MF_MOVE_ALL) ||
> - (folio_estimated_sharers(folio) == 1 && !hugetlb_pmd_shared(pte)))
> + (!folio_likely_mapped_shared(folio) && !hugetlb_pmd_shared(pte)))
> if (!isolate_hugetlb(folio, qp->pagelist))
> qp->nr_failed++;
> unlock:
> @@ -1032,11 +1031,10 @@ static bool migrate_folio_add(struct folio *folio, struct list_head *foliolist,
> * Unless MPOL_MF_MOVE_ALL, we try to avoid migrating a shared folio.
> * Choosing not to migrate a shared folio is not counted as a failure.
> *
> - * To check if the folio is shared, ideally we want to make sure
> - * every page is mapped to the same process. Doing that is very
> - * expensive, so check the estimated sharers of the folio instead.
> + * See folio_likely_mapped_shared() on possible imprecision when we
> + * cannot easily detect if a folio is shared.
> */
> - if ((flags & MPOL_MF_MOVE_ALL) || folio_estimated_sharers(folio) == 1) {
> + if ((flags & MPOL_MF_MOVE_ALL) || !folio_likely_mapped_shared(folio)) {
> if (folio_isolate_lru(folio)) {
> list_add_tail(&folio->lru, foliolist);
> node_stat_mod_folio(folio,
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 73a052a382f13..35d376969f8b9 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -2568,11 +2568,11 @@ int migrate_misplaced_folio(struct folio *folio, struct vm_area_struct *vma,
> /*
> * Don't migrate file folios that are mapped in multiple processes
> * with execute permissions as they are probably shared libraries.
> - * To check if the folio is shared, ideally we want to make sure
> - * every page is mapped to the same process. Doing that is very
> - * expensive, so check the estimated mapcount of the folio instead.
> + *
> + * See folio_likely_mapped_shared() on possible imprecision when we
> + * cannot easily detect if a folio is shared.
> */
> - if (folio_estimated_sharers(folio) != 1 && folio_is_file_lru(folio) &&
> + if (folio_likely_mapped_shared(folio) && folio_is_file_lru(folio) &&
> (vma->vm_flags & VM_EXEC))
> goto out;
>
> --
> 2.43.2
>