Re: [PATCH v1] mm: convert folio_estimated_sharers() to folio_likely_mapped_shared()

From: Khalid Aziz
Date: Tue Feb 27 2024 - 18:20:54 EST


On Tue, 2024-02-27 at 21:15 +0100, David Hildenbrand wrote:
> Callers of folio_estimated_sharers() only care about "mapped shared vs.
> mapped exclusively", not the exact estimate of sharers. Let's consolidate
> and unify the condition users are checking. While at it clarify the
> semantics and extend the discussion on the fuzziness.
>
> Use the "likely mapped shared" terminology to better express what the
> (adjusted) function actually checks.
>
> Whether a partially-mappable folio is more likely to not be partially
> mapped than partially mapped is debatable. In the future, we might be able
> to improve our estimate for partially-mappable folios, though.
>
> Note that we will now consistently detect "mapped shared" only if the
> first subpage is actually mapped multiple times. When the first subpage
> is not mapped, we will consistently detect it as "mapped exclusively".
> This change should currently only affect the usage in
> madvise_free_pte_range() and queue_folios_pte_range() for large folios: if
> the first page was already unmapped, we would have skipped the folio.
>
> Cc: Barry Song <v-songbaohua@xxxxxxxx>
> Cc: Vishal Moola (Oracle) <vishal.moola@xxxxxxxxx>
> Cc: Ryan Roberts <ryan.roberts@xxxxxxx>
> Signed-off-by: David Hildenbrand <david@xxxxxxxxxx>


This patch adds clarity while retaining current behavior, so looks good
to me.
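
For reference, the one corner case the commit message calls out (first
subpage not mapped) can be modelled in userspace as below. This is only a
sketch under stated assumptions: "mapcount" stands in for
page_mapcount(folio_page(folio, 0)), and the old_skip_*()/new_skip() helpers
are hypothetical names that exist neither in the kernel nor in the patch.

```c
#include <stdbool.h>

/*
 * Userspace model only. "mapcount" stands in for
 * page_mapcount(folio_page(folio, 0)); all helper names are hypothetical.
 */

/* Old caller-side checks on folio_estimated_sharers(). */
static inline bool old_skip_ne1(int mapcount)
{
	return mapcount != 1;	/* also true when mapcount == 0 */
}

static inline bool old_skip_gt1(int mapcount)
{
	return mapcount > 1;
}

/* New unified check, mirroring folio_likely_mapped_shared(). */
static inline bool new_skip(int mapcount)
{
	return mapcount > 1;	/* mapcount == 0 counts as "mapped exclusively" */
}
```

In this model, only callers that used the "!= 1" form see a difference, and
only for a mapcount of 0; callers that already used "> 1" are unchanged.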

Reviewed-by: Khalid Aziz <khalid.aziz@xxxxxxxxxx>


> ---
>  include/linux/mm.h | 46 ++++++++++++++++++++++++++++++++++++----------
>  mm/huge_memory.c   |  2 +-
>  mm/madvise.c       |  6 +++---
>  mm/memory.c        |  2 +-
>  mm/mempolicy.c     | 14 ++++++--------
>  mm/migrate.c       |  8 ++++----
>  6 files changed, 51 insertions(+), 27 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 6f4825d829656..795c89632265f 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2147,21 +2147,47 @@ static inline size_t folio_size(struct folio *folio)
>  }
>  
>  /**
> - * folio_estimated_sharers - Estimate the number of sharers of a folio.
> + * folio_likely_mapped_shared - Estimate if the folio is mapped into the page
> + * tables of more than one MM
>   * @folio: The folio.
>   *
> - * folio_estimated_sharers() aims to serve as a function to efficiently
> - * estimate the number of processes sharing a folio. This is done by
> - * looking at the precise mapcount of the first subpage in the folio, and
> - * assuming the other subpages are the same. This may not be true for large
> - * folios. If you want exact mapcounts for exact calculations, look at
> - * page_mapcount() or folio_total_mapcount().
> + * This function checks if the folio is currently mapped into more than one
> + * MM ("mapped shared"), or if the folio is only mapped into a single MM
> + * ("mapped exclusively").
>   *
> - * Return: The estimated number of processes sharing a folio.
> + * As precise information is not easily available for all folios, this function
> + * estimates the number of MMs ("sharers") that are currently mapping a folio
> + * using the number of times the first page of the folio is currently mapped
> + * into page tables.
> + *
> + * For small anonymous folios (except KSM folios) and anonymous hugetlb folios,
> + * the return value will be exactly correct, because they can only be mapped
> + * at most once into an MM, and they cannot be partially mapped.
> + *
> + * For other folios, the result can be fuzzy:
> + * (a) For partially-mappable large folios (THP), the return value can wrongly
> + *     indicate "mapped exclusively" (false negative) when the folio is
> + *     only partially mapped into at least one MM.
> + * (b) For pagecache folios (including hugetlb), the return value can wrongly
> + *     indicate "mapped shared" (false positive) when two VMAs in the same MM
> + *     cover the same file range.
> + * (c) For (small) KSM folios, the return value can wrongly indicate "mapped
> + *     shared" (false positive), when the folio is mapped multiple times into
> + *     the same MM.
> + *
> + * Further, this function only considers current page table mappings that
> + * are tracked using the folio mapcount(s). It does not consider:
> + * (1) If the folio might get mapped in the (near) future (e.g., swapcache,
> + *     pagecache, temporary unmapping for migration).
> + * (2) If the folio is mapped differently (VM_PFNMAP).
> + * (3) If hugetlb page table sharing applies. Callers might want to check
> + *     hugetlb_pmd_shared().
> + *
> + * Return: Whether the folio is estimated to be mapped into more than one MM.
>   */
> -static inline int folio_estimated_sharers(struct folio *folio)
> +static inline bool folio_likely_mapped_shared(struct folio *folio)
>  {
> - return page_mapcount(folio_page(folio, 0));
> + return page_mapcount(folio_page(folio, 0)) > 1;
>  }
>  
>  #ifndef HAVE_ARCH_MAKE_PAGE_ACCESSIBLE
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 50d146eb248ff..4d10904fef70c 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1829,7 +1829,7 @@ bool madvise_free_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
>   * If other processes are mapping this folio, we couldn't discard
>   * the folio unless they all do MADV_FREE so let's skip the folio.
>   */
> - if (folio_estimated_sharers(folio) != 1)
> + if (folio_likely_mapped_shared(folio))
>   goto out;
>  
>   if (!folio_trylock(folio))
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 44a498c94158c..32a534d200219 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -366,7 +366,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
>   folio = pfn_folio(pmd_pfn(orig_pmd));
>  
>   /* Do not interfere with other mappings of this folio */
> - if (folio_estimated_sharers(folio) != 1)
> + if (folio_likely_mapped_shared(folio))
>   goto huge_unlock;
>  
>   if (pageout_anon_only_filter && !folio_test_anon(folio))
> @@ -453,7 +453,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
>   if (folio_test_large(folio)) {
>   int err;
>  
> - if (folio_estimated_sharers(folio) > 1)
> + if (folio_likely_mapped_shared(folio))
>   break;
>   if (pageout_anon_only_filter && !folio_test_anon(folio))
>   break;
> @@ -677,7 +677,7 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
>   if (folio_test_large(folio)) {
>   int err;
>  
> - if (folio_estimated_sharers(folio) != 1)
> + if (folio_likely_mapped_shared(folio))
>   break;
>   if (!folio_trylock(folio))
>   break;
> diff --git a/mm/memory.c b/mm/memory.c
> index 1c45b6a42a1b9..8394a9843ca06 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -5173,7 +5173,7 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf)
>   * Flag if the folio is shared between multiple address spaces. This
>   * is later used when determining whether to group tasks together
>   */
> - if (folio_estimated_sharers(folio) > 1 && (vma->vm_flags & VM_SHARED))
> + if (folio_likely_mapped_shared(folio) && (vma->vm_flags & VM_SHARED))
>   flags |= TNF_SHARED;
>  
>   nid = folio_nid(folio);
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index f60b4c99f1302..0b92fde395182 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -642,12 +642,11 @@ static int queue_folios_hugetlb(pte_t *pte, unsigned long hmask,
>   * Unless MPOL_MF_MOVE_ALL, we try to avoid migrating a shared folio.
>   * Choosing not to migrate a shared folio is not counted as a failure.
>   *
> - * To check if the folio is shared, ideally we want to make sure
> - * every page is mapped to the same process. Doing that is very
> - * expensive, so check the estimated sharers of the folio instead.
> + * See folio_likely_mapped_shared() on possible imprecision when we
> + * cannot easily detect if a folio is shared.
>   */
>   if ((flags & MPOL_MF_MOVE_ALL) ||
> -     (folio_estimated_sharers(folio) == 1 && !hugetlb_pmd_shared(pte)))
> +     (!folio_likely_mapped_shared(folio) && !hugetlb_pmd_shared(pte)))
>   if (!isolate_hugetlb(folio, qp->pagelist))
>   qp->nr_failed++;
>  unlock:
> @@ -1032,11 +1031,10 @@ static bool migrate_folio_add(struct folio *folio, struct list_head *foliolist,
>   * Unless MPOL_MF_MOVE_ALL, we try to avoid migrating a shared folio.
>   * Choosing not to migrate a shared folio is not counted as a failure.
>   *
> - * To check if the folio is shared, ideally we want to make sure
> - * every page is mapped to the same process. Doing that is very
> - * expensive, so check the estimated sharers of the folio instead.
> + * See folio_likely_mapped_shared() on possible imprecision when we
> + * cannot easily detect if a folio is shared.
>   */
> - if ((flags & MPOL_MF_MOVE_ALL) || folio_estimated_sharers(folio) == 1) {
> + if ((flags & MPOL_MF_MOVE_ALL) || !folio_likely_mapped_shared(folio)) {
>   if (folio_isolate_lru(folio)) {
>   list_add_tail(&folio->lru, foliolist);
>   node_stat_mod_folio(folio,
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 73a052a382f13..35d376969f8b9 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -2568,11 +2568,11 @@ int migrate_misplaced_folio(struct folio *folio, struct vm_area_struct *vma,
>   /*
>   * Don't migrate file folios that are mapped in multiple processes
>   * with execute permissions as they are probably shared libraries.
> - * To check if the folio is shared, ideally we want to make sure
> - * every page is mapped to the same process. Doing that is very
> - * expensive, so check the estimated mapcount of the folio instead.
> + *
> + * See folio_likely_mapped_shared() on possible imprecision when we
> + * cannot easily detect if a folio is shared.
>   */
> - if (folio_estimated_sharers(folio) != 1 && folio_is_file_lru(folio) &&
> + if (folio_likely_mapped_shared(folio) && folio_is_file_lru(folio) &&
>       (vma->vm_flags & VM_EXEC))
>   goto out;
>