Re: [RFC PATCH] mm: bail out from psi memstall when cond_resched

From: Peter Zijlstra
Date: Thu Nov 19 2020 - 07:56:43 EST


On Wed, Nov 18, 2020 at 11:22:56AM +0800, Zhaoyang Huang wrote:
> Memory reclaim can run for several seconds on a memory-constrained system, which
> is then accounted as a heavy memstall. Make the memstall accounting more precise by
> leaving the memstall state around cond_resched().

How is this supposed to work on PREEMPT=y where cond_resched() is a NOP
and you can get preempted at any random point?
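
To make the question concrete, here is a toy userspace model, not kernel code. All toy_* names are made up for illustration, and the model assumes that stall time is charged for as long as the task's memstall flag is set, including while the task is preempted. It only shows why wrapping cond_resched() cannot exclude off-CPU time once preemption can happen anywhere.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these names are kernel APIs. */
struct toy_task {
	bool in_memstall;          /* stands in for PF_MEMSTALL */
	unsigned long stall_ticks; /* time charged as memory stall */
};

/* Advance the clock; time is charged only while the flag is set. */
static void toy_advance(struct toy_task *t, unsigned long ticks)
{
	if (t->in_memstall)
		t->stall_ticks += ticks;
}

/*
 * Models the idea of the patch's psi_cond_resched(): drop the stall
 * flag around the scheduling point, then set it again. With full
 * preemption the scheduling point does nothing, so the caller passes
 * 0 ticks.
 */
static void toy_psi_cond_resched(struct toy_task *t,
				 unsigned long off_cpu_ticks)
{
	assert(t->in_memstall);
	t->in_memstall = false;
	toy_advance(t, off_cpu_ticks); /* not charged */
	t->in_memstall = true;
}

/*
 * Run `steps` units of reclaim work, one tick each, losing the CPU for
 * `off_cpu_ticks` once per step. Returns the ticks charged as stall.
 */
unsigned long toy_reclaim(unsigned long steps, unsigned long off_cpu_ticks,
			  bool full_preempt)
{
	struct toy_task t = { .in_memstall = true, .stall_ticks = 0 };
	unsigned long i;

	for (i = 0; i < steps; i++) {
		toy_advance(&t, 1); /* reclaim work */
		if (full_preempt) {
			/* Preempted mid-work: flag still set, time charged. */
			toy_advance(&t, off_cpu_ticks);
			toy_psi_cond_resched(&t, 0);
		} else {
			/* Only scheduling point is the wrapped call. */
			toy_psi_cond_resched(&t, off_cpu_ticks);
		}
	}
	return t.stall_ticks;
}
```

In this model, toy_reclaim(10, 5, false) charges 10 ticks while toy_reclaim(10, 5, true) charges 60, so under the stated assumption the wrapper changes nothing when preemption does not go through cond_resched().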

(leaving the rest for Johannes)

> Signed-off-by: Zhaoyang Huang <zhaoyang.huang@xxxxxxxxxx>
> ---
> mm/vmscan.c | 23 ++++++++++++++++-------
> 1 file changed, 16 insertions(+), 7 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index a815f73..a083c85 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -316,6 +316,15 @@ static inline bool memcg_congested(struct pglist_data *pgdat,
> }
> #endif
>
> +static inline void psi_cond_resched(void)
> +{
> + unsigned long *flags;
> +
> + if (current->flags & PF_MEMSTALL)
> + psi_memstall_leave(&flags);
> + cond_resched();
> + psi_memstall_enter(&flags);
> +}
> /*
> * This misses isolated pages which are not accounted for to save counters.
> * As the data only determines if reclaim or compaction continues, it is
> @@ -557,7 +566,7 @@ static unsigned long do_shrink_slab(struct shrink_control *shrinkctl,
> total_scan -= shrinkctl->nr_scanned;
> scanned += shrinkctl->nr_scanned;
>
> - cond_resched();
> + psi_cond_resched();
> }
>
> if (next_deferred >= scanned)
> @@ -714,7 +723,7 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int nid,
>
> up_read(&shrinker_rwsem);
> out:
> - cond_resched();
> + psi_cond_resched();
> return freed;
> }
>
> @@ -1109,7 +1118,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
> unsigned nr_reclaimed = 0;
>
> memset(stat, 0, sizeof(*stat));
> - cond_resched();
> + psi_cond_resched();
>
> while (!list_empty(page_list)) {
> struct address_space *mapping;
> @@ -1118,7 +1127,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
> enum page_references references = PAGEREF_RECLAIM_CLEAN;
> bool dirty, writeback;
>
> - cond_resched();
> + psi_cond_resched();
>
> page = lru_to_page(page_list);
> list_del(&page->lru);
> @@ -2084,7 +2093,7 @@ static void shrink_active_list(unsigned long nr_to_scan,
> spin_unlock_irq(&pgdat->lru_lock);
>
> while (!list_empty(&l_hold)) {
> - cond_resched();
> + psi_cond_resched();
> page = lru_to_page(&l_hold);
> list_del(&page->lru);
>
> @@ -2500,7 +2509,7 @@ static void shrink_node_memcg(struct pglist_data *pgdat, struct mem_cgroup *memc
> }
> }
>
> - cond_resched();
> + psi_cond_resched();
>
> if (nr_reclaimed < nr_to_reclaim || scan_adjusted)
> continue;
> @@ -4149,7 +4158,7 @@ static int __node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned in
> .reclaim_idx = gfp_zone(gfp_mask),
> };
>
> - cond_resched();
> + psi_cond_resched();
> fs_reclaim_acquire(sc.gfp_mask);
> /*
> * We need to be able to allocate from the reserves for RECLAIM_UNMAP
> --
> 1.9.1
>