Re: [PATCH] mm/vmstat: Reduce zone lock hold time when reading /proc/pagetypeinfo
From: David Rientjes
Date: Tue Oct 22 2019 - 20:52:11 EST

On Tue, 22 Oct 2019, Waiman Long wrote:
> >>> and used nr_free to compute the missing count. Since MIGRATE_MOVABLE
> >>> is usually the largest one on large memory systems, this is the one
> >>> to be skipped. Since the printing order is migration-type => order, we
> >>> will have to store the counts in an internal 2D array before printing
> >>> them out.
> >>>
> >>> Even by skipping the MIGRATE_MOVABLE pages, we may still be holding the
> >>> zone lock for too long blocking out other zone lock waiters from being
> >>> run. This can be problematic for systems with large amount of memory.
> >>> So a check is added to temporarily release the lock and reschedule if
> >>> more than 64k of list entries have been iterated for each order. With
> >>> a MAX_ORDER of 11, the worst case will be iterating about 700k of list
> >>> entries before releasing the lock.
> >> But you are still iterating through the whole free_list at once so if it
> >> gets really large then this is still possible. I think it would be
> >> preferable to use per migratetype nr_free if it doesn't cause any
> >> regressions.
> >>
> > Yes, it is still theoretically possible. I will take a further look at
> > having per-migrate type nr_free. BTW, there is one more place where the
> > free lists are being iterated with zone lock held - mark_free_pages().
>
> Looking deeper into the code, the exact migration type is not stored in
> the page itself. An initial movable page can be stolen to be put into
> another migration type. So in a delete or move from free_area, we don't
> know exactly what migration type the page is coming from. IOW, it is
> hard to get accurate counts of the number of entries in each lists.
>
I think the suggestion is to maintain an nr_free count of the free_list for
each order for each migratetype, so anytime a page is added to or deleted
from the list, that nr_free is adjusted. Then the free_area's nr_free
becomes the sum of its migratetypes' nr_free counts at that order. That's
possible to do if you track the migratetype per page, as you said, or like
pcp pages track it as part of page->index. It's a trade-off on whether you
want to impact performance by maintaining these new nr_frees anytime you
manipulate the freelists.
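
To make the bookkeeping concrete, here is a rough userspace sketch, not
kernel code and not a proposed patch. Every name in it (the sketch_*
types, helpers and constants) is made up for illustration, and it assumes
the caller already knows the migratetype of the list being touched, which
is exactly the hard part discussed above.

```c
#include <assert.h>

/* Hypothetical migratetypes; the real set depends on the kernel config. */
enum sketch_migratetype {
	SKETCH_UNMOVABLE,
	SKETCH_MOVABLE,
	SKETCH_RECLAIMABLE,
	SKETCH_NR_TYPES
};

/* Hypothetical free_area for one order: one counter per migratetype. */
struct sketch_free_area {
	unsigned long nr_free[SKETCH_NR_TYPES];
};

/* Account for a free block being added to the list of migratetype mt. */
void sketch_add_to_free_list(struct sketch_free_area *area, int mt)
{
	assert(mt >= 0 && mt < SKETCH_NR_TYPES);
	area->nr_free[mt]++;
}

/* Account for a free block being removed from the list of migratetype mt. */
void sketch_del_from_free_list(struct sketch_free_area *area, int mt)
{
	assert(mt >= 0 && mt < SKETCH_NR_TYPES);
	assert(area->nr_free[mt] > 0);
	area->nr_free[mt]--;
}

/* A block stolen by another migratetype moves between two counters. */
void sketch_move_free_list(struct sketch_free_area *area, int from, int to)
{
	sketch_del_from_free_list(area, from);
	sketch_add_to_free_list(area, to);
}

/* The per-order total becomes the sum over all migratetypes. */
unsigned long sketch_area_nr_free(const struct sketch_free_area *area)
{
	unsigned long sum = 0;

	for (int mt = 0; mt < SKETCH_NR_TYPES; mt++)
		sum += area->nr_free[mt];
	return sum;
}
```

The extra increment and decrement on every freelist manipulation is the
cost side of the trade-off; reading a count no longer needs a list walk.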

I think Vlastimil and I discussed per order per migratetype nr_frees in
the past, and it could be a worthwhile improvement for other reasons.
Specifically, it leads to heuristics that can be used to determine how
fragmented a certain migratetype is for a zone, i.e. a very quick way to
determine what ratio of pages across all MIGRATE_UNMOVABLE pageblocks is
free.
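
As an illustration of the kind of heuristic I mean, the following is a
userspace sketch under stated assumptions, not an existing kernel
interface. The frag_* names are hypothetical, and it assumes that both
the per-order counts for one migratetype and the number of pageblocks of
that migratetype are already known. The result is only approximate if
free pages on a migratetype's list can sit in pageblocks of another type.

```c
#include <assert.h>

#define FRAG_MAX_ORDER 11	/* orders 0..10, matching a MAX_ORDER of 11 */

/*
 * Free pages of one migratetype: nr_free[order] counts free blocks,
 * and each block at a given order spans 2^order pages.
 */
unsigned long frag_free_pages(const unsigned long nr_free[FRAG_MAX_ORDER])
{
	unsigned long pages = 0;

	for (int order = 0; order < FRAG_MAX_ORDER; order++)
		pages += nr_free[order] << order;
	return pages;
}

/*
 * Percentage of pages that are free across nr_pageblocks pageblocks
 * of pageblock_pages pages each. Returns 0 when there are no pageblocks.
 */
unsigned int frag_free_percent(const unsigned long nr_free[FRAG_MAX_ORDER],
			       unsigned long nr_pageblocks,
			       unsigned long pageblock_pages)
{
	unsigned long total = nr_pageblocks * pageblock_pages;

	if (total == 0)
		return 0;
	return (unsigned int)(frag_free_pages(nr_free) * 100 / total);
}
```

With the counters in place this is a handful of additions per zone,
rather than a walk of every free_list under the zone lock.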

Or maybe there are other reasons why these nr_frees can't be maintained
anymore? (I had a patch to do it on 4.3.)

You may also find that, on a severely fragmented system, MIGRATE_MOVABLE
is not actually the longest free_list compared to the other migratetypes,
so special-casing MIGRATE_MOVABLE might not be the best way forward.