Re: [PATCH] mm/vmstat: Reduce zone lock hold time when reading /proc/pagetypeinfo

From: Mel Gorman
Date: Wed Oct 23 2019 - 04:31:51 EST


On Tue, Oct 22, 2019 at 06:57:45PM +0200, Michal Hocko wrote:
> [Cc Mel]
>
> On Tue 22-10-19 12:21:56, Waiman Long wrote:
> > The pagetypeinfo_showfree_print() function prints out the number of
> > free blocks for each of the page orders and migrate types. The current
> > code just iterates the each of the free lists to get counts. There are
> > bug reports about hard lockup panics when reading the /proc/pagetyeinfo
> > file just because it look too long to iterate all the free lists within
> > a zone while holing the zone lock with irq disabled.
> >
> > Given the fact that /proc/pagetypeinfo is readable by all, the possiblity
> > of crashing a system by the simple act of reading /proc/pagetypeinfo
> > by any user is a security problem that needs to be addressed.
>
> Should we make the file 0400? It is a useful thing when debugging but
> not something regular users would really need for life.
>

I think this would be useful in general. The information is not that
useful outside of debugging. Even then it's only useful when trying to
get a handle on why a path like compaction is taking too long.

> > There is a free_area structure associated with each page order. There
> > is also a nr_free count within the free_area for all the different
> > migration types combined. Tracking the number of free list entries
> > for each migration type will probably add some overhead to the fast
> > paths like moving pages from one migration type to another which may
> > not be desirable.
>
> Have you tried to measure that overhead?
>

I would prefer this option not be taken. It would increase the cost of
watermark calculations which is a relatively fast path.

> > we can actually skip iterating the list of one of the migration types
> > and used nr_free to compute the missing count. Since MIGRATE_MOVABLE
> > is usually the largest one on large memory systems, this is the one
> > to be skipped. Since the printing order is migration-type => order, we
> > will have to store the counts in an internal 2D array before printing
> > them out.
> >
> > Even by skipping the MIGRATE_MOVABLE pages, we may still be holding the
> > zone lock for too long blocking out other zone lock waiters from being
> > run. This can be problematic for systems with large amount of memory.
> > So a check is added to temporarily release the lock and reschedule if
> > more than 64k of list entries have been iterated for each order. With
> > a MAX_ORDER of 11, the worst case will be iterating about 700k of list
> > entries before releasing the lock.
>
> But you are still iterating through the whole free_list at once so if it
> gets really large then this is still possible. I think it would be
> preferable to use per migratetype nr_free if it doesn't cause any
> regressions.
>

I think it will. The patch as it is contains the overhead within the
reader of the pagetypeinfo proc file which is a non-critical path. The
page allocator paths on the other hand is very important.

--
Mel Gorman
SUSE Labs