Re: [PATCH 1/2] mm, page_alloc: Keep pcp count and list contents in sync if struct page is corrupted

From: Michal Hocko
Date: Fri Dec 02 2016 - 05:03:33 EST


On Fri 02-12-16 09:49:33, Mel Gorman wrote:
> On Fri, Dec 02, 2016 at 09:12:17AM +0100, Michal Hocko wrote:
> > On Fri 02-12-16 00:22:43, Mel Gorman wrote:
> > > Vlastimil Babka pointed out that commit 479f854a207c ("mm, page_alloc:
> > > defer debugging checks of pages allocated from the PCP") will allow the
> > > per-cpu list counter to be out of sync with the per-cpu list contents
> > > if a struct page is corrupted. This patch keeps the accounting in sync.
> > >
> > > Fixes: 479f854a207c ("mm, page_alloc: defer debugging checks of pages allocated from the PCP")
> > > Signed-off-by: Mel Gorman <mgorman@xxxxxxx>
> > > cc: stable@xxxxxxxxxxxxxxx [4.7+]
> >
> > I am trying to think about what would happen if we did go out of sync
> > and cannot spot a problem. Vlastimil has mentioned something about
> > free_pcppages_bulk looping for ever but I cannot see it happening right
> > now.
>
> free_pcppages_bulk can infinite loop if the page count is positive and
> there are no pages. While I've only seen this during development, a
> corrupted count loops here
>
> do {
> batch_free++;
> if (++pindex == NR_PCP_LISTS)
> pindex = 0;
> list = &pcp->lists[pindex];
> } while (list_empty(list));
>
> It would only be seen in a situation where struct page corruption was
> detected so it's rare.

OK, I was apparently sleeping when responding. I focused on t he outer
loop and that should just converge. But it is true that this inner loop
can just runaway... Could you add that to the changelog please? This
definitely warrants stable backport.

Thanks!

--
Michal Hocko
SUSE Labs