Re: 2.6.15-rc1-mm2 -- Bad page state at free_hot_cold_page (in process 'aplay', page c18eef30)
From: Takashi Iwai
Date: Mon Nov 21 2005 - 11:16:52 EST
At Mon, 21 Nov 2005 15:46:50 +0000 (GMT),
Hugh Dickins wrote:
>
> On Mon, 21 Nov 2005, Takashi Iwai wrote:
> >
> > Sorry, I still don't figure out how __GFP_COMP solved this problem.
> > Could you enlighten me a bit?
>
> The sequence of problems was this.
>
> Nick's core PageReserved removal patch in 2.6.15-rc1 (and -rc2)
> changed VM_RESERVED vmas never to free their pages on unmapping (e.g.
> on exit) - fine for remap_pfn_range areas, but a leak where others set
> VM_RESERVED; and PageReserved not to inhibit decrementing page count.
>
> In -rc1-mm2 I tried to fix that leak by restoring VM_RESERVED to its
> previous behaviour, and using a different flag VM_UNPAGED, set in
> remap_pfn_range, for the don't-free-when-unmapping behaviour.
>
> But there's then a problem when the underlying page was allocated
> as a high-order page, but the separate individual 0-order constituent
> pages are mapped into userspace by nopage: the page count of the first
> 0-order is raised by allocation, but the following 0-order pages are
> left with page count 0. nopage's get_page raises that to 1,
> zap_pte_range (or whatever it uses to actually do the freeing) lowers
> that to 0, and hence frees the page, even though it's a constituent of
> the not-yet-freed high-order page. (This had not been a problem while
> PageReserved was inhibiting decrementing the page count.)
>
> So another of my patches in -rc1-mm2 made the PageCompound technique
> available always, no longer under #ifdef CONFIG_HUGETLB_PAGE: so that
> get_page and put_page on the later constituents of the high-order
> page get redirected to the first one, and it should work okay again.
>
> Except that I'd missed that you actually have to choose to have your
> high-order pages supplied as compound pages, by passing __GFP_COMP.
> Since I wasn't passing that, they still weren't allocated as compound
> pages, so were still being freed too soon - and the PG_reserved flag
> found while freeing gave rise to the "Bad page state" messages seen.
I see, thanks for the explanation!
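
To make the counting problem described above concrete, here is a minimal
userspace sketch in C. It is a toy model under stated assumptions, not
kernel code: struct toy_page and every toy_* function are hypothetical
names invented for illustration, and the "compound" argument merely stands
in for allocating with __GFP_COMP. The two commented calls mimic nopage's
get_page and the put done when zap_pte_range unmaps the page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names exist in the kernel. */
struct toy_page {
    int count;              /* reference count of this 0-order page */
    bool freed;             /* set once a put drops count to 0 */
    struct toy_page *head;  /* non-NULL only for tails of a compound block */
};

#define TOY_ORDER  2
#define TOY_NPAGES (1 << TOY_ORDER)

/* Model a high-order allocation: only the first page starts with count 1. */
static void toy_alloc_pages(struct toy_page *pages, bool compound)
{
    for (size_t i = 0; i < TOY_NPAGES; i++) {
        pages[i].count = (i == 0) ? 1 : 0;
        pages[i].freed = false;
        pages[i].head = (compound && i > 0) ? &pages[0] : NULL;
    }
}

/* For a compound block, get/put on a tail is redirected to the head. */
static struct toy_page *toy_head(struct toy_page *p)
{
    return p->head ? p->head : p;
}

static void toy_get_page(struct toy_page *p)
{
    toy_head(p)->count++;
}

static void toy_put_page(struct toy_page *p)
{
    p = toy_head(p);
    assert(p->count > 0);
    if (--p->count == 0)
        p->freed = true;
}

/*
 * Map one tail page into "userspace" and unmap it again.
 * Returns true if the tail was freed while the block is still allocated.
 */
bool toy_tail_freed_early(bool compound)
{
    struct toy_page pages[TOY_NPAGES];

    toy_alloc_pages(pages, compound);
    toy_get_page(&pages[1]);    /* stands in for nopage's get_page */
    toy_put_page(&pages[1]);    /* stands in for the put on unmap */
    return pages[1].freed && pages[0].count > 0;
}
```

In this model, toy_tail_freed_early(false) reports the premature free of
the tail page, while toy_tail_freed_early(true) does not, because the
get/put pair lands on the head page and leaves its count where it started.
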
Now another question arises: which is the recommended method for
mmapping RAM pages, the vma nopage callback or remap_pfn_range()?
IIRC, in earlier versions the former was recommended because
remap_page_range() with reserved pages was regarded as a hack.
But, looking through these changes, I feel that remap_pfn_range() is
better (easier and more stable) than vma nopage...
> > Isn't it needed for dma_alloc_coherent() (for i386, particularly),
> > too? dma_alloc_coherent() also gets pages with __get_free_pages().
>
> Didn't I deal with that by adding __GFP_COMP in snd_malloc_dev_pages?
Oh yes, I overlooked that. It should be fine then.
> And (in a separate patch run past davem and wli first, to be aggregated
> with the sound/core/memalloc patch when I sign off and send to Andrew)
> in the sparc and sparc64 sbus_alloc_consistent.
>
> It's only an issue when the high-order page might be mapped into
> userspace, then its constituents freed up by zap_pte_range; or
> locked down with get_user_pages and later released: when constituents
> of a high-order page pass through common code designed for 0-order pages.
>
> > Also, I think we can remove all Set/ClearPageReserved() in memalloc.c
> > now. It was there just for mmap...
>
> That is so, but we'd prefer you to hold off for now. The way it's
> proceeding is, for 2.6.15 we're not actually removing the Set/Clear
> PageReserved from any of the drivers or from any of the architecture
> initialization; but PageReserved is no longer serving any functional
> purpose, except where PG_reserved acts as a double-check when the page
> is freed, as to whether it's all working right. Which was useful in
> this case, to identify where I'd forgotten all about __GFP_COMP; and
> I fear may prove useful in some other cases too. Retaining this use
> of PG_reserved for now, gives greater confidence in our safety when we
> later advance to removing all the Set/ClearPageReserved hocus-pocus.
OK. Let's fix them properly later.
Takashi