Kernel panic due to page migration accessing memory holes
From: Michael Bohan
Date: Wed Feb 17 2010 - 19:46:11 EST
Hi,
I have encountered a kernel panic on the ARM/msm platform in the mm
migration code on 2.6.29. My memory configuration has two discontiguous
banks per our ATAG definition. These banks end up on addresses that
are 1 MB aligned. I am using FLATMEM (not SPARSEMEM), but my
understanding is that SPARSEMEM should not be necessary to support this
configuration. Please correct me if I'm wrong.
The crash occurs in mm/page_alloc.c:move_freepages() when it is passed a
start_page that corresponds to the last several megabytes of our first
memory bank. The code in move_freepages_block() rounds the passed-in
page frame number down to a multiple of pageblock_nr_pages, which
corresponds to 4 MB. It then passes that aligned pfn as the beginning of
a 4 MB range to move_freepages(). The problem is that since our bank's
end address is not 4 MB aligned, the range passed to move_freepages()
extends past the end of our memory bank. The code later blows up when
trying to access uninitialized page structures.
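
The rounding described above can be reproduced outside the kernel. The
following is only a sketch of the arithmetic, assuming 4 KB pages and a
1024-page pageblock (4 MB); the bank end address used in the note after
the code is a made-up value chosen to be 1 MB aligned but not 4 MB
aligned, and pageblock_range() and range_exceeds_bank() are hypothetical
helper names, not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>

#define PAGE_SHIFT          12       /* assumption: 4 KB pages */
#define PAGEBLOCK_NR_PAGES  1024UL   /* 4 MB / 4 KB, as in the report */

struct pfn_range {
    unsigned long start_pfn;   /* first pfn of the aligned block */
    unsigned long end_pfn;     /* last pfn of the aligned block (inclusive) */
};

/* Convert a physical address to a page frame number. */
unsigned long phys_to_pfn(unsigned long long phys)
{
    return (unsigned long)(phys >> PAGE_SHIFT);
}

/* Round pfn down to a pageblock boundary and span one whole pageblock. */
struct pfn_range pageblock_range(unsigned long pfn)
{
    struct pfn_range r;

    /* The mask trick below only works for power-of-two block sizes. */
    assert((PAGEBLOCK_NR_PAGES & (PAGEBLOCK_NR_PAGES - 1)) == 0);

    r.start_pfn = pfn & ~(PAGEBLOCK_NR_PAGES - 1);
    r.end_pfn = r.start_pfn + PAGEBLOCK_NR_PAGES - 1;
    return r;
}

/* bank_end_pfn is exclusive: the first pfn that is NOT in the bank. */
bool range_exceeds_bank(struct pfn_range r, unsigned long bank_end_pfn)
{
    return r.end_pfn >= bank_end_pfn;
}
```

With a hypothetical bank ending at physical address 0x07B00000 (pfn
0x7B00), a page at pfn 0x7A80 gets the block 0x7800..0x7BFF, whose last
256 pages (1 MB) lie beyond the bank.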
As a temporary fix, I added some code to move_freepages_block() that
inspects whether the range exceeds our first memory bank -- returning 0
if it does. This is not a clean solution, since it requires exporting
the ARM-specific meminfo structure to extract the bank information.
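
For illustration, a check of that kind might look roughly like the
following userspace model. This is a sketch under assumptions, not the
actual patch: struct bank is a hypothetical stand-in for one entry of
the ARM bank table, guarded_block_pages() is an invented name, and the
non-zero return value simply stands for "the block may be scanned".

```c
#include <assert.h>

#define PAGEBLOCK_NR_PAGES 1024UL   /* assumption: 4 MB of 4 KB pages */

/* Hypothetical stand-in for one memory bank, expressed in pfns. */
struct bank {
    unsigned long start_pfn;   /* first pfn in the bank */
    unsigned long end_pfn;     /* one past the last pfn in the bank */
};

/*
 * Returns the number of pages in the aligned block around pfn if the
 * whole block is backed by the bank, or 0 if any part of it falls
 * outside the bank.
 */
unsigned long guarded_block_pages(const struct bank *first_bank,
                                  unsigned long pfn)
{
    unsigned long start_pfn = pfn & ~(PAGEBLOCK_NR_PAGES - 1);
    unsigned long end_pfn = start_pfn + PAGEBLOCK_NR_PAGES - 1;

    assert(first_bank->start_pfn < first_bank->end_pfn);

    /* Refuse blocks that start before or end after the bank. */
    if (start_pfn < first_bank->start_pfn || end_pfn >= first_bank->end_pfn)
        return 0;

    return PAGEBLOCK_NR_PAGES;
}
```

The drawback noted above is visible here: the caller needs access to
the bank boundaries, which generic mm code does not normally have.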
I see that an option called CONFIG_HOLES_IN_ZONE exists, which controls
the definition of pfn_valid_within() used in move_freepages().
This option seems relevant to the problem. The ia64 architecture has a
special version of pfn_valid() called ia64_pfn_valid() that is used in
conjunction with this option. It appears to inspect the page
structure's state in a safe way that does not cause a crash, and can
presumably be used to determine whether the page structure is
initialized properly. The ARM version of pfn_valid() used in the
FLATMEM scenario does not appear to be memory-hole aware, and will
blindly return true in this case.
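
To make the difference concrete, here is a small userspace sketch that
contrasts a flat range check with a bank-aware one. It assumes the
FLATMEM check is a simple bounds test against the size of the mem_map
(my reading of the generic code, not verified against every tree), and
the bank-table lookup is a different technique from what
ia64_pfn_valid() does. struct bank, flat_pfn_valid() and
bank_pfn_valid() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one memory bank, expressed in pfns. */
struct bank {
    unsigned long start_pfn;   /* first pfn in the bank */
    unsigned long end_pfn;     /* one past the last pfn in the bank */
};

/* Flat check: only asks whether pfn lies inside the mem_map span. */
bool flat_pfn_valid(unsigned long pfn, unsigned long pfn_offset,
                    unsigned long max_mapnr)
{
    return pfn >= pfn_offset && (pfn - pfn_offset) < max_mapnr;
}

/* Hole-aware check: pfn must fall inside one of the populated banks. */
bool bank_pfn_valid(const struct bank *banks, size_t nr_banks,
                    unsigned long pfn)
{
    for (size_t i = 0; i < nr_banks; i++) {
        assert(banks[i].start_pfn <= banks[i].end_pfn);
        if (pfn >= banks[i].start_pfn && pfn < banks[i].end_pfn)
            return true;
    }
    return false;
}
```

For two made-up banks covering pfns 0x200..0x7AFF and 0x8000..0x8FFF,
pfn 0x7B80 sits in the hole between them: the flat check accepts it,
while the bank-aware check rejects it.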
I have looked on linux-next, and at least the functions mentioned above
have not changed.
Is there a stated requirement that memory banks must end on 4 MB
aligned addresses? Although I found this problem on ARM, it appears
upon inspection that the problem could occur on other architectures as
well, given the memory map assumptions stated above.
I'm hoping that some mm experts might understand the problem in greater
detail.
Thanks,
Michael