On 7/15/19 3:00 PM, Andrew Morton wrote:
On Tue, 9 Jul 2019 18:24:57 -0700 Ralph Campbell <rcampbell@xxxxxxxxxx> wrote:
On 7/9/19 5:28 PM, Andrew Morton wrote:
On Tue, 9 Jul 2019 15:35:56 -0700 Ralph Campbell <rcampbell@xxxxxxxxxx> wrote:
When migrating a ZONE device private page from device memory to system
memory, the subpage pointer is initialized from a swap pte which computes
an invalid page pointer. A kernel panic results such as:
BUG: unable to handle page fault for address: ffffea1fffffffc8
Initialize subpage correctly before calling page_remove_rmap().
I think this is
Fixes: a5430dda8a3a1c ("mm/migrate: support un-addressable ZONE_DEVICE page in migration")
Cc: stable
yes?
Yes. Can you add this or should I send a v2?
I updated the patch. Could we please have some review input?
From: Ralph Campbell <rcampbell@xxxxxxxxxx>
Subject: mm/hmm: fix bad subpage pointer in try_to_unmap_one
When migrating a ZONE device private page from device memory to system
memory, the subpage pointer is initialized from a swap pte which computes
an invalid page pointer. A kernel panic results such as:
BUG: unable to handle page fault for address: ffffea1fffffffc8
Initialize subpage correctly before calling page_remove_rmap().
Link: http://lkml.kernel.org/r/20190709223556.28908-1-rcampbell@xxxxxxxxxx
Fixes: a5430dda8a3a1c ("mm/migrate: support un-addressable ZONE_DEVICE page in migration")
Signed-off-by: Ralph Campbell <rcampbell@xxxxxxxxxx>
Cc: "JÃrÃme Glisse" <jglisse@xxxxxxxxxx>
Cc: "Kirill A. Shutemov" <kirill.shutemov@xxxxxxxxxxxxxxx>
Cc: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
Cc: Jason Gunthorpe <jgg@xxxxxxxxxxxx>
Cc: <stable@xxxxxxxxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---
mm/rmap.c | 1 +
1 file changed, 1 insertion(+)
--- a/mm/rmap.c~mm-hmm-fix-bad-subpage-pointer-in-try_to_unmap_one
+++ a/mm/rmap.c
@@ -1476,6 +1476,7 @@ static bool try_to_unmap_one(struct page
* No need to invalidate here it will synchronize on
* against the special swap migration pte.
*/
+ subpage = page;
goto discard;
}
Hi Ralph and everyone,
While the above prevents a crash, I'm concerned that it is still not
an accurate fix. This fix leads to repeatedly removing the rmap, against the
same struct page, which is odd, and also doesn't directly address the
root cause, which I understand to be: this routine can't handle migrating
the zero page properly--over and back, anyway. (We should also mention more
about how this is triggered, in the commit description.)
I'll take a closer look at possible fixes (I have to step out for a bit) soon,
but any more experienced help is also appreciated here.
thanks,