RE: [HMM 12/15] mm/migrate: new memory migration helper for use with device memory v4

From: Evgeny Baskakov
Date: Mon Jun 26 2017 - 20:07:56 EST


On Monday, May 22, 2017 9:52 AM, JÃrÃme Glisse wrote:
[...]

+ * The alloc_and_copy() callback happens once all source pages have
+been locked,
+ * unmapped and checked (checked whether pinned or not). All pages that
+can be
+ * migrated will have an entry in the src array set with the pfn value
+of the
+ * page and with the MIGRATE_PFN_VALID and MIGRATE_PFN_MIGRATE flag set
+(other
+ * flags might be set but should be ignored by the callback).
+ *
+ * The alloc_and_copy() callback can then allocate destination memory
+and copy
+ * source memory to it for all those entries (ie with MIGRATE_PFN_VALID
+and
+ * MIGRATE_PFN_MIGRATE flag set). Once these are allocated and copied,
+the
+ * callback must update each corresponding entry in the dst array with
+the pfn
+ * value of the destination page and with the MIGRATE_PFN_VALID and
+ * MIGRATE_PFN_LOCKED flags set (destination pages must have their
+struct pages
+ * locked, via lock_page()).
+ *
+ * At this point the alloc_and_copy() callback is done and returns.
+ *
+ * Note that the callback does not have to migrate all the pages that
+are
+ * marked with MIGRATE_PFN_MIGRATE flag in src array unless this is a
+migration
+ * from device memory to system memory (ie the MIGRATE_PFN_DEVICE flag
+is also
+ * set in the src array entry). If the device driver cannot migrate a
+device
+ * page back to system memory, then it must set the corresponding dst
+array
+ * entry to MIGRATE_PFN_ERROR. This will trigger a SIGBUS if CPU tries
+to
+ * access any of the virtual addresses originally backed by this page.
+Because
+ * a SIGBUS is such a severe result for the userspace process, the
+device
+ * driver should avoid setting MIGRATE_PFN_ERROR unless it is really in
+an
+ * unrecoverable state.
+ *
+ * THE alloc_and_copy() CALLBACK MUST NOT CHANGE ANY OF THE SRC ARRAY
+ENTRIES
+ * OR BAD THINGS WILL HAPPEN !
+ *

Hi Jerome,

The documentation shown above doesn't tell what the alloc_and_copy callback should do for source pages that have not been allocated yet. Instead, it unconditionally suggests checking if the MIGRATE_PFN_VALID and MIGRATE_PFN_MIGRATE flags are set.

Based on my testing and looking in the source code, I see that for such pages the respective 'src' PFN entries are always set to 0 without any flags.

The sample driver specifically handles that by checking if there's no page in the 'src' entry, and ignores any flags in such case:

struct page *spage = migrate_pfn_to_page(*src_pfns);
...
if (spage && !(*src_pfns & MIGRATE_PFN_MIGRATE))
continue;

if (spage && (*src_pfns & MIGRATE_PFN_DEVICE)) {

I would like to suggest reflecting that in the documentation. Or, which would be more logical, migrate_vma could keep the zero in the PFN entries for not allocated pages, but set the MIGRATE_PFN_MIGRATE flag anyway.

Thanks!

Evgeny Baskakov
NVIDIA