Re: [RFC PATCH 01/31] mm: migrate: Add exchange_pages to exchange two lists of pages.

From: Zi Yan
Date: Mon Feb 18 2019 - 12:31:31 EST


On 17 Feb 2019, at 3:29, Matthew Wilcox wrote:

On Fri, Feb 15, 2019 at 02:08:26PM -0800, Zi Yan wrote:
+struct page_flags {
+ unsigned int page_error :1;
+ unsigned int page_referenced:1;
+ unsigned int page_uptodate:1;
+ unsigned int page_active:1;
+ unsigned int page_unevictable:1;
+ unsigned int page_checked:1;
+ unsigned int page_mappedtodisk:1;
+ unsigned int page_dirty:1;
+ unsigned int page_is_young:1;
+ unsigned int page_is_idle:1;
+ unsigned int page_swapcache:1;
+ unsigned int page_writeback:1;
+ unsigned int page_private:1;
+ unsigned int __pad:3;
+};

I'm not sure how to feel about this. It's a bit fragile versus somebody adding
new page flags. I don't know whether it's needed or whether you can just
copy page->flags directly because you're holding PageLock.

I agree that the current way of copying page flags individually could miss
new page flags. I will try to come up with something better. Simply copying page->flags
as a whole might not work, since the upper part of page->flags holds the page's node
information, which should not be changed. I think I need to add a helper function that
just copies/exchanges all page flags, like calling migrate_page_states() twice.
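
To illustrate the constraint above, here is a rough userspace sketch of exchanging only the low "state" bits of two flag words while leaving the upper placement bits (node information) untouched. The names FLAG_BITS, STATE_MASK and exchange_state_flags() are hypothetical and not part of the patch, and the 24-bit boundary is an assumption; the real split in page->flags depends on the kernel configuration, and some low bits (the lock bit, for example) would probably need to be excluded from the mask as well.

```c
#include <stdint.h>

/*
 * Hypothetical layout: the low FLAG_BITS bits are per-page state flags,
 * everything above them is placement information (node, zone, ...).
 * The boundary chosen here is an assumption for illustration only.
 */
#define FLAG_BITS  24u
#define STATE_MASK ((UINT64_C(1) << FLAG_BITS) - 1)

/* Swap the state bits of two flag words; the upper bits stay where they are. */
static void exchange_state_flags(uint64_t *a, uint64_t *b)
{
	/* Bits that differ between a and b, restricted to the state area. */
	uint64_t diff = (*a ^ *b) & STATE_MASK;

	/* Flipping exactly the differing bits in both words swaps them. */
	*a ^= diff;
	*b ^= diff;
}
```

The XOR form avoids a temporary copy of either word, and bits outside STATE_MASK can never be modified by construction.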

+static void exchange_page(char *to, char *from)
+{
+ u64 tmp;
+ int i;
+
+ for (i = 0; i < PAGE_SIZE; i += sizeof(tmp)) {
+ tmp = *((u64 *)(from + i));
+ *((u64 *)(from + i)) = *((u64 *)(to + i));
+ *((u64 *)(to + i)) = tmp;
+ }
+}

I have a suspicion you'd be better off allocating a temporary page and
using copy_page(). Some architectures have put a lot of effort into
making copy_page() run faster.

When I do exchange_pages() between two NUMA nodes on an x86_64 machine,
I can actually saturate the QPI bandwidth with this operation. I think cache
prefetching was doing its job.

The purpose of proposing exchange_pages() is to avoid allocating any new page,
so that we would not trigger any potential page reclaim or memory compaction.
Allocating a temporary page defeats the purpose.
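
For comparison, a minimal userspace sketch of the in-place approach: two page-sized buffers are swapped one 64-bit word at a time, so the only scratch space is a pair of local variables and no third page is needed. SKETCH_PAGE_SIZE and exchange_buffers() are hypothetical names, and the 4 KiB size is an assumption. memcpy() is used for the word accesses only to keep the userspace version free of alignment and aliasing problems; it is not a claim about how the kernel code should be written.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumption: 4 KiB pages */

/* Exchange the contents of two page-sized buffers without a temporary page. */
static void exchange_buffers(unsigned char *to, unsigned char *from)
{
	size_t i;

	for (i = 0; i < SKETCH_PAGE_SIZE; i += sizeof(uint64_t)) {
		uint64_t a, b;

		/* Load one word from each buffer ... */
		memcpy(&a, to + i, sizeof(a));
		memcpy(&b, from + i, sizeof(b));

		/* ... and store them back crosswise. */
		memcpy(to + i, &b, sizeof(b));
		memcpy(from + i, &a, sizeof(a));
	}
}
```

Because each iteration reads and writes both buffers sequentially, the access pattern stays friendly to hardware prefetching.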


+ xa_lock_irq(&to_mapping->i_pages);
+
+ to_pslot = radix_tree_lookup_slot(&to_mapping->i_pages,
+ page_index(to_page));

This needs to be converted to the XArray. radix_tree_lookup_slot() is
going away soon. You probably need:

XA_STATE(to_xas, &to_mapping->i_pages, page_index(to_page));

Thank you for pointing this out. I will make the change.


This is a lot of code and I'm still trying to get my head around it all.
Thanks for putting in this work; it's good to see this approach being
explored.

Thank you for taking a look at the code.

--
Best Regards,
Yan Zi