On Fri, 2022-05-27 at 17:55 +0530, Aneesh Kumar K.V wrote:
From: Jagdish Gediya <jvgediya@xxxxxxxxxxxxx>
currently, a higher tier node can only be demoted to selected
nodes on the next lower tier as defined by the demotion path,
not any other node from any lower tier. This strict, hard-coded
demotion order does not work in all use cases (e.g. some use cases
may want to allow cross-socket demotion to another node in the same
demotion tier as a fallback when the preferred demotion node is out
of space). This demotion order is also inconsistent with the page
allocation fallback order when all the nodes in a higher tier are
out of space: The page allocation can fall back to any node from any
lower tier, whereas the demotion order doesn't allow that currently.
This patch adds support to get all the allowed demotion targets mask
for node, also demote_page_list() function is modified to utilize this
allowed node mask by filling it in migration_target_control structure
before passing it to migrate_pages().
* Take pages on @demote_list and attempt to demote them to
* another node. Pages which are not demoted are left on
@@ -1481,6 +1464,19 @@ static unsigned int demote_page_list(struct list_head *demote_pages,
{
int target_nid = next_demotion_node(pgdat->node_id);
unsigned int nr_succeeded;
+ nodemask_t allowed_mask;
+
+ struct migration_target_control mtc = {
+ /*
+ * Allocate from 'node', or fail quickly and quietly.
+ * When this happens, 'page' will likely just be discarded
+ * instead of migrated.
+ */
+ .gfp_mask = (GFP_HIGHUSER_MOVABLE & ~__GFP_RECLAIM) | __GFP_NOWARN |
+ __GFP_NOMEMALLOC | GFP_NOWAIT,
+ .nid = target_nid,
+ .nmask = &allowed_mask
+ };
IMHO, we should try to allocate from preferred node firstly (which will
kick kswapd of the preferred node if necessary). If failed, we will
fallback to all allowed node.
As we discussed as follows,
https://lore.kernel.org/lkml/69f2d063a15f8c4afb4688af7b7890f32af55391.camel@xxxxxxxxx/
That is, something like below,
static struct page *alloc_demote_page(struct page *page, unsigned long node)
{
struct page *page;
nodemask_t allowed_mask;
struct migration_target_control mtc = {
/*
* Allocate from 'node', or fail quickly and quietly.
* When this happens, 'page' will likely just be discarded
* instead of migrated.
*/
.gfp_mask = (GFP_HIGHUSER_MOVABLE & ~__GFP_RECLAIM) |
__GFP_THISNODE | __GFP_NOWARN |
__GFP_NOMEMALLOC | GFP_NOWAIT,
.nid = node
};
page = alloc_migration_target(page, (unsigned long)&mtc);
if (page)
return page;
mtc.gfp_mask &= ~__GFP_THISNODE;
mtc.nmask = &allowed_mask;
return alloc_migration_target(page, (unsigned long)&mtc);
}