Re: [PATCH 0/4] [RFC] Migrate Pages in lieu of discard

From: Yang Shi
Date: Fri Oct 18 2019 - 17:39:50 EST


On Fri, Oct 18, 2019 at 7:54 AM Dave Hansen <dave.hansen@xxxxxxxxx> wrote:
>
> On 10/18/19 12:44 AM, Michal Hocko wrote:
> > How does this compare to
> > http://lkml.kernel.org/r/1560468577-101178-1-git-send-email-yang.shi@xxxxxxxxxxxxxxxxx
>
> It's a _bit_ more tied to persistent memory and it appears a bit more
> tied to two tiers rather than something arbitrarily deep. They're pretty
> similar conceptually although there are quite a few differences.

My patches do assume two tiers for now, but it is not hard to extend
them to multiple tiers. Since it is an RFC, I didn't make it that
complicated.

However, IMHO, supporting multiple tiers by making the migration path
configurable by admins or users is not a good choice. Memory migration
caused by compaction or reclaim (not via syscall) should be transparent
to users. It is a kernel-internal activity and shouldn't be exposed to
end users.

Personally, I prefer that the firmware or the OS build the migration
path.

>
> For instance, what I posted has a static mapping for the migration path.
> If node A is in reclaim, we always try to allocate pages on node B.
> There are no restrictions on what those nodes can be. In Yang Shi's
> approach, there's a dynamic search for a target migration node on each
> migration that follows the normal alloc fallback path. This ends up
> making migration nodes special.

The reason I didn't pursue a static mapping is that nodes might be
offlined or onlined, so you have to keep the mapping correct every
time a node's state changes. The dynamic search just returns the
closest migration target node no matter what the topology is. It
should not be time consuming.
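
To illustrate the idea, here is a minimal userspace sketch of such a
dynamic search. It is not the code from the patches: struct topology,
struct node_info, find_migration_target() and NO_NODE are hypothetical
names made up for this example, and a plain distance table stands in
for the allocator's fallback order.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define MAX_NODES 4
#define NO_NODE   (-1)

/* Hypothetical node descriptor, not a kernel structure. */
struct node_info {
	bool online;	/* node is currently online */
	bool is_dram;	/* top-tier memory, never a migration target */
};

/* Hypothetical topology: node states plus a NUMA-style distance table. */
struct topology {
	struct node_info nodes[MAX_NODES];
	int distance[MAX_NODES][MAX_NODES];
};

/*
 * Return the closest online, non-DRAM node to @src, or NO_NODE if there
 * is none. The search runs on every call, so nothing has to be rebuilt
 * when a node is onlined or offlined.
 */
static int find_migration_target(const struct topology *t, int src)
{
	int best = NO_NODE;
	int best_dist = INT_MAX;

	assert(src >= 0 && src < MAX_NODES);

	for (int nid = 0; nid < MAX_NODES; nid++) {
		/* Skip the source itself, offline nodes and DRAM nodes. */
		if (nid == src || !t->nodes[nid].online || t->nodes[nid].is_dram)
			continue;
		if (t->distance[src][nid] < best_dist) {
			best_dist = t->distance[src][nid];
			best = nid;
		}
	}
	return best;
}
```

With two DRAM nodes (0 and 1) and two lower-tier nodes (2 and 3), node 0
would pick whichever of 2 and 3 is nearer. If that node goes offline,
the next call simply returns the other one, with no mapping to update.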

Actually, my patches don't require the migration target node to be
PMEM. It could be any memory in a lower tier than DRAM; it just happens
that PMEM is the only such medium available. My patch's commit log
explains this point. Again, I would really prefer that the firmware,
HMAT, or the ACPI driver build the migration path in the kernel.

In addition, DRAM nodes are definitely excluded as migration targets,
since I don't think doing such migration between DRAM nodes is a good
idea in general.

>
> There are also some different choices that are pretty arbitrary. For
> instance, when you allocate a migration target page, should you cause
> memory pressure on the target?

Yes, those are definitely arbitrary. We will need to work out a lot of
these details in the future by figuring out how real-life workloads
behave.

>
> To be honest, though, I don't see anything fatally flawed with it. It's
> probably a useful exercise to factor out the common bits from the two
> sets and see what we can agree on being absolutely necessary.

Sure, that definitely would help us move forward.
