Re: [v2 RFC PATCH 0/9] Another Approach to Use PMEM as NUMA Node

From: Michal Hocko
Date: Wed Apr 17 2019 - 13:51:56 EST


On Wed 17-04-19 10:26:05, Yang Shi wrote:
>
>
> On 4/17/19 9:39 AM, Michal Hocko wrote:
> > On Wed 17-04-19 09:37:39, Keith Busch wrote:
> > > On Wed, Apr 17, 2019 at 05:39:23PM +0200, Michal Hocko wrote:
> > > > On Wed 17-04-19 09:23:46, Keith Busch wrote:
> > > > > On Wed, Apr 17, 2019 at 11:23:18AM +0200, Michal Hocko wrote:
> > > > > > On Tue 16-04-19 14:22:33, Dave Hansen wrote:
> > > > > > > Keith Busch had a set of patches to let you specify the demotion order
> > > > > > > via sysfs for fun. The rules we came up with were:
> > > > > > I am not a fan of any sysfs "fun"
> > > > > I'm hung up on the user facing interface, but there should be some way a
> > > > > user decides if a memory node is or is not a migrate target, right?
> > > > Why? Or to put it differently, why do we have to start with a user
> > > > interface at this stage when we actually barely have any real usecases
> > > > out there?
> > > The use case is an alternative to swap, right? The user has to decide
> > > which storage is the swap target, so operating in the same spirit.
> > I do not follow. If you use rebalancing you can still deplete the memory
> > and end up in a swap storage. If you want to reclaim/swap rather than
> > rebalance then you do not enable rebalancing (by node_reclaim or similar
> > mechanism).
>
> I'm a little bit confused. Do you mean we just do *not* reclaim/swap in
> rebalancing mode? If rebalancing is on, node_reclaim just moves the
> pages between nodes, and kswapd or direct reclaim takes care of swap?

Yes, that was the idea I wanted to get across. Sorry if that was not
really clear.

> If so, node reclaim on a PMEM node may rebalance the pages to a DRAM node?
> Should this be allowed?

Why shouldn't it? If there are other vacant nodes to absorb that memory,
then why not use them?
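A minimal sketch of the generic rebalancing idea, assuming a toy model in which each node is only a page capacity and a used-page count. `Node` and `pick_rebalance_target` are hypothetical names, not kernel APIs. No node type is special-cased: a PMEM node under pressure may hand pages to a DRAM node, and only when no node has room does the work fall through to kswapd/direct reclaim and swap.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    """Hypothetical toy model of a NUMA node (sizes in pages)."""
    name: str
    capacity: int
    used: int

    @property
    def free(self) -> int:
        return self.capacity - self.used


def pick_rebalance_target(nodes: Sequence[Node], src: int, nr_pages: int) -> Optional[int]:
    """Generic rebalancing: any other node with enough free pages qualifies.

    PMEM vs DRAM is deliberately ignored, so pages may move "up" as well as
    "down". Prefer the node with the most free pages. Return None when no node
    can absorb the pages; regular reclaim (and eventually swap) takes over.
    """
    candidates = [
        i for i, n in enumerate(nodes)
        if i != src and n.free >= nr_pages
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda i: nodes[i].free)


if __name__ == "__main__":
    # A nearly full PMEM node and a DRAM node with room to spare.
    nodes = [Node("DRAM0", 1000, 400), Node("PMEM1", 4000, 3990)]
    dst = pick_rebalance_target(nodes, src=1, nr_pages=100)
    if dst is None:
        print("PMEM1: no vacant node, fall back to reclaim/swap")
    else:
        print(f"PMEM1: rebalance 100 pages to {nodes[dst].name}")
```

Run as a script, this prints `PMEM1: rebalance 100 pages to DRAM0`; if DRAM0 were also full, it would report the fallback to reclaim/swap instead.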

> I think Keith and I both intended to treat PMEM as a tier in the reclaim
> hierarchy: reclaim should push inactive pages down to PMEM, then to swap.
> So PMEM is a kind of "terminal" node. That is why he introduced a
> sysfs-defined target node and I introduced N_CPU_MEM.

I understand that. And I am trying to figure out whether we really have
to treat PMEM specially here. Why is it any better than generic NUMA
rebalancing code that could serve many other use cases which are not
PMEM specific? If you present PMEM as regular memory, then also use it
as regular memory.
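For contrast, a minimal sketch of the tiered model described in the quoted text, assuming a static per-node demotion table. `DEMOTION_TARGET` and `reclaim_destination` are hypothetical names that only loosely stand in for the sysfs-defined target node or N_CPU_MEM proposals; neither patch set is reproduced here.

```python
from typing import Dict, Optional

# Hypothetical demotion table: each node names at most one lower tier,
# and None marks a terminal node whose cold pages can only go to swap.
DEMOTION_TARGET: Dict[str, Optional[str]] = {
    "DRAM0": "PMEM1",
    "PMEM1": None,
}


def reclaim_destination(node: str,
                        table: Dict[str, Optional[str]] = DEMOTION_TARGET) -> str:
    """Where inactive pages go when `node` is reclaimed under tiering.

    Pages only move down the table; a terminal node falls back to swap and
    never hands pages back up to a faster tier.
    """
    target = table[node]
    return target if target is not None else "swap"
```

Here `reclaim_destination("DRAM0")` gives `"PMEM1"` and `reclaim_destination("PMEM1")` gives `"swap"`: the direction is fixed up front, whereas generic rebalancing would let the PMEM node hand pages to any vacant node.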
--
Michal Hocko
SUSE Labs