On Sat 23-03-19 12:44:25, Yang Shi wrote:

With Dave Hansen's patches merged into Linus's tree

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c221c0b0308fd01d9fb33a16f64d2fd95f8830a4

PMEM can be hot-plugged as a NUMA node now. But how to use PMEM as a NUMA
node effectively and efficiently is still an open question.
There have been a couple of proposals posted on the mailing list [1] [2].

This patchset tries a different approach from proposal [1] to using PMEM as
NUMA nodes.

The approach is designed to follow the principles below:
1. Use PMEM as a normal NUMA node: no special gfp flag, zone, zonelist, etc.

2. DRAM first/by default. No surprise to existing applications and default
runs. PMEM will not be allocated unless its node is specified explicitly by
NUMA policy. Some applications may not be very sensitive to memory latency,
so they could be placed on PMEM nodes and then have their hot pages promoted
to DRAM gradually.
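A minimal sketch of that explicit opt-in (illustration only, not from the
patchset; treating node 2 as the PMEM node is an assumption), where one
buffer is bound to the PMEM node while everything else keeps coming from
DRAM by default:

/* build: cc -o pmem-bind pmem-bind.c -lnuma */
#include <numaif.h>		/* mbind(), MPOL_BIND */
#include <sys/mman.h>
#include <string.h>
#include <stdio.h>

int main(void)
{
	size_t len = 64UL << 20;		/* 64 MB scratch buffer */
	unsigned long pmem_mask = 1UL << 2;	/* nodemask with only node 2 set */
	void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (buf == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	/* pages for this range must come from node 2, the assumed PMEM node */
	if (mbind(buf, len, MPOL_BIND, &pmem_mask, sizeof(pmem_mask) * 8, 0)) {
		perror("mbind");
		return 1;
	}

	memset(buf, 0, len);	/* fault the pages in; they land on node 2 */
	munmap(buf, len);
	return 0;
}

The same kind of opt-in is available without code changes for a whole
process via numactl's --membind option.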
On 3/26/19 6:58 AM, Michal Hocko wrote:

Why are you pushing yourself into the corner right at the beginning? If
the PMEM is exported as a regular NUMA node then the only difference
should be performance characteristics (modulo durability, which shouldn't
play any role in this particular case, right?). Applications which are
already sensitive to memory access had better use proper binding already.
Some NUMA topologies might have quite large interconnect penalties
already. So this doesn't sound like an argument to me, TBH.
On Tue 26-03-19 11:33:17, Yang Shi wrote:

The major rationale behind this is that we assume most applications should
be sensitive to memory access, particularly for meeting the SLA. The
applications running on the machine may be unknown to us; they may be
sensitive or non-sensitive. But assuming they are sensitive to memory access
sounds safer from an SLA point of view. Then the "cold" pages could be
demoted to PMEM nodes by the kernel's memory reclaim or other tools without
impairing the SLA.
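On the "other tools" side, a userspace agent could do such demotion
explicitly; a rough sketch with libnuma (node 0 as DRAM and node 2 as PMEM
are assumptions, and a real tool would pick only the cold pages, e.g. with
move_pages(2), rather than a whole task):

/* build: cc -o demote demote.c -lnuma */
#include <numa.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	struct bitmask *from, *to;
	pid_t pid = argc > 1 ? atoi(argv[1]) : 0;	/* 0 means the caller */

	if (numa_available() < 0) {
		fprintf(stderr, "no NUMA support\n");
		return 1;
	}

	from = numa_parse_nodestring("0");	/* assumed DRAM node */
	to   = numa_parse_nodestring("2");	/* assumed PMEM node */

	/* move the task's pages that sit on node 0 over to node 2 */
	if (numa_migrate_pages(pid, from, to) < 0) {
		perror("numa_migrate_pages");
		return 1;
	}
	return 0;
}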
If the applications are not sensitive to memory access, they could be bound
to PMEM or allowed to use PMEM (nice to have allocation on DRAM) explicitly,
and then the "hot" pages could be promoted to DRAM.
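The "nice to have allocation on DRAM" case maps onto a preferred policy; a
minimal sketch (assuming node 0 is the DRAM node) where allocations prefer
DRAM but may fall back to other nodes, including PMEM, when DRAM cannot
satisfy them:

/* build: cc -o prefer-dram prefer-dram.c -lnuma */
#include <numaif.h>		/* set_mempolicy(), MPOL_PREFERRED */
#include <stdio.h>

int main(void)
{
	unsigned long dram_mask = 1UL << 0;	/* node 0 assumed to be DRAM */

	/* prefer DRAM for this task; fall back elsewhere under pressure */
	if (set_mempolicy(MPOL_PREFERRED, &dram_mask, sizeof(dram_mask) * 8)) {
		perror("set_mempolicy");
		return 1;
	}

	/* ... run the latency-insensitive workload here ... */
	return 0;
}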
On 3/26/19 11:37 AM, Michal Hocko wrote:

Again, how is this different from NUMA in general?

On Tue 26-03-19 19:58:56, Yang Shi wrote:

It is still NUMA; users can still see all the NUMA nodes.

On Wed, Mar 27, 2019 at 2:01 AM Michal Hocko <mhocko@xxxxxxxxxx> wrote:

No, the Linux NUMA implementation makes all NUMA nodes available by default
and provides an API to opt in for more fine tuning. What you are suggesting
goes against that semantic and I am asking why. How is a pmem NUMA node any
different from any other distant node in principle?

Agree. It's just another NUMA node and shouldn't be special cased.
Userspace policy can choose to avoid it, but typical node distance
preference should otherwise let the kernel fall back to it as
additional memory pressure relief for "near" memory.
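That distance-ordered fallback is visible from userspace too; a small sketch
that dumps the SLIT-style distance table with libnuma, where a PMEM node
simply shows up as a more distant node:

/* build: cc -o node-dist node-dist.c -lnuma */
#include <numa.h>
#include <stdio.h>

int main(void)
{
	int max, i, j;

	if (numa_available() < 0) {
		fprintf(stderr, "no NUMA support\n");
		return 1;
	}

	max = numa_max_node();
	for (i = 0; i <= max; i++) {
		for (j = 0; j <= max; j++)
			printf("%4d", numa_distance(i, j));	/* 10 == local */
		printf("\n");
	}
	return 0;
}

An application that really wants to avoid the PMEM node can still restrict
itself to the DRAM nodes with a bind policy, as in the earlier sketch.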