On Wed 27-03-19 11:59:28, Yang Shi wrote:
[...]
> On 3/27/19 10:34 AM, Dan Williams wrote:
> > On Wed, Mar 27, 2019 at 2:01 AM Michal Hocko <mhocko@xxxxxxxxxx> wrote:
> > > On Tue 26-03-19 19:58:56, Yang Shi wrote:
> > > > It is still NUMA, users still can see all the NUMA nodes.
> > >
> > > No, the Linux NUMA implementation makes all NUMA nodes available by
> > > default and provides an API to opt in for more fine tuning. What you
> > > are suggesting goes against that semantic and I am asking why. How is
> > > a pmem NUMA node any different from any other distant node in
> > > principle?
> >
> > Agree. It's just another NUMA node and shouldn't be special cased.
> > Userspace policy can choose to avoid it, but typical node distance
> > preference should otherwise let the kernel fall back to it as
> > additional memory pressure relief for "near" memory.
>
> In the ideal case, yes, I agree. However, in the real world the
> performance is a concern. It is well known that PMEM (not considering
> NVDIMM-F or HBM) has higher latency and lower bandwidth. We observed
> much higher latency on PMEM than on DRAM with multiple threads.

One rule of thumb is: do not design user visible interfaces based on
contemporary technology and its up/down sides. This will almost always
fire back.

Btw. you keep arguing about performance without any numbers. Can you
present something specific?
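(To be concrete about the existing opt-in API: a process that wants to
avoid a distant node can already do so from userspace today. Assuming
nodes 0-1 are DRAM and node 2 is pmem -- node numbers here are purely
illustrative -- something like:)

```shell
# Bind allocations of a single process to the DRAM nodes only
numactl --membind=0,1 ./my_app

# Or just express a preference and let the kernel fall back to other
# nodes under memory pressure
numactl --preferred=0 ./my_app
```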
> In a real production environment we don't know what kind of
> applications would end up on PMEM (DRAM may be full and allocations
> fall back to PMEM) and then see unexpected performance degradation. I
> understand mempolicy can be used to avoid it. But there might be
> hundreds or thousands of applications running on the machine, and it
> does not sound feasible to have every single application set a
> mempolicy to avoid it.

We have the cpuset cgroup controller to help here.
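(A minimal sketch of the cpuset approach, assuming a cgroup v1 cpuset
mount and that node 2 is the pmem node; the node and CPU ranges are
made up for illustration:)

```shell
# Create a cpuset that only allows the DRAM nodes; any task started in
# it (or moved into it) can never allocate from the pmem node.
mkdir /sys/fs/cgroup/cpuset/no-pmem
echo 0-1  > /sys/fs/cgroup/cpuset/no-pmem/cpuset.mems   # DRAM nodes only
echo 0-31 > /sys/fs/cgroup/cpuset/no-pmem/cpuset.cpus
echo $$   > /sys/fs/cgroup/cpuset/no-pmem/tasks          # move this shell in
```

That covers the "hundreds of applications" case with one global
setting rather than a per-process mempolicy.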
> So, I think we still need a default allocation node mask. The default
> value may include all nodes or just DRAM nodes. But it should be
> possible for the user to override it globally, not only on a
> per-process basis.
>
> Due to the performance disparity, our use cases currently treat PMEM
> as second tier memory for demoting cold pages, or for binding
> applications that are not sensitive to memory access latency (this is
> the reason for inventing a new mempolicy), although it is a NUMA node.

If the performance sucks that badly then do not use the pmem as NUMA,
really. There are certainly other ways to export the pmem storage. Use
it as a fast swap device. Or try to work on a swap caching mechanism
that still allows much faster access than slow swap storage. But do
not abuse the NUMA interface while breaking some of its long
established semantics.
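(The fast-swap alternative mentioned above is a couple of commands,
assuming the pmem namespace shows up as /dev/pmem0; the device name
and the priority value are illustrative:)

```shell
# Use the pmem device as high-priority swap instead of a NUMA node
mkswap /dev/pmem0
swapon -p 32767 /dev/pmem0   # highest priority, used before disk swap
```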