Re: DAX mapping detection (was: Re: [PATCH] Fix region lost in /proc/self/smaps)

From: Xiao Guangrong
Date: Mon Sep 12 2016 - 02:36:57 EST




On 09/12/2016 11:44 AM, Rudoff, Andy wrote:
Whether msync/fsync can make data persistent depends on the ADR feature of
the memory controller. If ADR exists, everything works well; otherwise, we
need another interface, which is why the 'Flush hint table' in ACPI comes
in. The 'Flush hint table' is particularly useful for nvdimm virtualization
if we use normal memory to emulate an nvdimm with persistent-data
characteristics (the data will be flushed to persistent storage, e.g., a disk).

Does the current PMEM programming model fully support the 'Flush hint
table'? Is userspace allowed to use these addresses?

The Flush hint table is NOT a replacement for ADR. To support pmem on
the x86 architecture, the platform is required to ensure that a pmem
store flushed from the CPU caches is in the persistent domain so that the
application need not take any additional steps to make it persistent.
The most common way to do this is the ADR feature.

If the above is not true, then your x86 platform does not support pmem.
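The persistence model described above (flush the store out of the CPU caches; ADR takes it from there) can be sketched in user space as follows. This is a minimal sketch, assuming x86-64 with SSE2, a 64-byte cache line, and that the buffer is a DAX mapping of real pmem on an ADR platform. Whether an application may rely on this without msync is exactly what this thread is debating. The function names are hypothetical; on ordinary memory the code only flushes caches and makes nothing durable.

```c
#include <emmintrin.h> /* _mm_clflush; also pulls in _mm_sfence */
#include <stddef.h>
#include <stdint.h>

/* Assumed cache-line size; 64 bytes is typical on current x86 CPUs. */
#define CACHELINE_SIZE 64

/*
 * Hypothetical helper: flush every cache line covering [addr, addr + len)
 * out of the CPU caches. On an ADR platform, a pmem store that has left the
 * caches is in the persistent domain. Returns the number of lines flushed.
 */
size_t persist_range(const void *addr, size_t len)
{
    uintptr_t start = (uintptr_t)addr & ~(uintptr_t)(CACHELINE_SIZE - 1);
    uintptr_t end = (uintptr_t)addr + len;
    size_t lines = 0;

    for (uintptr_t p = start; p < end; p += CACHELINE_SIZE, lines++)
        _mm_clflush((const void *)p);

    /*
     * CLFLUSH is already ordered with respect to stores; the SFENCE only
     * becomes required if this is switched to CLFLUSHOPT or CLWB. It is kept
     * so the pattern stays the same either way.
     */
    _mm_sfence();
    return lines;
}

/* Hypothetical helper: store a value, then push it out of the CPU caches. */
void store_and_persist(uint64_t *dst, uint64_t value)
{
    *dst = value;
    persist_range(dst, sizeof(*dst));
}
```

No extra step such as a flush hint write appears here. That is the point of the requirement above: on a platform that supports pmem, cache flush plus fence is the whole contract.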

Understood.

However, virtualization is a special case: we can use normal memory to
emulate an NVDIMM for the VM so that the VM can bypass the local file
cache, reduce memory usage, shorten the I/O path, etc. Currently, this
usage is useful for lightweight virtualization, such as Clear Containers.

In this case, ADR is available on the physical platform, but it cannot
help us make data persistent for the VM. So virtualizing the 'flush hint
table' is a good way to handle it, based on the ACPI spec:
| software can write to any one of these Flush Hint Addresses to
| cause any preceding writes to the NVDIMM region to be flushed
| out of the intervening platform buffers to the targeted NVDIMM
| (to achieve durability)
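The sequence the quoted spec text implies (order the preceding writes, then write to any one Flush Hint Address) can be sketched as a user-space simulation. Everything here is hypothetical: `struct flush_hints` and `flush_via_hint` are not kernel APIs. In a real driver the addresses come from the NFIT and are MMIO mappings, for which plain memory stands in here. Writing the value 1 is also an assumption; per the quoted text, it is the write itself that triggers the flush. In the virtualized setup proposed above, this write is what the hypervisor would trap.

```c
#include <emmintrin.h> /* _mm_sfence */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of one DIMM's mapped Flush Hint Addresses. */
struct flush_hints {
    volatile uint64_t *addr[8]; /* mapped Flush Hint Addresses */
    size_t count;               /* number of valid entries */
};

/*
 * Flush preceding writes to the NVDIMM region out of platform buffers.
 * Per the ACPI text, a write to any one hint address suffices; `cpu` only
 * spreads concurrent writers across hints (a hypothetical policy).
 * Returns the index of the hint written, or -1 if there are no hints
 * (e.g., the platform relies on ADR alone).
 */
int flush_via_hint(struct flush_hints *h, unsigned cpu)
{
    if (h->count == 0)
        return -1;

    size_t idx = cpu % h->count;
    _mm_sfence();      /* order earlier stores before the hint write */
    *h->addr[idx] = 1; /* assumed value; the write is what matters */
    _mm_sfence();      /* make sure the hint write is issued before returning */
    return (int)idx;
}
```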


Flush hints are for use by the BIOS and drivers and are not intended to
be used in user space. Flush hints provide two things:

First, if a driver needs to write to command registers or movable windows
on a DIMM, the Flush hint (if provided in the NFIT) is required to flush
the command to the DIMM or ensure stores done through the movable window
are complete before moving it somewhere else.

Second, for the rare case where the kernel wants to flush stores to the
smallest possible failure domain (i.e. to the DIMM even though ADR will
handle flushing it from a larger domain), the flush hints provide a way
to do this. This might be useful for things like file system journals to
help ensure the file system is consistent even in the face of ADR failure.

We are assuming ADR can fail; however, do we have a way to know whether
ADR works correctly? Maybe MCE could report it?

This is necessary to support making data persistent without 'fsync/msync'
in userspace. Or do we need to unconditionally use the 'flush hint address'
if it is available, as the current nvdimm driver does?

Thanks!