Re: [PATCH v4 12/16] libnvdimm, nfit: enable support for volatile ranges

From: Linda Knippers
Date: Thu Jun 29 2017 - 18:13:24 EST




On 6/29/2017 5:50 PM, Dan Williams wrote:
On Thu, Jun 29, 2017 at 2:16 PM, Linda Knippers <linda.knippers@xxxxxxx> wrote:
On 06/29/2017 04:42 PM, Dan Williams wrote:
On Thu, Jun 29, 2017 at 12:20 PM, Linda Knippers <linda.knippers@xxxxxxx> wrote:
On 06/29/2017 01:54 PM, Dan Williams wrote:
Allow volatile nfit ranges to participate in all the same infrastructure
provided for persistent memory regions.

This seems to be a bit more than "other rework".

It's part of the rationale for having a "write_cache" control
attribute. There's only so much I can squeeze into the subject line,
but it is mentioned in the cover letter.

A resulting namespace
device will still be called "pmem", but the parent region type will be
"nd_volatile".

What does this look like to a user or admin? How does someone know that
/dev/pmemX is persistent memory and /dev/pmemY isn't? Someone shouldn't
have to weed through /sys or ndctl or some other interface to figure that out
in the future if they don't have to do that today. We have different
names for BTT namespaces. Is there a different name for volatile ranges?

No, the block device name is still /dev/pmem. It's already the case
that you need to check behind just the name of the device to figure
out if something is actually volatile or not (see memmap=ss!nn
configurations),

I don't have any experience with using memmap but if it's primarily used
by developers without NVDIMMs, they'd know it's not persistent. Or is it
primarily used by administrators using non-NFIT NVDIMMs, in which case it
is persistent?

In any case, how exactly does one determine whether the device is volatile
or not? I'm dumb so tell me the command line or API.

Especially with memmap= or e820-defined memory it's unknowable from
the kernel. We don't know if the user is using it to cover for a
platform where there is no BIOS support for advertising persistent
memory, or if they have a BIOS that does not produce an NFIT as is the
case here [1], or if it is some developer just testing with no
expectation of persistence.

[1]: https://github.com/pmem/ndctl/issues/21

Ok. I'm not really concerned about those cases but was asking since
you mentioned memmap as an example.

In any case, how does someone, like a system administrator, confirm that
a /dev/pmem device is a device that claims to be persistent? Is there
a specific ndctl command line that would make it obvious what the Linux
device is on a device that claims to be persistent?

so I would not be in favor of changing the device
name if we think the memory might not be persistent. Moreover, I think
it was a mistake that we change the device name for btt versus non-btt, and
I'm glad Matthew talked me out of making the same mistake with
memory-mode vs raw-mode pmem namespaces. So, the block device name
just reflects the driver of the block device, not the properties of
the device, just like all other block device instances.

I agree that creating a new device name for BTT was perhaps a mistake,
although it would be good to know how to query a device property for
sector atomicity. The difference between BTT vs. non-BTT seems less
critical to me than knowing in an obvious way whether the device is
actually persistent.

We don't have a good way to answer "actually persistent" in the
general case. I'm thinking of cases where the energy source on the
DIMM has died, or we trigger one of the conditions that leads to the
"unable to guarantee persistence of writes" message.

There are certainly error conditions that can happen and we've talked
about that bit in our health status discussions. I think the question
of whether the device is healthy enough to be persistent right now
is different from whether the device is never ever going to be persistent.

The /dev/pmem
device name just tells you that your block device is hosted by a
driver that knows how to handle persistent memory constraints, but any
other details about the nature of the address range need to come from
other sources of information, and potentially information sources that
the kernel does not know about.

I'm asking about the other source of information in this specific case
where we're exposing pmem devices that will never ever be persistent.
Before we add these devices, I think we should be able to tell the user
how they can know the properties of the underlying device.

-- ljk