RE: [Linux-nvdimm] [PATCH v2 00/20] libnd: non-volatile memory device support

From: Elliott, Robert (Server Storage)
Date: Tue Apr 28 2015 - 17:26:15 EST

> -----Original Message-----
> From: Linux-nvdimm [mailto:linux-nvdimm-bounces@xxxxxxxxxxxx] On Behalf Of
> Dan Williams
> Sent: Tuesday, April 28, 2015 1:24 PM
> To: linux-nvdimm@xxxxxxxxxxxx
> Cc: Neil Brown; Dave Chinner; H. Peter Anvin; Christoph Hellwig; Rafael J.
> Wysocki; Robert Moore; Ingo Molnar; linux-acpi@xxxxxxxxxxxxxxx; Jens Axboe;
> Borislav Petkov; Thomas Gleixner; Greg KH; linux-kernel@xxxxxxxxxxxxxxx;
> Andy Lutomirski; Andrew Morton; Linus Torvalds
> Subject: [Linux-nvdimm] [PATCH v2 00/20] libnd: non-volatile memory device
> support
> Changes since v1 [1]: Incorporates feedback received prior to April 24.

Here are some comments on the sysfs properties reported for a pmem device.
They are based on v1, but I don't think v2 changes anything.

1. This confuses lsblk (part of util-linux):

lsblk shows:
pmem0  251:0    0  8G  0  worm
pmem1  251:16   0  8G  0  worm
pmem2  251:32   0  8G  0  worm
pmem3  251:48   0  8G  0  worm
pmem4  251:64   0  8G  0  worm
pmem5  251:80   0  8G  0  worm
pmem6  251:96   0  8G  0  worm
pmem7  251:112  0  8G  0  worm

lsblk's blkdev_scsi_type_to_name() considers 4 to mean
SCSI_TYPE_WORM (write once read many ... used for certain optical
and tape drives).
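
To illustrate the kind of lookup involved, here is a simplified sketch of a
SCSI peripheral device type table. It is not util-linux's actual source: the
function name scsi_type_name() is made up for this example, and only a few
type codes are listed.

```c
#include <stddef.h>

/* Hypothetical helper, in the spirit of lsblk's blkdev_scsi_type_to_name():
 * maps a SCSI peripheral device type code to a short name.
 * Only a handful of codes are covered in this sketch. */
const char *scsi_type_name(int type)
{
    switch (type) {
    case 0x00: return "disk";
    case 0x01: return "tape";
    case 0x04: return "worm";   /* write once, read many */
    case 0x05: return "rom";    /* CD/DVD */
    default:   return NULL;     /* not covered by this sketch */
    }
}
```

With a table like this, a type value of 4 comes back as "worm", which matches
the lsblk output above.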

I'm not sure what nd and pmem are doing to result in that value.

2. To avoid confusing software trying to detect fast storage vs.
slow storage devices via sysfs, this value should be 0:

That can be done by adding this shortly after the blk_alloc_queue call:
queue_flag_set_unlocked(QUEUE_FLAG_NONROT, pmem->pmem_queue);
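
As an illustration of the kind of userspace detection this affects, here is a
minimal sketch. It assumes the attribute in question is the block layer's
queue/rotational file, which QUEUE_FLAG_NONROT controls; the helper names are
made up for the example.

```c
#include <stdio.h>

/* Interpret the text of a queue/rotational attribute.
 * Returns 1 for rotational, 0 for non-rotational, -1 if unparsable. */
int parse_rotational(const char *text)
{
    int value;

    if (text == NULL || sscanf(text, "%d", &value) != 1)
        return -1;
    return value != 0;
}

/* Read /sys/block/<dev>/queue/rotational for the named device.
 * Returns the parsed value, or -1 on any error. */
int dev_is_rotational(const char *dev)
{
    char path[256], buf[16];
    FILE *f;
    int ret = -1;

    snprintf(path, sizeof(path), "/sys/block/%s/queue/rotational", dev);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fgets(buf, sizeof(buf), f) != NULL)
        ret = parse_rotational(buf);
    fclose(f);
    return ret;
}
```

Software doing a check like dev_is_rotational("pmem0") would treat the device
as slow storage unless the driver sets the flag.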

3. Is there any reason to have a 512 KiB limit on the transfer size?

That is from:
blk_queue_max_hw_sectors(pmem->pmem_queue, 1024);
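
For the arithmetic: assuming the block layer's usual 512-byte sector unit,
1024 sectors is where the 512 KiB figure comes from. A trivial sketch (the
helper name is hypothetical):

```c
/* The block layer counts max_hw_sectors in 512-byte units. */
#define SECTOR_SIZE_BYTES 512u

/* Convert a sector count to KiB (two 512-byte sectors per KiB). */
unsigned int sectors_to_kib(unsigned int sectors)
{
    return sectors / (1024u / SECTOR_SIZE_BYTES);
}
```

So sectors_to_kib(1024) gives 512, and any larger limit would have to be passed
to blk_queue_max_hw_sectors() in the same sector units.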

4. These are read-writeable, but IOs never reach a queue, so
the queue size is irrelevant and merging never happens:

Consider making them both read-only with:
* nomerges set to 2 (no merging happening)
* nr_requests as small as the block layer allows to avoid
wasting memory.

5. No scatter-gather lists are created by the driver, so these
read-only fields are meaningless:

Is there a better way to report them as irrelevant?

6. There is no completion processing, so the read-writeable
CPU affinity setting is not used:

Consider making it read-only and set to 2, meaning the
completions always run on the requesting CPU.

7. With mmap() allowing less than logical block sized accesses
to the device, this could be considered misleading:

Perhaps that needs to be 1 byte or a cacheline size (64 bytes
on x86) to indicate that direct partial logical block accesses
are possible. The btt driver could report 512 as one indication
it is different.

I wouldn't be surprised if smaller values than the logical block
size confused some software, though.

Robert Elliott, HP Server Storage