[GIT PULL] libnvdimm for 4.3
From: Williams, Dan J
Date: Thu Sep 03 2015 - 20:22:10 EST
Hi Linus, please pull from:
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm tags/libnvdimm-for-4.3
...to receive the libnvdimm update and related changes for 4.3.
This update has successfully completed a 0day-kbuild run and has
appeared in a linux-next release. The changes outside of the typical
drivers/nvdimm/ and drivers/acpi/nfit.[ch] paths are related to the
removal of IORESOURCE_CACHEABLE, the introduction of memremap(), and the
introduction of ZONE_DEVICE + devm_memremap_pages().
This has a minor conflict with a fix that went into v4.2, commit
de4a196c02a2 "nfit, nd_blk: BLK status register is only 32 bits", but
otherwise merges cleanly with mainline.
--
The following changes since commit cbfe8fa6cd672011c755c3cd85c9ffd4e2d10a6f:
Linux 4.2-rc4 (2015-07-26 12:26:21 -0700)
are available in the git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm tags/libnvdimm-for-4.3
for you to fetch changes up to 004f1afbe199e6ab20805b95aefd83ccd24bc5c7:
libnvdimm, pmem: direct map legacy pmem by default (2015-08-28 23:40:05 -0400)
----------------------------------------------------------------
libnvdimm for 4.3:
1/ Introduce ZONE_DEVICE and devm_memremap_pages() as a generic
mechanism for adding device-driver-discovered memory regions to the
kernel's direct map. This facility is used by the pmem driver to
enable pfn_to_page() operations on the page frames returned by DAX
('direct_access' in 'struct block_device_operations'). For now, the
'memmap' allocation for these "device" pages comes from "System
RAM". Support for allocating the memmap from device memory will
arrive in a later kernel.
2/ Introduce memremap() to replace usages of ioremap_cache() and
ioremap_wt(). memremap() drops the __iomem annotation for these
mappings to memory that do not have i/o side effects. The
replacement of ioremap_cache() with memremap() is limited to the
pmem driver to ease merging the api change in v4.3. Completion of
the conversion is targeted for v4.4.
3/ Similar to the usage of memcpy_to_pmem() + wmb_pmem() in the pmem
driver, update the VFS DAX implementation and PMEM api to provide
persistence guarantees for kernel operations on a DAX mapping.
4/ Convert the ACPI NFIT 'BLK' driver to map the block apertures as
cacheable to improve performance.
5/ Miscellaneous updates and fixes to libnvdimm including support
for issuing "address range scrub" commands, clarifying the optimal
'sector size' of pmem devices, a clarification of the usage of the
ACPI '_STA' (status) property for DIMM devices, and other minor
fixes.
----------------------------------------------------------------
Christoph Hellwig (4):
devres: add devm_memremap
pmem: switch to devm_ allocations
mm: move __phys_to_pfn and __pfn_to_phys to asm/generic/memory_model.h
add devm_memremap_pages
Dan Williams (15):
libnvdimm, btt: sparse fix
mm: enhance region_is_ram() to region_intersects()
arch, drivers: don't include <asm/io.h> directly, use <linux/io.h> instead
cleanup IORESOURCE_CACHEABLE vs ioremap()
arch: introduce memremap()
visorbus: switch from ioremap_cache to memremap
pmem: convert to generic memremap
libnvdimm, e820: make CONFIG_X86_PMEM_LEGACY a tristate option
Merge branch 'pmem-api' into libnvdimm-for-next
dax: drop size parameter to ->direct_access()
mm: ZONE_DEVICE for "device memory"
x86, pmem: clarify that ARCH_HAS_PMEM_API implies PMEM mapped WB
libnvdimm, pfn: 'struct page' provider infrastructure
libnvdimm, pmem: 'struct page' for pmem
libnvdimm, pmem: direct map legacy pmem by default
Linda Knippers (1):
nfit: Don't check _STA on NVDIMM devices
Randy Dunlap (1):
nvdimm: fix inline function return type warning
Ross Zwisler (7):
pmem, x86: move x86 PMEM API to new pmem.h header
pmem: remove layer when calling arch_has_wmb_pmem()
pmem, x86: clean up conditional pmem includes
pmem: add copy_from_iter_pmem() and clear_pmem()
dax: update I/O path to do proper PMEM flushing
pmem, dax: have direct_access use __pmem annotation
nd_blk: change aperture mapping from WC to WB
Vishal Verma (6):
libnvdimm: Update name of the ars_status_record mask field
libnvdimm: Add DSM support for Address Range Scrub commands
libnvdimm, pmem: Change pmem physical sector size to PAGE_SIZE
libnvdimm, btt: clean up internal interfaces
libnvdimm, btt: consolidate arena validation
libnvdimm, btt: write and validate parent_uuid
yalin wang (1):
nvdimm: change to use generic kvfree()
Documentation/filesystems/Locking | 3 +-
MAINTAINERS | 1 +
arch/arm/include/asm/memory.h | 6 -
arch/arm/mach-clps711x/board-cdb89712.c | 2 +-
arch/arm/mach-shmobile/pm-rcar.c | 2 +-
arch/arm64/include/asm/memory.h | 6 -
arch/ia64/include/asm/io.h | 1 +
arch/ia64/kernel/cyclone.c | 2 +-
arch/ia64/mm/init.c | 4 +-
arch/powerpc/kernel/pci_of_scan.c | 2 +-
arch/powerpc/mm/mem.c | 4 +-
arch/powerpc/sysdev/axonram.c | 7 +-
arch/s390/mm/init.c | 2 +-
arch/sh/include/asm/io.h | 1 +
arch/sh/mm/init.c | 5 +-
arch/sparc/kernel/pci.c | 3 +-
arch/tile/mm/init.c | 2 +-
arch/unicore32/include/asm/memory.h | 6 -
arch/x86/Kconfig | 9 +-
arch/x86/include/asm/cacheflush.h | 73 +-----
arch/x86/include/asm/io.h | 6 -
arch/x86/include/asm/pmem.h | 153 +++++++++++
arch/x86/include/uapi/asm/e820.h | 2 +-
arch/x86/kernel/Makefile | 2 +-
arch/x86/kernel/pmem.c | 79 +-----
arch/x86/mm/init_32.c | 4 +-
arch/x86/mm/init_64.c | 4 +-
arch/xtensa/include/asm/io.h | 1 +
drivers/acpi/Kconfig | 1 +
drivers/acpi/nfit.c | 79 +++---
drivers/acpi/nfit.h | 17 +-
drivers/block/brd.c | 8 +-
drivers/isdn/icn/icn.h | 2 +-
drivers/mtd/devices/slram.c | 2 +-
drivers/mtd/nand/diskonchip.c | 2 +-
drivers/mtd/onenand/generic.c | 2 +-
drivers/nvdimm/Kconfig | 23 ++
drivers/nvdimm/Makefile | 5 +
drivers/nvdimm/btt.c | 50 +---
drivers/nvdimm/btt.h | 3 +
drivers/nvdimm/btt_devs.c | 215 ++++------------
drivers/nvdimm/claim.c | 201 +++++++++++++++
drivers/nvdimm/dimm_devs.c | 5 +-
drivers/nvdimm/e820.c | 87 +++++++
drivers/nvdimm/namespace_devs.c | 89 ++++++-
drivers/nvdimm/nd-core.h | 9 +
drivers/nvdimm/nd.h | 67 ++++-
drivers/nvdimm/pfn.h | 35 +++
drivers/nvdimm/pfn_devs.c | 337 +++++++++++++++++++++++++
drivers/nvdimm/pmem.c | 245 +++++++++++++++---
drivers/nvdimm/region.c | 2 +
drivers/nvdimm/region_devs.c | 20 ++
drivers/pci/probe.c | 3 +-
drivers/pnp/manager.c | 2 -
drivers/s390/block/dcssblk.c | 10 +-
drivers/scsi/aic94xx/aic94xx_init.c | 7 +-
drivers/scsi/arcmsr/arcmsr_hba.c | 5 +-
drivers/scsi/mvsas/mv_init.c | 15 +-
drivers/scsi/sun3x_esp.c | 2 +-
drivers/staging/comedi/drivers/ii_pci20kc.c | 1 +
drivers/staging/unisys/visorbus/visorchannel.c | 16 +-
drivers/staging/unisys/visorbus/visorchipset.c | 17 +-
drivers/tty/serial/8250/8250_core.c | 2 +-
drivers/video/fbdev/ocfb.c | 1 -
drivers/video/fbdev/s1d13xxxfb.c | 3 +-
drivers/video/fbdev/stifb.c | 1 +
fs/block_dev.c | 4 +-
fs/dax.c | 62 +++--
include/asm-generic/memory_model.h | 6 +
include/linux/blkdev.h | 8 +-
include/linux/io-mapping.h | 2 +-
include/linux/io.h | 33 +++
include/linux/libnvdimm.h | 4 +
include/linux/memory_hotplug.h | 5 +-
include/linux/mm.h | 9 +-
include/linux/mmzone.h | 23 ++
include/linux/mtd/map.h | 2 +-
include/linux/pmem.h | 115 ++++++---
include/uapi/linux/ndctl.h | 12 +-
include/video/vga.h | 2 +-
kernel/Makefile | 2 +
kernel/memremap.c | 190 ++++++++++++++
kernel/resource.c | 61 +++--
lib/Kconfig | 3 +
lib/devres.c | 13 +-
lib/pci_iomap.c | 7 +-
mm/Kconfig | 17 ++
mm/memory_hotplug.c | 14 +-
mm/page_alloc.c | 3 +
tools/testing/nvdimm/Kbuild | 13 +-
tools/testing/nvdimm/test/iomap.c | 85 ++++++-
tools/testing/nvdimm/test/nfit.c | 209 ++++++++++-----
92 files changed, 2142 insertions(+), 745 deletions(-)
create mode 100644 arch/x86/include/asm/pmem.h
create mode 100644 drivers/nvdimm/claim.c
create mode 100644 drivers/nvdimm/e820.c
create mode 100644 drivers/nvdimm/pfn.h
create mode 100644 drivers/nvdimm/pfn_devs.c
create mode 100644 kernel/memremap.c
commit 5e32940621eb62064d98f42c9889db71b0368bde
Author: Dan Williams <dan.j.williams@xxxxxxxxx>
Date: Sat Jul 11 10:02:46 2015 -0400
libnvdimm, btt: sparse fix
Fix:
drivers/nvdimm/btt.c:635:29: warning: restricted __le64 degrades to integer
Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>
commit ec92777f2ba93c00387b8fe53780c25adc57c744
Author: Vishal Verma <vishal.l.verma@xxxxxxxxx>
Date: Thu Jul 9 13:25:35 2015 -0600
libnvdimm: Update name of the ars_status_record mask field
The spec suggests that this is a simple 'length' field, not a mask.
Update the name accordingly.
Signed-off-by: Vishal Verma <vishal.l.verma@xxxxxxxxx>
Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>
commit 39c686b862cdb2049b90e095b6c6c727b2a7ab60
Author: Vishal Verma <vishal.l.verma@xxxxxxxxx>
Date: Thu Jul 9 13:25:36 2015 -0600
libnvdimm: Add DSM support for Address Range Scrub commands
Add support for the three ARS DSM commands:
- Query ARS Capabilities - Queries the firmware to check if a given
range supports scrub, and if so, which type (persistent vs. volatile)
- Start ARS - Starts a scrub for a given range/type
- Query ARS Status - Checks status of a previously started scrub, and
provides the error logs if any.
The commands are described by the example DSM spec at:
http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf
Also add these commands to the nfit_test test framework, and return
canned data.
Signed-off-by: Vishal Verma <vishal.l.verma@xxxxxxxxx>
Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>
commit 6b47496a6fc81816e7edaf8224dfb88e402a05f5
Author: Vishal Verma <vishal.l.verma@xxxxxxxxx>
Date: Thu Jul 23 11:58:48 2015 -0600
libnvdimm, pmem: Change pmem physical sector size to PAGE_SIZE
Based on the patch c8fa317 ("brd: Request from fdisk 4k alignment") by
Boaz Harrosh, allow fdisk to create properly aligned partitions for DAX.
This will also cause mkfs.ext4 to emit a warning if using a file system
block size of less than PAGE_SIZE.
Cc: Dan Williams <dan.j.williams@xxxxxxxxx>
Cc: Ross Zwisler <ross.zwisler@xxxxxxxxxxxxxxx>
Cc: Matthew Wilcox <matthew.r.wilcox@xxxxxxxxx>
Cc: Christoph Hellwig <hch@xxxxxx>
Cc: Elliott, Robert <Elliott@xxxxxx>
Signed-off-by: Vishal Verma <vishal.l.verma@xxxxxxxxx>
Acked-by: Boaz Harrosh <boaz@xxxxxxxxxxxxx>
Acked-by: Ross Zwisler <ross.zwisler@xxxxxxxxxxxxxxx>
Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>
commit 60e95f43fc8573e81f54b0c1e0bc542c2260d956
Author: Linda Knippers <linda.knippers@xxxxxx>
Date: Wed Jul 22 16:17:22 2015 -0400
nfit: Don't check _STA on NVDIMM devices
The _STA only applies to the root device, not the individual NVDIMMS,
so don't check here. NVDIMM device state flags are checked elsewhere.
Signed-off-by: Linda Knippers <linda.knippers@xxxxxx>
Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>
commit f6ef5a2a50816b58e3126206de13d0b9fdf89df5
Author: Randy Dunlap <rdunlap@xxxxxxxxxxxxx>
Date: Tue Jul 28 12:27:01 2015 -0700
nvdimm: fix inline function return type warning
Fix multiple build warnings when CONFIG_BTT is not enabled:
In file included from ../drivers/nvdimm/bus.c:29:0:
../drivers/nvdimm/nd.h:169:15: warning: return type defaults to 'int' [-Wreturn-type]
static inline nd_btt_probe(struct nd_namespace_common *ndns, void *drvdata)
^
Signed-off-by: Randy Dunlap <rdunlap@xxxxxxxxxxxxx>
Cc: Dan Williams <dan.j.williams@xxxxxxxxx>
Cc: linux-nvdimm@xxxxxxxxxxxx
Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>
commit 124fe20d94630b6f173dae5eb815e6e6e350c72d
Author: Dan Williams <dan.j.williams@xxxxxxxxx>
Date: Mon Aug 10 23:07:05 2015 -0400
mm: enhance region_is_ram() to region_intersects()
region_is_ram() is used to prevent the establishment of aliased mappings
to physical "System RAM" with incompatible cache settings. However, it
uses "-1" to indicate both "unknown" memory ranges (ranges not described
by platform firmware) and "mixed" ranges (where the parameters describe
a range that partially overlaps "System RAM").
Fix this up by explicitly tracking the "unknown" vs "mixed" resource
cases and returning REGION_INTERSECTS, REGION_MIXED, or REGION_DISJOINT.
This re-write also adds support for detecting when the requested region
completely eclipses all of a resource. Note, the implementation treats
overlaps between "unknown" and the requested memory type as
REGION_INTERSECTS.
Finally, other memory types can be passed in by name; for now the only
usage is "System RAM".
Suggested-by: Luis R. Rodriguez <mcgrof@xxxxxxxx>
Reviewed-by: Toshi Kani <toshi.kani@xxxxxx>
Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>
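A minimal userspace sketch of the classification described in the
region_intersects() commit above. It is a model under stated assumptions,
not the kernel code: 'struct model_resource' and model_region_intersects()
are hypothetical names, the resource list is a flat array rather than the
iomem_resource tree, resource flags are ignored, and the numeric values of
the REGION_* constants are assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return codes named in the commit message; numeric values are assumed. */
enum { REGION_INTERSECTS, REGION_DISJOINT, REGION_MIXED };

/* Hypothetical stand-in for the kernel's 'struct resource'. */
struct model_resource {
	unsigned long long start;
	unsigned long long end;		/* inclusive */
	const char *name;
};

/*
 * Classify [start, start + size) against a flat list of named resources.
 * Address space not covered by any entry is "unknown" and is not counted,
 * so a range that covers only 'name' plus unknown space still reports
 * REGION_INTERSECTS, as the commit message notes.
 */
static int model_region_intersects(const struct model_resource *res,
				   size_t nres, unsigned long long start,
				   unsigned long long size, const char *name)
{
	unsigned long long end;
	int type = 0, other = 0;
	size_t i;

	assert(size > 0);
	end = start + size - 1;

	for (i = 0; i < nres; i++) {
		const struct model_resource *p = &res[i];

		/* Skip resources that do not overlap the requested range. */
		if (start > p->end || end < p->start)
			continue;
		if (strcmp(p->name, name) == 0)
			type++;
		else
			other++;
	}

	if (other == 0)
		return type ? REGION_INTERSECTS : REGION_DISJOINT;
	return type ? REGION_MIXED : REGION_DISJOINT;
}
```

With a map of "System RAM" at 0x0-0x9ffff and some other resource at
0xa0000-0xfffff, a request that straddles 0xa0000 yields REGION_MIXED, a
request entirely inside the other resource yields REGION_DISJOINT, and a
request entirely inside RAM yields REGION_INTERSECTS.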
commit 2584cf83578c26db144730ef498f4070f82ee3ea
Author: Dan Williams <dan.j.williams@xxxxxxxxx>
Date: Mon Aug 10 23:07:05 2015 -0400
arch, drivers: don't include <asm/io.h> directly, use <linux/io.h> instead
Preparation for uniform definition of ioremap, ioremap_wc, ioremap_wt,
and ioremap_cache, tree-wide.
Acked-by: Christoph Hellwig <hch@xxxxxx>
Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>
commit 92b19ff50e8f242392d78b2aacc5b5b672f1796b
Author: Dan Williams <dan.j.williams@xxxxxxxxx>
Date: Mon Aug 10 23:07:06 2015 -0400
cleanup IORESOURCE_CACHEABLE vs ioremap()
Quoting Arnd:
I was thinking the opposite approach and basically removing all uses
of IORESOURCE_CACHEABLE from the kernel. There are only a handful of
them and we can probably replace them all with hardcoded
ioremap_cached() calls in the cases they are actually useful.
All existing usages of IORESOURCE_CACHEABLE call ioremap() instead of
ioremap_nocache() if the resource is cacheable, however ioremap() is
uncached by default. Clearly none of the existing usages care about the
cacheability. Particularly devm_ioremap_resource() never worked as
advertised since it always fell back to plain ioremap().
Clean this up as the new direction we want is to convert
ioremap_<type>() usages to memremap(..., flags).
Suggested-by: Arnd Bergmann <arnd@xxxxxxxx>
Reviewed-by: Christoph Hellwig <hch@xxxxxx>
Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>
commit 92281dee825f6d2eb07c441437e4196a44b0861c
Author: Dan Williams <dan.j.williams@xxxxxxxxx>
Date: Mon Aug 10 23:07:06 2015 -0400
arch: introduce memremap()
Existing users of ioremap_cache() are mapping memory that is known in
advance to not have i/o side effects. These users are forced to cast
away the __iomem annotation, or otherwise neglect to fix the sparse
errors thrown when dereferencing pointers to this memory. Provide
memremap() as a non __iomem annotated ioremap_*() in the case when
ioremap is otherwise a pointer to cacheable memory. Empirically,
ioremap_<cacheable-type>() call sites are seeking memory-like semantics
(e.g. speculative reads, and prefetching permitted).
memremap() is a break from the ioremap implementation pattern of adding
a new ioremap_<type>() for each mapping type and having silent
compatibility fall backs. Instead, the implementation defines flags
that are passed to the central memremap() and if a mapping type is not
supported by an arch memremap returns NULL.
We introduce a memremap prototype as a trivial wrapper of
ioremap_cache() and ioremap_wt(). Later, once all ioremap_cache() and
ioremap_wt() usage has been removed from drivers we teach archs to
implement arch_memremap() with the ability to strictly enforce the
mapping type.
Cc: Arnd Bergmann <arnd@xxxxxxxx>
Reviewed-by: Christoph Hellwig <hch@xxxxxx>
Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>
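A small userspace sketch of the design choice described in the memremap()
commit above: one central entry point that takes a flag for the mapping
type and returns NULL, rather than silently falling back, when that type
is not available. The MEMREMAP_WB / MEMREMAP_WT names follow the API this
series introduces, but their numeric values here are assumed, and
model_memremap(), model_memunmap() and model_arch_supported are
hypothetical names. The model backs each "mapping" with heap memory and
expects exactly one flag per call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Mapping-type flags; names per the memremap() API, values assumed. */
#define MEMREMAP_WB (1UL << 0)
#define MEMREMAP_WT (1UL << 1)

/* Hypothetical: the cache types this model "architecture" can provide. */
static unsigned long model_arch_supported = MEMREMAP_WB;

/*
 * Model of the central entry point. The result is a plain pointer with no
 * __iomem annotation, because the mapped range is memory-like and has no
 * i/o side effects.
 */
static void *model_memremap(size_t size, unsigned long flags)
{
	assert(size > 0);

	/* No silent fallback: an unsupported mapping type fails outright. */
	if (!(flags & model_arch_supported))
		return NULL;

	return calloc(1, size);
}

static void model_memunmap(void *addr)
{
	free(addr);
}
```

In this model a MEMREMAP_WB request succeeds and can be dereferenced
directly, while a MEMREMAP_WT request returns NULL and leaves the caller
to decide how to proceed.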
commit 3103dc0304fd9c8ab576977cd98140d4fbac1730
Author: Dan Williams <dan.j.williams@xxxxxxxxx>
Date: Mon Aug 10 23:07:06 2015 -0400
visorbus: switch from ioremap_cache to memremap
In preparation for deprecating ioremap_cache() convert its usage in
visorbus to memremap.
Cc: Benjamin Romer <benjamin.romer@xxxxxxxxxx>
Cc: David Kershner <david.kershner@xxxxxxxxxx>
Acked-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>
commit e836a256e8fd579c9d7a3685f22981225a1ca451
Author: Dan Williams <dan.j.williams@xxxxxxxxx>
Date: Wed Aug 12 18:42:56 2015 -0400
pmem: convert to generic memremap
Kill arch_memremap_pmem() and just let the architecture specify the
flags to be passed to memremap(). Default to writethrough.
Suggested-by: Christoph Hellwig <hch@xxxxxx>
Reviewed-by: Christoph Hellwig <hch@xxxxxx>
Reviewed-by: Ross Zwisler <ross.zwisler@xxxxxxxxxxxxxxx>
Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>
commit fbde1414acc0440024083bf0c391b259bcfc4826
Author: Vishal Verma <vishal.l.verma@xxxxxxxxx>
Date: Wed Jul 29 14:58:07 2015 -0600
libnvdimm, btt: clean up internal interfaces
Consolidate the parameters passed to arena_is_valid into just nd_btt,
and an info block to increase re-usability.
Similarly, btt_arena_write_layout doesn't need to be passed a uuid, as
it can be obtained from arena->nd_btt.
Signed-off-by: Vishal Verma <vishal.l.verma@xxxxxxxxx>
Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>
commit ab45e7632717b811e0786e46ca5ad279cb731b66
Author: Vishal Verma <vishal.l.verma@xxxxxxxxx>
Date: Wed Jul 29 14:58:08 2015 -0600
libnvdimm, btt: consolidate arena validation
Use arena_is_valid as a common routine for checking the validity of an
info block from both discover_arenas, and nd_btt_probe.
As a result, don't check for validity of the BTT's UUID, and lbasize.
The checksum in the BTT info block guarantees self-consistency, and when
we're called from nd_btt_probe, we don't have a valid uuid or lbasize
available to check against.
Also cleanup to return a bool instead of an int.
Signed-off-by: Vishal Verma <vishal.l.verma@xxxxxxxxx>
Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>
commit 6ec689542b5bc516187917d49b112847dfb75b0b
Author: Vishal Verma <vishal.l.verma@xxxxxxxxx>
Date: Wed Jul 29 14:58:09 2015 -0600
libnvdimm, btt: write and validate parent_uuid
When a BTT is instantiated on a namespace it must validate the namespace
uuid matches the 'parent_uuid' stored in the btt superblock. This
property enforces that changing the namespace UUID invalidates all
former BTT instances on that storage. For "IO namespaces" that don't
have a label or UUID, the parent_uuid is set to zero, and this
validation is skipped. For such cases, old BTTs have to be invalidated
by forcing the namespace to raw mode, and overwriting the BTT info
blocks.
Based on a patch by Dan Williams <dan.j.williams@xxxxxxxxx>
Signed-off-by: Vishal Verma <vishal.l.verma@xxxxxxxxx>
Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>
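The parent_uuid rule in the commit above can be sketched in userspace as
follows. This is an illustration of the rule as the commit message words
it, not the btt.c implementation: model_parent_uuid_ok() is a hypothetical
name, and a NULL ns_uuid is this sketch's way of representing an "IO
namespace" that has no label or UUID.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MODEL_UUID_LEN 16

/*
 * Return true when a BTT whose info block records 'parent_uuid' may be
 * instantiated on a namespace whose UUID is 'ns_uuid'.
 */
static bool model_parent_uuid_ok(const unsigned char *ns_uuid,
				 const unsigned char *parent_uuid)
{
	assert(parent_uuid != NULL);

	/* No namespace UUID to compare against: validation is skipped. */
	if (!ns_uuid)
		return true;

	/* Otherwise the recorded parent must match the namespace exactly. */
	return memcmp(ns_uuid, parent_uuid, MODEL_UUID_LEN) == 0;
}
```

Changing the namespace UUID therefore makes the comparison fail for every
BTT written under the old UUID, which is the invalidation property the
commit message describes.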
commit 7d3dcf26a6559fa82af3f53e2c8b163cec95fdaf
Author: Christoph Hellwig <hch@xxxxxx>
Date: Mon Aug 10 23:07:07 2015 -0400
devres: add devm_memremap
Signed-off-by: Christoph Hellwig <hch@xxxxxx>
Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>
commit 708ab62bef1ed3a3cf065a4138bd87f5d083cfeb
Author: Christoph Hellwig <hch@xxxxxx>
Date: Mon Aug 10 23:07:08 2015 -0400
pmem: switch to devm_ allocations
Signed-off-by: Christoph Hellwig <hch@xxxxxx>
[djbw: tools/testing/nvdimm/ and memunmap_pmem support]
Reviewed-by: Ross Zwisler <ross.zwisler@xxxxxxxxxxxxxxx>
Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>
commit 7a67832c7e44c20935c5d6f2264035a0f7bf0d8f
Author: Dan Williams <dan.j.williams@xxxxxxxxx>
Date: Wed Aug 19 00:34:34 2015 -0400
libnvdimm, e820: make CONFIG_X86_PMEM_LEGACY a tristate option
We currently register a platform device for e820 type-12 memory and
register a nvdimm bus beneath it. Registering the platform device
triggers the device-core machinery to probe for a driver, but that
search currently comes up empty. Building the nvdimm-bus registration
into the e820_pmem platform device registration in this way forces
libnvdimm to be built-in. Instead, convert the built-in portion of
CONFIG_X86_PMEM_LEGACY to simply register a platform device and move the
rest of the logic to the driver for e820_pmem, for the following
reasons:
1/ Letting e820_pmem support be a module allows building and testing
libnvdimm.ko changes without rebooting
2/ All the normal policy around modules can be applied to e820_pmem
(unbind to disable and/or blacklisting the module from loading by
default)
3/ Moving the driver to a generic location and converting it to scan
"iomem_resource" rather than "e820.map" means any other architecture can
take advantage of this simple nvdimm resource discovery mechanism by
registering a resource named "Persistent Memory (legacy)"
Cc: Christoph Hellwig <hch@xxxxxx>
Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>
commit 40603526569b304dd92f720f2f8ab11e828ea145
Author: Ross Zwisler <ross.zwisler@xxxxxxxxxxxxxxx>
Date: Tue Aug 18 13:55:36 2015 -0600
pmem, x86: move x86 PMEM API to new pmem.h header
Move the x86 PMEM API implementation out of asm/cacheflush.h and into
its own header asm/pmem.h. This will allow members of the PMEM API to
be more easily identified on this and other architectures.
Signed-off-by: Ross Zwisler <ross.zwisler@xxxxxxxxxxxxxxx>
Suggested-by: Christoph Hellwig <hch@xxxxxx>
Reviewed-by: Christoph Hellwig <hch@xxxxxx>
Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>
commit 18279b467a9d89afe44afbc19d768e834dbf4545
Author: Ross Zwisler <ross.zwisler@xxxxxxxxxxxxxxx>
Date: Tue Aug 18 13:55:37 2015 -0600
pmem: remove layer when calling arch_has_wmb_pmem()
Prior to this change arch_has_wmb_pmem() was only called by
arch_has_pmem_api(). Both arch_has_wmb_pmem() and arch_has_pmem_api()
checked to make sure that CONFIG_ARCH_HAS_PMEM_API was enabled.
Instead, remove the old arch_has_wmb_pmem() wrapper to be rid of one
extra layer of indirection and the redundant CONFIG_ARCH_HAS_PMEM_API
check. Rename __arch_has_wmb_pmem() to arch_has_wmb_pmem() since we no
longer have a wrapper, and just have arch_has_pmem_api() call the
architecture specific arch_has_wmb_pmem() directly.
Signed-off-by: Ross Zwisler <ross.zwisler@xxxxxxxxxxxxxxx>
Reviewed-by: Christoph Hellwig <hch@xxxxxx>
Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>
commit 4a370df5534ef727cba9a9d74bf22e0609f91d6e
Author: Ross Zwisler <ross.zwisler@xxxxxxxxxxxxxxx>
Date: Tue Aug 18 13:55:38 2015 -0600
pmem, x86: clean up conditional pmem includes
Prior to this change x86_64 used the pmem defines in
arch/x86/include/asm/pmem.h, and UM used the default ones at the
top of include/linux/pmem.h. The inclusion or exclusion in linux/pmem.h
was controlled by CONFIG_ARCH_HAS_PMEM_API, but the ones in asm/pmem.h
were controlled by ARCH_HAS_NOCACHE_UACCESS.
Instead, control them both with CONFIG_ARCH_HAS_PMEM_API so that it's
clear that they are related and we don't run into the possibility where
they are both included or excluded. Also remove a bunch of stale
function prototypes meant for UM in asm/pmem.h - these just conflicted
with the inline defaults in linux/pmem.h and gave compile errors.
Signed-off-by: Ross Zwisler <ross.zwisler@xxxxxxxxxxxxxxx>
Reviewed-by: Christoph Hellwig <hch@xxxxxx>
Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>
commit 5de490daec8b6354b90d5c9d3e2415b195f5adb6
Author: Ross Zwisler <ross.zwisler@xxxxxxxxxxxxxxx>
Date: Tue Aug 18 13:55:39 2015 -0600
pmem: add copy_from_iter_pmem() and clear_pmem()
Add support for two new PMEM APIs, copy_from_iter_pmem() and
clear_pmem(). copy_from_iter_pmem() is used to copy data from an
iterator into a PMEM buffer. clear_pmem() zeros a PMEM memory range.
Both of these new APIs must be explicitly ordered using a wmb_pmem()
function call and are implemented in such a way that the wmb_pmem()
will make the stores to PMEM durable. Because both APIs are unordered
they can be called as needed without introducing any unwanted memory
barriers.
Signed-off-by: Ross Zwisler <ross.zwisler@xxxxxxxxxxxxxxx>
Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>
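A toy userspace model of the ordering contract described in the commit
above: the copy and clear helpers are unordered, and only the following
wmb_pmem() makes their stores durable. All model_* names and the two-buffer
layout are hypothetical. The model shows the worst case, where nothing
reaches media until the barrier; real hardware may make stores durable
earlier, but callers cannot rely on that.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_PMEM_SIZE 64

struct model_pmem {
	unsigned char pending[MODEL_PMEM_SIZE];	/* stores issued so far */
	unsigned char media[MODEL_PMEM_SIZE];	/* what survives power loss */
};

/* Unordered copy into PMEM: updates only the pending view. */
static void model_memcpy_to_pmem(struct model_pmem *p, size_t off,
				 const void *src, size_t n)
{
	assert(off <= MODEL_PMEM_SIZE && n <= MODEL_PMEM_SIZE - off);
	memcpy(p->pending + off, src, n);
}

/* Unordered zeroing of a PMEM range: updates only the pending view. */
static void model_clear_pmem(struct model_pmem *p, size_t off, size_t n)
{
	assert(off <= MODEL_PMEM_SIZE && n <= MODEL_PMEM_SIZE - off);
	memset(p->pending + off, 0, n);
}

/* The barrier: everything issued before it becomes durable. */
static void model_wmb_pmem(struct model_pmem *p)
{
	memcpy(p->media, p->pending, MODEL_PMEM_SIZE);
}
```

Because the helpers carry no barrier of their own, a caller can batch
several copies and clears and pay for a single model_wmb_pmem() at the end.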
commit 2765cfbb342c727c3fd47b165196cb16da158022
Author: Ross Zwisler <ross.zwisler@xxxxxxxxxxxxxxx>
Date: Tue Aug 18 13:55:40 2015 -0600
dax: update I/O path to do proper PMEM flushing
Update the DAX I/O path so that all operations that store data (I/O
writes, zeroing blocks, punching holes, etc.) properly synchronize the
stores to media using the PMEM API. This ensures that the data DAX is
writing is durable on media before the operation completes.
Signed-off-by: Ross Zwisler <ross.zwisler@xxxxxxxxxxxxxxx>
Reviewed-by: Christoph Hellwig <hch@xxxxxx>
Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>
commit e2e05394e4a3420dab96f728df4531893494e15d
Author: Ross Zwisler <ross.zwisler@xxxxxxxxxxxxxxx>
Date: Tue Aug 18 13:55:41 2015 -0600
pmem, dax: have direct_access use __pmem annotation
Update the annotation for the kaddr pointer returned by direct_access()
so that it is a __pmem pointer. This is consistent with the PMEM driver
and with how this direct_access() pointer is used in the DAX code.
Signed-off-by: Ross Zwisler <ross.zwisler@xxxxxxxxxxxxxxx>
Reviewed-by: Christoph Hellwig <hch@xxxxxx>
Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>
commit a06a7576526e10a99ea7721533e7f2df3e26baad
Author: yalin wang <yalin.wang2010@xxxxxxxxx>
Date: Thu Aug 27 19:35:48 2015 -0400
nvdimm: change to use generic kvfree()
Signed-off-by: yalin wang <yalin.wang2010@xxxxxxxxx>
Reviewed-by: Ross Zwisler <ross.zwisler@xxxxxxxxxxxxxxx>
Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>
commit 67a3e8fe90156d41cd480d3dfbb40f3bc007c262
Author: Ross Zwisler <ross.zwisler@xxxxxxxxxxxxxxx>
Date: Thu Aug 27 13:14:20 2015 -0600
nd_blk: change aperture mapping from WC to WB
This should result in a pretty sizeable performance gain for reads. For
rough comparison I did some simple read testing using PMEM to compare
reads of write combining (WC) mappings vs write-back (WB). This was
done on a random lab machine.
PMEM reads from a write combining mapping:
# dd of=/dev/null if=/dev/pmem0 bs=4096 count=100000
100000+0 records in
100000+0 records out
409600000 bytes (410 MB) copied, 9.2855 s, 44.1 MB/s
PMEM reads from a write-back mapping:
# dd of=/dev/null if=/dev/pmem0 bs=4096 count=1000000
1000000+0 records in
1000000+0 records out
4096000000 bytes (4.1 GB) copied, 3.44034 s, 1.2 GB/s
To be able to safely support a write-back aperture I needed to add
support for the "read flush" _DSM flag, as outlined in the DSM spec:
http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf
This flag tells the ND BLK driver that it needs to flush the cache lines
associated with the aperture after the aperture is moved but before any
new data is read. This ensures that any stale cache lines from the
previous contents of the aperture will be discarded from the processor
cache, and the new data will be read properly from the DIMM. We know
that the cache lines are clean and will be discarded without any
writeback because either a) the previous aperture operation was a read,
and we never modified the contents of the aperture, or b) the previous
aperture operation was a write and we must have written back the dirtied
contents of the aperture to the DIMM before the I/O was completed.
In order to add support for the "read flush" flag I needed to add a
generic routine to invalidate cache lines, mmio_flush_range(). This is
protected by the ARCH_HAS_MMIO_FLUSH Kconfig variable, and is currently
only supported on x86.
Signed-off-by: Ross Zwisler <ross.zwisler@xxxxxxxxxxxxxxx>
Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>
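A toy userspace model of why a write-back aperture needs the "read flush"
step described in the commit above. Everything here is hypothetical
(struct model_blk, the model_* functions, the sizes), and the "cache" is a
single buffer with a valid bit rather than real cache lines; it only
illustrates that moving the aperture does not by itself discard previously
cached contents.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_APERTURE_SIZE 8
#define MODEL_DIMM_SIZE 32

struct model_blk {
	unsigned char dimm[MODEL_DIMM_SIZE];	/* media behind the aperture */
	size_t base;				/* DIMM offset currently exposed */
	unsigned char cache[MODEL_APERTURE_SIZE]; /* cached aperture contents */
	bool cache_valid;
};

/* Point the aperture at a new DIMM offset; cached data is left alone. */
static void model_move_aperture(struct model_blk *b, size_t base)
{
	assert(base <= MODEL_DIMM_SIZE - MODEL_APERTURE_SIZE);
	b->base = base;
}

/* Stand-in for invalidating the aperture's cache lines. */
static void model_flush_range(struct model_blk *b)
{
	b->cache_valid = false;
}

/* Read through the aperture: a valid cache hides the DIMM contents. */
static void model_read_aperture(struct model_blk *b, unsigned char *dst)
{
	if (!b->cache_valid) {
		memcpy(b->cache, b->dimm + b->base, MODEL_APERTURE_SIZE);
		b->cache_valid = true;
	}
	memcpy(dst, b->cache, MODEL_APERTURE_SIZE);
}
```

In this model, a read issued after model_move_aperture() but without
model_flush_range() returns the previous window's data; adding the flush
between the move and the read returns the data at the new offset.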
commit 4a9bf88a5caa8495b5eb2b738d5fb40924bbc538
Merge: a06a7576526e 67a3e8fe9015
Author: Dan Williams <dan.j.williams@xxxxxxxxx>
Date: Thu Aug 27 19:40:26 2015 -0400
Merge branch 'pmem-api' into libnvdimm-for-next
commit cb389b9c0e00c30c9daf20287f7d91e2466edbb1
Author: Dan Williams <dan.j.williams@xxxxxxxxx>
Date: Fri Aug 7 17:41:00 2015 -0400
dax: drop size parameter to ->direct_access()
None of the implementations currently use it. The common
bdev_direct_access() entry point handles all the size checks before
calling ->direct_access().
Signed-off-by: Christoph Hellwig <hch@xxxxxx>
Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>
commit 012dcef3f058385268630c0003e9b7f8dcafbeb4
Author: Christoph Hellwig <hch@xxxxxx>
Date: Fri Aug 7 17:41:01 2015 -0400
mm: move __phys_to_pfn and __pfn_to_phys to asm/generic/memory_model.h
Three architectures already define these, and we'll need them generically
soon.
Signed-off-by: Christoph Hellwig <hch@xxxxxx>
Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>
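For readers unfamiliar with the two helpers moved by the commit above,
this is a userspace sketch of the conversion they perform between a
physical address and a page frame number. The model_* names are
hypothetical and the 4 KiB page size (shift of 12) is an assumption; the
kernel uses the architecture's PAGE_SHIFT.

```c
#include <stdint.h>

/* Assumed page size of 4 KiB; the kernel uses PAGE_SHIFT. */
#define MODEL_PAGE_SHIFT 12

/* Physical address to page frame number: drop the in-page offset bits. */
static inline unsigned long model_phys_to_pfn(uint64_t paddr)
{
	return (unsigned long)(paddr >> MODEL_PAGE_SHIFT);
}

/* Page frame number to the physical address of the start of that page. */
static inline uint64_t model_pfn_to_phys(unsigned long pfn)
{
	return (uint64_t)pfn << MODEL_PAGE_SHIFT;
}
```

Under the 4 KiB assumption, physical address 0x12345678 lies in page frame
0x12345, and that frame starts at physical address 0x12345000.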
commit 033fbae988fcb67e5077203512181890848b8e90
Author: Dan Williams <dan.j.williams@xxxxxxxxx>
Date: Sun Aug 9 15:29:06 2015 -0400
mm: ZONE_DEVICE for "device memory"
While pmem is usable as a block device or via DAX mappings to userspace
there are several usage scenarios that can not target pmem due to its
lack of struct page coverage. In preparation for "hot plugging" pmem
into the vmemmap add ZONE_DEVICE as a new zone to tag these pages
separately from the ones that are subject to standard page allocations.
Importantly "device memory" can be removed at will by userspace
unbinding the driver of the device.
Having a separate zone prevents allocation and otherwise marks these
pages as distinct from typical uniform memory. Device memory has
different lifetime and performance characteristics than RAM. However,
since we have run out of ZONES_SHIFT bits this functionality currently
depends on sacrificing ZONE_DMA.
Cc: H. Peter Anvin <hpa@xxxxxxxxx>
Cc: Ingo Molnar <mingo@xxxxxxxxxx>
Cc: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
Cc: Rik van Riel <riel@xxxxxxxxxx>
Cc: Mel Gorman <mgorman@xxxxxxx>
Cc: Jerome Glisse <j.glisse@xxxxxxxxx>
[hch: various simplifications in the arch interface]
Signed-off-by: Christoph Hellwig <hch@xxxxxx>
Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>
commit 41e94a851304f7acac840adec4004f8aeee53ad4
Author: Christoph Hellwig <hch@xxxxxx>
Date: Mon Aug 17 16:00:35 2015 +0200
add devm_memremap_pages
This behaves like devm_memremap except that it ensures we have page
structures available that can back the region.
Signed-off-by: Christoph Hellwig <hch@xxxxxx>
[djbw: catch attempts to remap RAM, drop flags]
Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>
commit 96601adb745186ccbcf5b078d4756f13381ec2af
Author: Dan Williams <dan.j.williams@xxxxxxxxx>
Date: Mon Aug 24 18:29:38 2015 -0400
x86, pmem: clarify that ARCH_HAS_PMEM_API implies PMEM mapped WB
Given that a write-back (WB) mapping plus non-temporal stores is
expected to be the most efficient way to access PMEM, update the
definition of ARCH_HAS_PMEM_API to imply arch support for
WB-mapped-PMEM. This is needed as a pre-requisite for adding PMEM to
the direct map and mapping it with struct page.
The above clarification for X86_64 means that memcpy_to_pmem() is
permitted to use the non-temporal arch_memcpy_to_pmem() rather than
needlessly fall back to default_memcpy_to_pmem() when the pcommit
instruction is not available. When arch_memcpy_to_pmem() is not
guaranteed to flush writes out of cache, i.e. on older X86_32
implementations where non-temporal stores may just dirty cache,
ARCH_HAS_PMEM_API is simply disabled.
The default fall back for persistent memory handling remains. Namely,
map it with the WT (write-through) cache-type and hope for the best.
arch_has_pmem_api() is updated to only indicate whether the arch
provides the proper helpers to meet the minimum "writes are visible
outside the cache hierarchy after memcpy_to_pmem() + wmb_pmem()". Code
that cares whether wmb_pmem() actually flushes writes to pmem must now
call arch_has_wmb_pmem() directly.
Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Cc: Ingo Molnar <mingo@xxxxxxxxxx>
Cc: "H. Peter Anvin" <hpa@xxxxxxxxx>
Reviewed-by: Ross Zwisler <ross.zwisler@xxxxxxxxxxxxxxx>
[hch: set ARCH_HAS_PMEM_API=n on x86_32]
Reviewed-by: Christoph Hellwig <hch@xxxxxx>
[toshi: x86_32 compile fixes]
Signed-off-by: Toshi Kani <toshi.kani@xxxxxx>
Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>
commit e1455744b27c9e6115c3508a7b2902157c2c4347
Author: Dan Williams <dan.j.williams@xxxxxxxxx>
Date: Thu Jul 30 17:57:47 2015 -0400
libnvdimm, pfn: 'struct page' provider infrastructure
Implement the base infrastructure for libnvdimm PFN devices. Similar to
BTT devices they take a namespace as a backing device and layer
functionality on top. In this case the functionality is reserving space
for an array of 'struct page' entries to be handed out through
pfn_to_page(). For now this is just the basic libnvdimm-device-model for
configuring the base PFN device.
As the namespace claiming mechanism for PFN devices is mostly identical
to BTT devices drivers/nvdimm/claim.c is created to house the common
bits.
Cc: Ross Zwisler <ross.zwisler@xxxxxxxxxxxxxxx>
Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>
commit 32ab0a3f51701cb37ab960635254d5f84ec3de0a
Author: Dan Williams <dan.j.williams@xxxxxxxxx>
Date: Sat Aug 1 02:16:37 2015 -0400
libnvdimm, pmem: 'struct page' for pmem
Enable the pmem driver to handle PFN device instances. Attaching a pmem
namespace to a pfn device triggers the driver to allocate and initialize
struct page entries for pmem. Memory capacity for this allocation comes
exclusively from RAM for now which is suitable for low PMEM to RAM
ratios. This mechanism will be expanded later for setting an "allocate
from PMEM" policy.
Cc: Boaz Harrosh <boaz@xxxxxxxxxxxxx>
Cc: Ross Zwisler <ross.zwisler@xxxxxxxxxxxxxxx>
Cc: Christoph Hellwig <hch@xxxxxx>
Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>
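A rough worked calculation of why allocating the 'struct page' array from
RAM, as in the commit above, only suits low PMEM to RAM ratios. The
figures are assumptions, not values from this series: 4 KiB pages and a
64-byte 'struct page' (a typical x86_64 size). model_memmap_bytes() is a
hypothetical helper.

```c
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096ULL		/* assumed page size */
#define MODEL_STRUCT_PAGE_SIZE 64ULL	/* assumed sizeof(struct page) */

/* RAM consumed by the memmap needed to cover 'pmem_bytes' of pmem. */
static uint64_t model_memmap_bytes(uint64_t pmem_bytes)
{
	uint64_t npages = pmem_bytes / MODEL_PAGE_SIZE;

	return npages * MODEL_STRUCT_PAGE_SIZE;
}
```

Under these assumptions the overhead is 1/64 of the mapped capacity:
16 GiB of pmem needs 256 MiB of RAM for its memmap, while 1 TiB of pmem
needs 16 GiB, which is the motivation for a later "allocate from PMEM"
policy.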
commit 004f1afbe199e6ab20805b95aefd83ccd24bc5c7
Author: Dan Williams <dan.j.williams@xxxxxxxxx>
Date: Mon Aug 24 19:20:23 2015 -0400
libnvdimm, pmem: direct map legacy pmem by default
The expectation is that the legacy / non-standard pmem discovery method
(e820 type-12) will only ever be used to describe small quantities of
persistent memory. Larger capacities will be described via the ACPI
NFIT. When "allocate struct page from pmem" support is added this default
policy can be overridden by assigning a legacy pmem namespace to a pfn
device, however this would only be necessary if a platform used the
legacy mechanism to define a very large range.
Cc: Christoph Hellwig <hch@xxxxxx>
Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>