Re: [PATCH] x86 get_unmapped_area: Add PMD alignment for DAX PMD mmap
From: Matthew Wilcox
Date: Wed Apr 06 2016 - 12:50:43 EST
On Wed, Apr 06, 2016 at 07:58:09AM -0600, Toshi Kani wrote:
> When CONFIG_FS_DAX_PMD is set, DAX supports mmap() using PMD page
> size. This feature relies on both mmap virtual address and FS
> block data (i.e. physical address) to be aligned by the PMD page
> size. Users can use mkfs options to specify FS to align block
> allocations. However, aligning mmap() address requires application
> changes to mmap() calls, such as:
>
> - /* let the kernel assign a mmap addr */
> - mptr = mmap(NULL, fsize, PROT_READ|PROT_WRITE, FLAGS, fd, 0);
>
> + /* 1. obtain a PMD-aligned virtual address */
> + ret = posix_memalign(&mptr, PMD_SIZE, fsize);
> + if (!ret)
> +     free(mptr); /* 2. release the virt addr */
> +
> + /* 3. then pass the PMD-aligned virt addr to mmap() */
> + mptr = mmap(mptr, fsize, PROT_READ|PROT_WRITE, FLAGS, fd, 0);
>
> These changes add unnecessary dependency to DAX and PMD page size
> into application code. The kernel should assign a mmap address
> appropriate for the operation.
I question the need for this patch. Choosing an appropriate base address
is the least of the changes needed for an application to take advantage of
DAX. The NVML chooses appropriate addresses and gets a properly aligned
address without any kernel code.
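
To illustrate that userspace can do this on its own, here is a minimal
sketch. It is not NVML's actual code; mmap_pmd_aligned() and
PMD_SIZE_ASSUMED are names made up for this example, and the 2 MiB value
assumes x86-64. It reserves an oversized PROT_NONE region, maps the file
at the aligned address inside it with MAP_FIXED, and trims the rest, so
there is no window in which another thread can take the address (unlike
the posix_memalign()/free() sequence quoted above).

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define PMD_SIZE_ASSUMED (2UL << 20)	/* assumption: 2 MiB PMD on x86-64 */

/*
 * Hypothetical helper: map len bytes of fd at a PMD-aligned address.
 * Returns MAP_FAILED on error, like mmap().
 */
static void *mmap_pmd_aligned(size_t len, int prot, int flags, int fd, off_t off)
{
	size_t pagesz = (size_t)sysconf(_SC_PAGESIZE);
	size_t map_len = (len + pagesz - 1) & ~(pagesz - 1);
	size_t resv_len = map_len + PMD_SIZE_ASSUMED;
	size_t head;
	char *resv, *aligned;
	void *p;

	/* Reserve address space only; nothing is accessible yet. */
	resv = mmap(NULL, resv_len, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (resv == MAP_FAILED)
		return MAP_FAILED;

	/* Distance from resv up to the next PMD boundary (0 if already aligned). */
	head = (size_t)(-(uintptr_t)resv & (PMD_SIZE_ASSUMED - 1));
	aligned = resv + head;

	/* Replace part of our own reservation with the real mapping. */
	p = mmap(aligned, len, prot, flags | MAP_FIXED, fd, off);
	if (p == MAP_FAILED) {
		munmap(resv, resv_len);
		return MAP_FAILED;
	}

	/* Give back the unused head and tail of the reservation. */
	if (head)
		munmap(resv, head);
	munmap(aligned + map_len, PMD_SIZE_ASSUMED - head);
	return p;
}
```

A caller would use it as a drop-in for the original call, e.g.
mmap_pmd_aligned(fsize, PROT_READ|PROT_WRITE, FLAGS, fd, 0). The file
offset still has to be PMD-aligned for a PMD mapping to be possible.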
> Change arch_get_unmapped_area() and arch_get_unmapped_area_topdown()
> to request PMD_SIZE alignment when the request is for a DAX file and
> its mapping range is large enough for using a PMD page.
I think this is the wrong place for it, if we decide that this is the
right thing to do. The filesystem has a get_unmapped_area() which
should be used instead.
> @@ -157,6 +157,13 @@ arch_get_unmapped_area(struct file *filp, unsigned long addr,
>  		info.align_mask = get_align_mask();
>  		info.align_offset += get_align_bits();
>  	}
> +	if (filp && IS_ENABLED(CONFIG_FS_DAX_PMD) && IS_DAX(file_inode(filp))) {
And there's never a need for the IS_ENABLED. IS_DAX() compiles to '0' if
CONFIG_FS_DAX is disabled.
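
For reference, a userspace model of why the test folds away. The macro
pattern mirrors what include/linux/fs.h does, but the struct, the flag
value, and wants_pmd_align() are stand-ins written for this example, not
kernel code.

```c
#define PMD_SIZE_ASSUMED (2UL << 20)	/* assumption: 2 MiB PMD on x86-64 */

/* CONFIG_FS_DAX is deliberately left undefined here. */
#ifdef CONFIG_FS_DAX
#define S_DAX 8192	/* illustrative flag bit */
#else
#define S_DAX 0		/* every IS_DAX() test becomes a constant 0 */
#endif

struct inode {			/* stand-in for the kernel structure */
	unsigned int i_flags;
};

#define IS_DAX(inode) ((inode)->i_flags & S_DAX)

/* Hypothetical predicate: no IS_ENABLED() needed in front of IS_DAX(). */
static int wants_pmd_align(const struct inode *inode, unsigned long len)
{
	return IS_DAX(inode) && len >= PMD_SIZE_ASSUMED;
}
```

With S_DAX defined as 0, the condition is a compile-time constant and
the compiler can drop the branch, which is what the explicit
IS_ENABLED() was trying to achieve.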
And where would this end? Would you also change this code to look for
1GB entries if CONFIG_FS_DAX_PUD is enabled? Far better to have this
in the individual filesystem (probably calling a common helper in the DAX code).
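
As a rough sketch of the arithmetic such a common helper might do (this
is a userspace model under my own assumptions, not a proposed
implementation; dax_align_addr() is a hypothetical name): ask for a free
region padded by one large-page size, then advance the start so that the
address and the file offset agree modulo that size. Passing the size as
a parameter means the same code covers PMD and PUD.

```c
#include <stdint.h>

#define PMD_SIZE_ASSUMED (2UL << 20)	/* assumption: 2 MiB on x86-64 */
#define PUD_SIZE_ASSUMED (1UL << 30)	/* assumption: 1 GiB on x86-64 */

/*
 * region_start: start of a free region at least (len + size) bytes long.
 * off:          file offset the mapping starts at.
 * size:         large-page size, a power of two.
 *
 * Returns the lowest address >= region_start that is congruent to off
 * modulo size, so large-page boundaries in the file line up with
 * large-page boundaries in the address space.
 */
static uintptr_t dax_align_addr(uintptr_t region_start, uint64_t off,
				uint64_t size)
{
	uint64_t delta = (off - (uint64_t)region_start) & (size - 1);

	return (uintptr_t)((uint64_t)region_start + delta);
}
```

A filesystem's get_unmapped_area() would call something like this after
obtaining the padded region from the generic code, and fall back to the
unaligned path when the mapping is too short to contain a full large
page.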