Re: [PATCH v3 3/3] mm/hmm/test: add self tests for HMM

From: Ralph Campbell
Date: Tue Oct 29 2019 - 17:16:14 EST



On 10/29/19 10:58 AM, Jason Gunthorpe wrote:
On Wed, Oct 23, 2019 at 12:55:15PM -0700, Ralph Campbell wrote:
Add self tests for HMM.

Signed-off-by: Ralph Campbell <rcampbell@xxxxxxxxxx>

---
MAINTAINERS | 3 +
drivers/char/Kconfig | 11 +
drivers/char/Makefile | 1 +
drivers/char/hmm_dmirror.c | 1566 ++++++++++++++++++++++++
include/Kbuild | 1 +
include/uapi/linux/hmm_dmirror.h | 74 ++
tools/testing/selftests/vm/.gitignore | 1 +
tools/testing/selftests/vm/Makefile | 3 +
tools/testing/selftests/vm/config | 3 +
tools/testing/selftests/vm/hmm-tests.c | 1311 ++++++++++++++++++++
tools/testing/selftests/vm/run_vmtests | 16 +
tools/testing/selftests/vm/test_hmm.sh | 97 ++
12 files changed, 3087 insertions(+)
create mode 100644 drivers/char/hmm_dmirror.c
create mode 100644 include/uapi/linux/hmm_dmirror.h
create mode 100644 tools/testing/selftests/vm/hmm-tests.c
create mode 100755 tools/testing/selftests/vm/test_hmm.sh

This is really big, it would be nice to get a comment from the various
kernel testing folks if this approach makes sense with the test
frameworks. Do we have other drivers that are only intended to be used
by selftests?

Frankly, I'm not super excited about the idea of a 'test driver', it
seems more logical for testing to have some way for a test harness to
call hmm_range_fault() under various conditions and check the results?

test_vmalloc.sh at least uses a test module(s).

It seems especially over-complicated to use a full page table layout
for this, wouldn't something simple like an xarray be good enough for
test purposes?

Possibly. A page table is really just a lookup table from virtual address
to pfn/page. Part of the rationale was to mimic what a real device might do.


+/*
+ * Below are the file operation for the dmirror device file. Only ioctl matters.
+ *
+ * Note this is highly specific to the dmirror device driver and should not be
+ * construed as an example on how to design the API a real device driver would
+ * expose to userspace.
+ */
+static ssize_t dmirror_fops_read(struct file *filp,
+ char __user *buf,
+ size_t count,
+ loff_t *ppos)
+{
+ return -EINVAL;
+}
+
+static ssize_t dmirror_fops_write(struct file *filp,
+ const char __user *buf,
+ size_t count,
+ loff_t *ppos)
+{
+ return -EINVAL;
+}
+
+static int dmirror_fops_mmap(struct file *filp, struct vm_area_struct *vma)
+{
+ /* Forbid mmap of the dmirror device file. */
+ return -EINVAL;
+}

I'm pretty sure these can just be left as NULL in the fops?

I think so.

+static int dmirror_fault(struct dmirror *dmirror,
+ unsigned long start,
+ unsigned long end,
+ bool write)
+{
+ struct mm_struct *mm = dmirror->mirror.hmm->mmu_notifier.mm;
+ unsigned long addr;
+ unsigned long next;
+ uint64_t pfns[64];
+ struct hmm_range range = {
+ .pfns = pfns,
+ .flags = dmirror_hmm_flags,
+ .values = dmirror_hmm_values,
+ .pfn_shift = DPT_SHIFT,
+ .pfn_flags_mask = ~(dmirror_hmm_flags[HMM_PFN_VALID] |
+ dmirror_hmm_flags[HMM_PFN_WRITE]),
+ .default_flags = dmirror_hmm_flags[HMM_PFN_VALID] |
+ (write ? dmirror_hmm_flags[HMM_PFN_WRITE] : 0),
+ };
+ int ret = 0;
+
+ for (addr = start; addr < end; ) {
+ long count;
+
+ next = min(addr + (ARRAY_SIZE(pfns) << PAGE_SHIFT), end);
+ range.start = addr;
+ range.end = next;
+
+ down_read(&mm->mmap_sem);
+
+ ret = hmm_range_register(&range, &dmirror->mirror);
+ if (ret) {
+ up_read(&mm->mmap_sem);
+ break;
+ }
+
+ if (!hmm_range_wait_until_valid(&range,
+ DMIRROR_RANGE_FAULT_TIMEOUT)) {
+ hmm_range_unregister(&range);
+ up_read(&mm->mmap_sem);
+ continue;
+ }
+
+ count = hmm_range_fault(&range, 0);
+ if (count < 0) {
+ ret = count;
+ hmm_range_unregister(&range);
+ up_read(&mm->mmap_sem);
+ break;
+ }
+
+ if (!hmm_range_valid(&range)) {

There is no 'driver lock' being held here, how does this work?
Shouldn't it hold dmirror->mutex for this sequence?

I have a modified version of this driver that's based on your series
removing hmm_mirror_register() which uses a mutex.
Otherwise, it looks similar to the changes in nouveau.

+ hmm_range_unregister(&range);
+ up_read(&mm->mmap_sem);
+ continue;
+ }
+ mutex_lock(&dmirror->mutex);
+ ret = dmirror_pt_walk(dmirror, dmirror_do_fault,
+ addr, next, &range, true);
+ mutex_unlock(&dmirror->mutex);

Ie move it down into this block

+ hmm_range_unregister(&range);
+ up_read(&mm->mmap_sem);
+ if (ret)
+ break;
+
+ addr = next;
+ }
+
+ return ret;
+}

+static int dmirror_read(struct dmirror *dmirror,
+ struct hmm_dmirror_cmd *cmd)
+{

Why not just use pread()/pwrite() for this instead of an ioctl?

pread()/pwrite() could certainly be implemented.
I think the idea was that the read/write is actually the "device"
doing read/write and making that clearly different from a program
reading/writing the device. Also, the ioctl() allows information
about what faults or events happened during the operation. I only
have number of pages and number of page faults returned at the moment,
but one of Jerome's version of this driver had other counters being
returned.

+ struct dmirror_bounce bounce;
+ unsigned long start, end;
+ unsigned long size = cmd->npages << PAGE_SHIFT;
+ int ret;
+
+ start = cmd->addr;
+ end = start + size;
+ if (end < start)
+ return -EINVAL;
+
+ ret = dmirror_bounce_init(&bounce, start, size);
+ if (ret)
+ return ret;
+
+static int dmirror_snapshot(struct dmirror *dmirror,
+ struct hmm_dmirror_cmd *cmd)
+{
+ struct mm_struct *mm = dmirror->mirror.hmm->mmu_notifier.mm;
+ unsigned long start, end;
+ unsigned long size = cmd->npages << PAGE_SHIFT;
+ unsigned long addr;
+ unsigned long next;
+ uint64_t pfns[64];
+ unsigned char perm[64];
+ char __user *uptr;
+ struct hmm_range range = {
+ .pfns = pfns,
+ .flags = dmirror_hmm_flags,
+ .values = dmirror_hmm_values,
+ .pfn_shift = DPT_SHIFT,
+ .pfn_flags_mask = ~0ULL,
+ };
+ int ret = 0;
+
+ start = cmd->addr;
+ end = start + size;
+ uptr = (void __user *)cmd->ptr;
+
+ for (addr = start; addr < end; ) {
+ long count;
+ unsigned long i;
+ unsigned long n;
+
+ next = min(addr + (ARRAY_SIZE(pfns) << PAGE_SHIFT), end);
+ range.start = addr;
+ range.end = next;
+
+ down_read(&mm->mmap_sem);
+
+ ret = hmm_range_register(&range, &dmirror->mirror);
+ if (ret) {
+ up_read(&mm->mmap_sem);
+ break;
+ }
+
+ if (!hmm_range_wait_until_valid(&range,
+ DMIRROR_RANGE_FAULT_TIMEOUT)) {
+ hmm_range_unregister(&range);
+ up_read(&mm->mmap_sem);
+ continue;
+ }
+
+ count = hmm_range_fault(&range, HMM_FAULT_SNAPSHOT);
+ if (count < 0) {
+ ret = count;
+ hmm_range_unregister(&range);
+ up_read(&mm->mmap_sem);
+ if (ret == -EBUSY)
+ continue;
+ break;
+ }
+
+ if (!hmm_range_valid(&range)) {

Same as for dmirror_fault

+ hmm_range_unregister(&range);
+ up_read(&mm->mmap_sem);
+ continue;
+ }
+
+ n = (next - addr) >> PAGE_SHIFT;
+ for (i = 0; i < n; i++)
+ dmirror_mkentry(dmirror, &range, perm + i, pfns[i]);

Is this missing locking too?

Yes. It's in the updated version as mentioned above.

+static int dmirror_remove(struct platform_device *pdev)
+{
+ /* all probe actions are unwound by devm */
+ return 0;
+}
+
+static struct platform_driver dmirror_device_driver = {
+ .probe = dmirror_probe,
+ .remove = dmirror_remove,
+ .driver = {
+ .name = "HMM_DMIRROR",
+ },
+};

This presence of a platform_driver and device is very confusing. I'm
sure Greg KH would object to this as a misuse of platform drivers.

A platform device isn't needed to create a char dev, so what is this for?

The devm_request_free_mem_region() and devm_memremap_pages() calls for
creating the ZONE_DEVICE private pages tie into the devm* clean up framework.
I thought a platform_driver was the simplest way to also be able to call
devm_add_action_or_reset() to clean up on module unload and be compatible
with the private page clean up.

diff --git a/include/Kbuild b/include/Kbuild
index ffba79483cc5..6ffb44a45957 100644
--- a/include/Kbuild
+++ b/include/Kbuild
@@ -1063,6 +1063,7 @@ header-test- += uapi/linux/coda_psdev.h
header-test- += uapi/linux/errqueue.h
header-test- += uapi/linux/eventpoll.h
header-test- += uapi/linux/hdlc/ioctl.h
+header-test- += uapi/linux/hmm_dmirror.h

Why? This list should only be updated if the header is broken in some
way.

Should this be in include/linux/ instead?
I wasn't sure where the "right" place was to put the header.


diff --git a/tools/testing/selftests/vm/hmm-tests.c b/tools/testing/selftests/vm/hmm-tests.c
new file mode 100644
index 000000000000..f4ae6188fd0e
--- /dev/null
+++ b/tools/testing/selftests/vm/hmm-tests.c
@@ -0,0 +1,1311 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2013 Red Hat Inc.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of
+ * the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.

btw, I think if a SPDX is present I don't think the license text is
required, just the copyright.

Since I was starting from Jerome's HMM test driver, I didn't want to
delete any of the original copyright text.
If Jerome is OK with just the SPDX header, that's OK with me.

I think these tests should also study the various case of invoke
pte_hole, ie faulting/snappshotting before/after a vma, or across a
vma range with a hole, etc, etc.

Jason


There are tests for vma hole, pte_none(), zero page, and normal page.
Nothing stress testing races, just set up the mmap() and test once.
I can add more test cases if you have something specific in mind.