Re: [HMM v17 00/14] HMM (Heterogeneous Memory Management) v17

From: John Hubbard
Date: Wed Feb 22 2017 - 18:59:15 EST


On 02/22/2017 12:27 AM, Balbir Singh wrote:
> On Wed, Feb 22, 2017 at 7:16 PM, Andrew Morton
> <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
>> On Wed, 22 Feb 2017 18:19:15 +1100 Balbir Singh <bsingharora@xxxxxxxxx> wrote:
>>
>>> Andrew, do we expect to get this in 4.11/4.12? Just curious.
>>>
>>
>> I'll be taking a serious look after -rc1.
>>
>> The lack of reviewed-by, acked-by and tested-by is a concern. It's
>> rather odd for a patchset in the 17th revision! What's up with that?
>>
>> Have you reviewed or tested the patches?
>
> I reviewed v14/15 of the patches. Aneesh reviewed some versions as
> well. I know a few people who tested a small subset of the patches,
> I'll get them to report back as well. I think John Hubbard has been
> testing iterations as well. CC'ing other interested people as well
>
> Balbir
>


Yes, Evgeny Baskakov and I have been testing each of the posted versions. We are using both migration and mirroring, and have a small set of multi-threaded and multi-device tests. I've been procrastinating about writing up a summary of the test results, partly because the patchset is still changing (bug fixes, new features, API changes) and so we keep resetting our testing.

We (ahem, actually Evgeny has done most of the work) have been debugging and proposing fixes directly to Jerome, and that email traffic with Jerome has not been CC-ing this list, so things have looked a little quieter than they really were.

Anyway, a very rudimentary testing report:

1. What we are testing: Our latest testing (in the last few weeks) has been against Jerome's repo, here:
git://people.freedesktop.org/~glisse/linux (branch: hmm-next)

which has moved ahead from his hmm-v17 branch. hmm-next adds a few bug fixes, and a new feature (populating CPU pages on a GPU fault). Here are the differences in summary:

$ git diff --stat hmm-v17 hmm-next
 drivers/char/Kconfig             |   10 +
 drivers/char/Makefile            |    1 +
 drivers/char/hmm_dmirror.c       | 1168 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/migrate.h          |    8 +-
 include/uapi/linux/hmm_dmirror.h |   54 +++
 mm/hmm.c                         |    6 +-
 mm/migrate.c                     |  174 ++++++--
 7 files changed, 1388 insertions(+), 33 deletions(-)


2. API: The driver-kernel API is looking OK, although of course the documentation can be improved. As Jerome already explained, there are missing pieces of functionality[1] that will be added later, and this may change the API, but for now, OK. With this initial API, we can handle both "device" and CPU page faults, and migrate pages around.

3. More testing plans: TODO: there are a lot of programs that can easily be modified to use malloc instead of a special device-centric allocator. On our list.

4. Stability: still a little shaky, as we have some pretty recent bug fixes to try out.

5. Performance: I'll send out another note for that at some point. There was a performance bug that Jerome just recently fixed, and I want to see how it looks with that fix applied. No real surprises though.

6. Code reviews: the large size of the patchset, plus the requirement for a complicated driver to exercise it, makes it less likely for other people to review this patch series. It's a bit chicken-and-eggy, too, because our UVM driver can't be checked in and shipped until the kernel API stabilizes. heh.

-----

[1] For example, because file-backed memory support is missing, some userspace program variables that are file-backed (initialized globals, etc.) have to be mapped (from the device) instead of migrated to the device, on a device fault.
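
To make the file-backed vs. anonymous distinction in [1] concrete, here is a rough sketch. It is not part of HMM or of our tests; it only assumes the standard /proc/<pid>/maps line format, where a nonzero inode field means the mapping is backed by a file. The function name classify_maps_line and the sample lines are made up for illustration.

```python
def classify_maps_line(line):
    """Classify one /proc/<pid>/maps line as 'file-backed' or 'anonymous'.

    Fields: address perms offset dev inode [pathname]
    Anonymous mappings (heap, stack, plain mmap) have inode 0.
    """
    fields = line.split(None, 5)
    inode = int(fields[4])
    path = fields[5].strip() if len(fields) > 5 else ""
    kind = "file-backed" if inode != 0 else "anonymous"
    return kind, path


# Hypothetical sample lines in /proc/<pid>/maps format (not real output).
SAMPLE = [
    "00400000-00452000 r-xp 00000000 08:02 173521 /usr/bin/demo",  # text
    "00651000-00652000 rw-p 00051000 08:02 173521 /usr/bin/demo",  # .data
    "00e03000-00e24000 rw-p 00000000 00:00 0 [heap]",
    "7f2c8a000000-7f2c8a021000 rw-p 00000000 00:00 0",
]

for line in SAMPLE:
    print(classify_maps_line(line))
```

This is a per-mapping view only: in a private file mapping, pages that have been written are copied into anonymous pages, so it says nothing precise about individual pages. On a live Linux system the same function can be run over the lines of /proc/self/maps.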

thanks,
john h