Re: [PATCH v11 00/25] mm/gup: track dma-pinned pages: FOLL_PIN

From: John Hubbard
Date: Sat Dec 21 2019 - 19:02:44 EST


On 12/21/19 2:08 AM, Leon Romanovsky wrote:
On Fri, Dec 20, 2019 at 03:54:55PM -0800, John Hubbard wrote:
On 12/20/19 10:29 AM, Leon Romanovsky wrote:
...
$ ./build.sh
$ build/bin/run_tests.py

If you get things that far I think Leon can get a reproduction for you

I'm not so optimistic about that.


OK, I'm going to proceed for now on the assumption that I've got an overflow
problem that happens when huge pages are pinned. If I can get more information,
great, otherwise it's probably enough.

One thing: for your repro, if you know the huge page size, and the system
page size for that case, that would really help. Also the number of pins per
page, more or less, that you'd expect. Because Jason says that only 2M huge
pages are used...

Because the other possibility is that the refcount really is going negative,
likely due to a mismatched pin/unpin somehow.

If there's not an obvious repro case available, but you do have one (is it easy
to repro, though?), then *if* you have the time, I could point you to a github
branch that reduces GUP_PIN_COUNTING_BIAS by, say, 4x, by applying this:

I'll see what I can do this Sunday.


The other data point that might shed light on whether it's a mismatch (this only
works if the system is not actually crashing, though), is checking the new
vmstat items, like this:

$ grep foll_pin /proc/vmstat
nr_foll_pin_requested 16288188
nr_foll_pin_returned 16288188

...but OTOH, if you've got long-term pins, then those are *supposed* to be
mismatched, so it only really helps in between tests.

thanks,
--
John Hubbard
NVIDIA