On Tue, May 21, 2013 at 10:07:52AM +0800, Tang Chen wrote:
....
I'm not saying that using two callbacks before and after migration is
better. The reason I don't want to use address_space_operations is that
there is no such member for anonymous pages.
That depends on the nature of the pinning. For the general case of
get_user_pages(), you're correct that it won't work for anonymous memory.
In your idea, using a file mapping will create an address_space_operations.
But I really don't think we can modify the way memory is allocated for all
the subsystems that have this problem. It may not be just aio and cma. That
means if you want to pin pages in memory, you have to use a file mapping.
This makes memory allocation more complicated, and all the subsystem
developers would have to know about the idea. Is that going to happen?
Different subsystems will need to use different approaches to fixing the
issue. I doubt any single approach will work for everything.
I also thought about reusing one field of struct page. But as you said, there
may not be many users of this functionality. Reusing a field of struct page
will make things more complicated and lead to high coupling.
What happens when more than one subsystem tries to pin a particular page?
What if it's a shared page rather than an anonymous page?
So, how about the other idea that Mel mentioned?
We create a 1-1 mapping of pinned page ranges and the pinner (subsystem
callbacks and data), maybe a global list or a hash table. And then, we can
find the callbacks.
Maybe that is the simplest approach, but it's going to make get_user_pages()
slower and more complicated (as if it wasn't already). Maybe with all the
bells and whistles of per-cpu data structures and such you can make it work,
but I'm pretty sure someone running the large unmentionable benchmark will
complain about the performance regressions you're going to introduce. At
least in the case of the AIO ring buffer, using the address_space approach
doesn't introduce any new performance issues. There's also the bigger
question of whether you can exclude get_user_pages_fast() from this.
In short: you've got a lot more work on your hands to do.
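To make the trade-off concrete, here is a minimal user-space sketch of the "global list" variant of the mapping between pinned page ranges and their pinners. It is not kernel code and not a proposed patch: every name in it (pin_record, pin_register, pin_unregister, pin_notify_migrate, example_pinner_notify) is hypothetical, and locking is left out entirely. It mainly shows where the cost lands: every pin has to register a record, and every migration attempt has to walk the list.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical callback: called when a page in a pinned range is to be migrated. */
typedef void (*pin_notify_fn)(unsigned long pfn, void *data);

/* One record per pinned range. All names here are illustrative only. */
struct pin_record {
    unsigned long start_pfn;   /* first page frame in the range */
    unsigned long nr_pages;    /* number of pages in the range */
    pin_notify_fn notify;      /* the pinner's callback */
    void *data;                /* the pinner's private data */
    struct pin_record *next;
};

/* The global list. A real version would need locking or per-cpu structures. */
static struct pin_record *pin_list;

static int pin_covers(const struct pin_record *r, unsigned long pfn)
{
    return pfn >= r->start_pfn && pfn - r->start_pfn < r->nr_pages;
}

/* Record that a pinner holds [start_pfn, start_pfn + nr_pages). Returns 0 or -1. */
int pin_register(unsigned long start_pfn, unsigned long nr_pages,
                 pin_notify_fn notify, void *data)
{
    struct pin_record *r = malloc(sizeof(*r));

    if (!r)
        return -1;
    r->start_pfn = start_pfn;
    r->nr_pages = nr_pages;
    r->notify = notify;
    r->data = data;
    r->next = pin_list;
    pin_list = r;
    return 0;
}

/* Drop the record matching this range and pinner. Returns 0, or -1 if not found. */
int pin_unregister(unsigned long start_pfn, unsigned long nr_pages, void *data)
{
    struct pin_record **pp;

    for (pp = &pin_list; *pp; pp = &(*pp)->next) {
        struct pin_record *r = *pp;

        if (r->start_pfn == start_pfn && r->nr_pages == nr_pages &&
            r->data == data) {
            *pp = r->next;
            free(r);
            return 0;
        }
    }
    return -1;
}

/*
 * Notify every pinner whose range covers pfn. More than one subsystem may
 * have pinned the same page, so the whole list is walked. Returns the number
 * of pinners notified.
 */
int pin_notify_migrate(unsigned long pfn)
{
    struct pin_record *r;
    int notified = 0;

    for (r = pin_list; r; r = r->next) {
        if (pin_covers(r, pfn)) {
            r->notify(pfn, r->data);
            notified++;
        }
    }
    return notified;
}

/* Example pinner callback: just counts how often it was notified. */
void example_pinner_notify(unsigned long pfn, void *data)
{
    (void)pfn;
    ++*(int *)data;
}
```

In this sketch, two pinners registering overlapping ranges are both notified for a page in the overlap. The linear walk in pin_notify_migrate() and the allocation in pin_register() are exactly the kind of overhead that would end up on the get_user_pages() path.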
Thanks. :)
Cheers,
-ben