Re: [ofa-general] Re: Demand paging for memory regions

From: Christoph Lameter
Date: Wed Feb 13 2008 - 18:02:49 EST


On Wed, 13 Feb 2008, Kanoj Sarcar wrote:

> It seems that the need is to solve potential memory
> shortage and overcommit issues by being able to
> reclaim pages pinned by rdma driver/hardware. Is my
> understanding correct?

Correct.

> If I do understand correctly, then why is rdma page
> pinning any different than eg mlock pinning? I imagine
> Oracle pins lots of memory (using mlock), how come
> they do not run into vm overcommit issues?

Mlocked pages are not pinned. They are movable by f.e. page migration and
will potentially be moved by future memory defrag approaches. Currently
we have the same issues with mlocked pages as with pinned pages. There is
work in progress to put mlocked pages onto a different LRU so that reclaim
exempts these pages, and more work on limiting the percentage of memory
that can be mlocked.

> Are we up against some kind of breaking c-o-w issue
> here that is different between mlock and rdma pinning?

Not that I know of.

> Asked another way, why should effort be spent on a
> notifier scheme, and rather not on fixing any memory
> accounting problems and unifying how pin pages are
> accounted for that get pinned via mlock() or rdma
> drivers?

There are efforts underway to account for and limit mlocked pages as
described above. Page pinning the way it is done by Infiniband, through
increasing the page refcount, is treated by the VM as a temporary
condition, not as a permanent pin. The VM will continually try to reclaim
these pages, thinking that the temporary usage of the page must cease
soon. This is why the use of large amounts of pinned pages can lead to
livelock situations.
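
To illustrate the refcount point, here is a toy userspace model in C. It
is not kernel code: struct toy_page, toy_reclaim_pass() and toy_reclaim()
are names made up for this sketch, and the rule "a refcount above 1 means
someone else holds the page" is a simplifying assumption. It only shows
how a scanner that treats an elevated refcount as temporary keeps
rescanning the same pages without making progress.

```c
#include <stddef.h>

/* Toy model only; all names and rules here are hypothetical. */
struct toy_page {
    int refcount; /* 0 = freed, 1 = held only by the LRU, >1 = extra holder */
};

/*
 * One reclaim pass: free every page that has no extra holder and skip
 * the rest. Returns the number of pages freed. *skipped receives the
 * number of pages scanned but left in place because of an extra reference.
 */
size_t toy_reclaim_pass(struct toy_page *pages, size_t n, size_t *skipped)
{
    size_t freed = 0;

    *skipped = 0;
    for (size_t i = 0; i < n; i++) {
        if (pages[i].refcount == 0)
            continue;           /* already freed in an earlier pass */
        if (pages[i].refcount > 1) {
            (*skipped)++;       /* assumed temporary, so retried next pass */
            continue;
        }
        pages[i].refcount = 0;
        freed++;
    }
    return freed;
}

/*
 * Run passes until `want` pages are freed or `max_passes` is reached.
 * Returns the number of passes used. Hitting max_passes with fewer than
 * `want` pages freed is this model's stand-in for a livelock.
 */
size_t toy_reclaim(struct toy_page *pages, size_t n, size_t want,
                   size_t max_passes, size_t *total_freed)
{
    size_t passes = 0;
    size_t skipped;

    *total_freed = 0;
    while (*total_freed < want && passes < max_passes) {
        *total_freed += toy_reclaim_pass(pages, n, &skipped);
        passes++;
    }
    return passes;
}
```

With three of four pages holding an extra reference and a request to free
two pages, the model frees the one unpinned page in the first pass and
then burns every remaining pass rescanning the other three.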

If we want to have pinning behavior then we could mark pinned pages
specially so that the VM will not continually try to evict these pages. We
could manage them similarly to mlocked pages but just not allow page
migration, memory unplug and defrag to occur on pinned memory. All of
these would have to fail. With the notifier scheme the device driver
could be told to get rid of the pinned memory. This would make these three
techniques work despite having an RDMA memory section.
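
As a rough sketch of the difference, the following userspace C model
contrasts a hard pin with a notifier callback. Nothing here is a real
kernel or driver interface: struct toy_notifier, toy_rdma_map(),
toy_rdma_invalidate() and toy_vm_try_move() are hypothetical names, and
"the page can be moved when its refcount is back to 1" is an assumption
made to keep the model small.

```c
#include <stddef.h>

#define TOY_NPAGES 8

/* Toy model only; all names and rules here are hypothetical. */
struct toy_page {
    int refcount; /* 1 = no extra holder, >1 = held by the device */
};

struct toy_notifier;

typedef void (*toy_invalidate_fn)(struct toy_notifier *n,
                                  struct toy_page *pages,
                                  size_t start, size_t end);

struct toy_notifier {
    toy_invalidate_fn invalidate_range; /* called by the "VM" on demand */
    int mapped[TOY_NPAGES];             /* 1 if the device references page i */
};

/* Driver side: take a reference on page i for device access. */
void toy_rdma_map(struct toy_notifier *n, struct toy_page *pages, size_t i)
{
    if (i < TOY_NPAGES && !n->mapped[i]) {
        pages[i].refcount++;
        n->mapped[i] = 1;
    }
}

/* Driver side: drop device references on pages in [start, end). */
void toy_rdma_invalidate(struct toy_notifier *n, struct toy_page *pages,
                         size_t start, size_t end)
{
    for (size_t i = start; i < end && i < TOY_NPAGES; i++) {
        if (n->mapped[i]) {
            pages[i].refcount--;
            n->mapped[i] = 0;
        }
    }
}

/*
 * VM side: try to migrate, unplug or defrag page i. With no notifier
 * (n == NULL) the extra reference makes the attempt fail. With a notifier
 * the driver is asked to let go first. Returns 0 on success, -1 on failure.
 */
int toy_vm_try_move(struct toy_page *pages, size_t i, struct toy_notifier *n)
{
    if (n && n->invalidate_range)
        n->invalidate_range(n, pages, i, i + 1);
    return pages[i].refcount == 1 ? 0 : -1;
}
```

In this model a page mapped through toy_rdma_map() cannot be moved when
toy_vm_try_move() is called without a notifier, and can be moved once the
notifier is passed in. A real driver would then have to fault the page
back in on the next device access, which this sketch leaves out.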

> Startup benefits are well understood with the notifier
> scheme (ie, not all pages need to be faulted in at
> memory region creation time), specially when most of
> the memory region is not accessed at all. I would
> imagine most of HPC does not work this way though.

No. For optimal performance you would want to prefault all pages, as is
done now. The notifier scheme would only become relevant in memory
shortage situations.

> Then again, as rdma hardware is applied (increasingly?) towards apps
> with short lived connections, the notifier scheme will help with startup
> times.

The main use of the notifier scheme is for stability and reliability. The
"pinned" pages become unpinnable on request by the VM. So the VM can work
itself out of memory shortage situations in cooperation with the
RDMA logic instead of simply failing.