Re: [ofa-general] Re: Demand paging for memory regions

From: Kanoj Sarcar
Date: Wed Feb 13 2008 - 18:43:40 EST



--- Christoph Lameter <clameter@xxxxxxx> wrote:

> On Wed, 13 Feb 2008, Kanoj Sarcar wrote:
>
> > It seems that the need is to solve potential memory
> > shortage and overcommit issues by being able to
> > reclaim pages pinned by rdma driver/hardware. Is my
> > understanding correct?
>
> Correct.
>
> > If I do understand correctly, then why is rdma page
> > pinning any different than eg mlock pinning? I imagine
> > Oracle pins lots of memory (using mlock), how come
> > they do not run into vm overcommit issues?
>
> Mlocked pages are not pinned. They are movable by e.g. page migration and
> will potentially be moved by future memory defrag approaches. Currently
> we have the same issues with mlocked pages as with pinned pages. There is
> work in progress to put mlocked pages onto a different lru so that reclaim
> exempts these pages and more work on limiting the percentage of memory
> that can be mlocked.
>
> > Are we up against some kind of breaking c-o-w issue
> > here that is different between mlock and rdma pinning?
>
> Not that I know.
>
> > Asked another way, why should effort be spent on a
> > notifier scheme, rather than on fixing any memory
> > accounting problems and unifying how pages pinned
> > via mlock() or rdma drivers are accounted for?
>
> There are efforts underway to account for and limit mlocked pages as
> described above. Page pinning the way it is done by Infiniband through
> increasing the page refcount is treated by the VM as a temporary
> condition not as a permanent pin. The VM will continually try to reclaim
> these pages thinking that the temporary usage of the page must cease
> soon. This is why the use of large amounts of pinned pages can lead to
> livelock situations.
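
To make the refcount point above concrete, here is a toy model, not kernel
code. The Page class, its extra_refs field and the reclaim() function are all
invented for illustration; the only idea taken from the thread is that an
elevated refcount is treated as "busy for now, retry later", so a reclaimer
facing mostly pinned pages keeps rescanning without making progress.

```python
from dataclasses import dataclass


@dataclass
class Page:
    extra_refs: int = 0   # references beyond the VM's own; > 0 models a driver pin
    freed: bool = False


def reclaim(pages, want, max_passes=5):
    """Try to free `want` pages. Returns (pages freed, wasted scans)."""
    freed = wasted = 0
    for _ in range(max_passes):
        for page in pages:
            if freed == want:
                return freed, wasted
            if page.freed:
                continue
            if page.extra_refs > 0:
                # Assumed to be a short-lived reference: skip now, rescan later.
                wasted += 1
                continue
            page.freed = True
            freed += 1
    return freed, wasted


# 8 pages pinned by a (hypothetical) RDMA driver, 2 ordinary pages.
pages = [Page(extra_refs=1) for _ in range(8)] + [Page() for _ in range(2)]
freed, wasted = reclaim(pages, want=4)
print(f"freed={freed} wasted_scans={wasted}")
```

With these numbers the model frees only the 2 unpinned pages and spends 40
scans on pinned ones. A real reclaimer has no max_passes cutoff of this kind,
which is roughly where the livelock concern comes from.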

Oh ok, yes, I did see the discussion on this; sorry I
missed it. I do see what notifiers bring to the table
now (without endorsing it :-)).

An orthogonal question is this: is IB/rdma the only
"culprit" that elevates page refcounts? Are there no
other subsystems which do a similar thing?

The example I am thinking about is rawio (Oracle's
mlock'ed SHM regions are handed to rawio, aren't they?).
My understanding of how rawio works in Linux is quite
dated though ...

Kanoj

>
> If we want to have pinning behavior then we could mark pinned pages
> specially so that the VM will not continually try to evict these pages. We
> could manage them similar to mlocked pages but just not allow page
> migration, memory unplug and defrag to occur on pinned memory. All of
> these would have to fail. With the notifier scheme the device driver
> could be told to get rid of the pinned memory. This would make these 3
> techniques work despite having an RDMA memory section.
>
> > Startup benefits are well understood with the notifier
> > scheme (ie, not all pages need to be faulted in at
> > memory region creation time), especially when most of
> > the memory region is not accessed at all. I would
> > imagine most of HPC does not work this way though.
>
> No, for optimal performance you would want to prefault all pages like
> it is now. The notifier scheme would only become relevant in memory
> shortage situations.
>
> > Then again, as rdma hardware is applied (increasingly?) towards apps
> > with short lived connections, the notifier scheme will help with startup
> > times.
>
> The main use of the notifier scheme is for stability and reliability. The
> "pinned" pages become unpinnable on request by the VM. So the VM can work
> itself out of memory shortage situations in cooperation with the
> RDMA logic instead of simply failing.
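
A rough sketch of the cooperation described above, again as a toy model
rather than kernel code. Page, ToyRdmaDriver, its invalidate() callback and
reclaim() are hypothetical names made up for this example; the real notifier
interface under discussion is not reproduced here. The point is only that when
the reclaimer can ask the driver to give a page back, pinned pages stop being
a dead end.

```python
from dataclasses import dataclass


@dataclass(eq=False)      # identity comparison, so pages can live in a set
class Page:
    pinned: bool = False
    freed: bool = False


class ToyRdmaDriver:
    """Hypothetical driver holding pins on the pages of a memory region."""

    def __init__(self, pages):
        self.region = set(pages)
        for page in self.region:
            page.pinned = True

    def invalidate(self, page):
        """Notifier callback: stop using `page` and drop the pin on it."""
        if page in self.region:
            self.region.discard(page)
            page.pinned = False


def reclaim(pages, want, notifier=None):
    """Free up to `want` pages, asking `notifier` to unpin pages if given."""
    freed = 0
    for page in pages:
        if freed == want:
            break
        if page.freed:
            continue
        if page.pinned and notifier is not None:
            notifier(page)            # ask the driver to let go of the page
        if page.pinned:
            continue                  # still pinned: nothing we can do
        page.freed = True
        freed += 1
    return freed


def make_setup():
    pages = [Page() for _ in range(10)]
    driver = ToyRdmaDriver(pages[:8])  # 8 of 10 pages pinned
    return pages, driver


pages, driver = make_setup()
without_notifier = reclaim(pages, want=4)

pages, driver = make_setup()
with_notifier = reclaim(pages, want=4, notifier=driver.invalidate)
print(without_notifier, with_notifier, len(driver.region))
```

In this model the reclaimer gets 2 of the 4 pages it wants without the
callback and all 4 with it, leaving the driver with 4 pages still in its
region. A page given up this way would have to be faulted back in before the
device touches it again, which the sketch does not model.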


