Re: [ofa-general] Re: Demand paging for memory regions

From: Caitlin Bestler
Date: Thu Feb 14 2008 - 15:17:38 EST


On Thu, Feb 14, 2008 at 11:39 AM, Christoph Lameter <clameter@xxxxxxx> wrote:
> On Thu, 14 Feb 2008, Steve Wise wrote:
>
> > Note that for T3, this involves suspending _all_ rdma connections that are in
> > the same PD as the MR being remapped. This is because the driver doesn't know
> > who the application advertised the rkey/stag to. So without that knowledge,
> > all connections that _might_ rdma into the MR must be suspended. If the MR
> > was only setup for local access, then the driver could track the connections
> > with references to the MR and only quiesce those connections.
> >
> > Point being, it will stop probably all connections that an application is
> > using (assuming the application uses a single PD).
>
> Right but if the system starts reclaiming pages of the application then we
> have a memory shortage. So the user should address that by not running
> other apps concurrently. The stopping of all connections is still better
> than the VM getting into major trouble. And the stopping of connections in
> order to move the process memory into a more advantageous memory location
> (f.e. using page migration) or stopping of connections in order to be able
> to move the process memory out of a range of failing memory is certainly
> good.
>

In that spirit, there are two important aspects of a suspend/resume API that
would enable the memory manager to solve problems most effectively:

1) The device should be allowed flexibility to extend the scope of the suspend
to what it is capable of implementing -- rather than being forced
to say that
it does not support suspend/;resume merely because it does so at a different
granularity.

2) It is very important that users of this API understand that it is
only the RDMA
device handling of incoming packets and WQEs that is being suspended. The
peers are not suspended by this API, or even told that this end is
suspending.
Unless the suspend is kept *extremely* short there will be adverse impacts.
And "short" here is measured in network terms, not human terms. The blink
of any eye is *way* too long. Any external dependencies between "suspend"
and "resume" will probably mean that things will not work, especially if the
external entities involve a disk drive.

So suspend/resume to re-arrange pages is one thing. Suspend/resume to cover
swapping out pages so they can be reallocated is an exercise in futility. By the
time you resume the connections will be broken or at the minimum damaged.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/