Re: [openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation

From: Caitlin Bestler
Date: Tue Apr 26 2005 - 08:48:46 EST


On 4/25/05, Christoph Hellwig <hch@xxxxxxxxxxxxx> wrote:
> On Mon, Apr 25, 2005 at 05:02:36PM -0700, Roland Dreier wrote:
> > The idea is that applications manage the lifetime of pinned memory
> > regions. They can do things like post multiple I/O operations without
> > any page-walking overhead, or pass a buffer descriptor to a remote
> > host who will send data at some indeterminate time in the future. In
> > addition, InfiniBand has the notion of atomic operations, so a cluster
> > application may be using some memory region to implement a global lock.
> >
> > This might not be the most kernel-friendly design but it is pretty
> > deeply ingrained in the design of RDMA transports like InfiniBand and
> > iWARP (RDMA over IP).
>
> Actuallky, no it isn't. All these transports would work just fine with
> the mmap a character device to hand out memory from the kernel approach
> I told you to use multiple times and Andrew mentioned in this thread aswell.
> What doesn't work with that design are the braindead designed by comittee
> APIs in the RDMA world - but I don't think we should care about them too
> much.
>


RDMA registers and uses the memory the user specifies. That is why byte
granularity and multiple redundant registrations are explicitly specified.

The mechanism by which this requirement is implemented is of course
OS dependent. But the requirements are that the application specifies
what portion of their memory they want registered (or what set of physical
pages if they have sufficient privilege) and that request is either honored
or refused by a resource manager (one preferably as integrated with
general OS resource management as possible).

The other aspect is that remotely enabled memory regions and memory
windows most be enabled for hardware access for the duration of
the region or window -- indefinitely until process death or explicit
termination by the application layer.

Theoretically there is nothing in the wire protocols that requires source
buffers to be pinned indefinitely, but that is the only way any RDMA
interface has ever worked -- so "brain death" must be pretty widespread.

The fact that this problem must be solved for remotely accessible
buffers, and that for cluster applications like MPI there is no distinction
between buffers used for inbound messages and outbound messages,
might have something to do with this.

User verbs needs to deal with these actual Memory Registration requirements,
including the very real application need for Memory Windows. The solution
should map to existing OS controls as much as possible.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/