Re: [PATCH/RFC G-U-P experts] IB/umem: Modernize our get_user_pages() parameters

From: Roland Dreier
Date: Mon Jan 30 2012 - 14:16:25 EST


On Fri, Jan 27, 2012 at 6:19 PM, Hugh Dickins <hughd@xxxxxxxxxx> wrote:
>> > > This patch comes from me trying to do userspace RDMA on a memory
>> > > region exported from a character driver and mapped with
>> > >
>> > >     mmap(... PROT_READ, MAP_PRIVATE ...)
>
> Why MAP_PRIVATE?  There you are explicitly asking for COW: okay,
> you wouldn't normally expect any COW while it's just PROT_READ, but
> once you bring GUP into the picture, with use of write and force,
> then you are just begging for COW with that MAP_PRIVATE.  Please
> change it to MAP_SHARED - any reason why not?

I have no idea of the history there... probably could be changed with
no problems.

However, get_user_pages has this comment:

* @force: whether to force write access even if user mapping is
* readonly. This will result in the page being COWed even
* in MAP_SHARED mappings. You do not want this.

but I don't see where in the code FOLL_FORCE does COW
for MAP_SHARED mappings. But on the OTOH I don't see
why we set force in the first place. Why wouldn't we respect
the user's mapping permissions.

> I feel you're trying to handle two very different cases (rdma into
> user-supplied anonymous memory, and exporting driver memory to the
> user) with the same set of args to get_user_pages().  In fact, I
> don't even see why you need get_user_pages() at all when exporting
> driver memory to the user.  Ah, perhaps you don't, but you do want
> your standard access method (which already involves GUP) not to
> mess up when applied to such a mapping - is that it?

Exactly. Right now we have the libibverbs userspace API, which
basically lets userspace create an abstract "memory region" (MR)
that is then given to the RDMA hardware to do IO on. Userspace does

mr = ibv_reg_mr(..., buf, size, access_flags);

where access flags say whether we're going to let the hardware
read and/or write the memory.

Ideally userspace should not have to know where the memory
underlying its "buf" came from or what type of mapping it is.

Certainly there are still more unresolved issues around the case
where userspace wants to map, say, part of a GPUs PCI memory
(which won't have any underlying page structs at all), but I'm at
least hoping we can come up with a way to handle both anonymous
private maps (which will be COWed from the zero page when
the memory is touched for writing) and shared mappings of kernel
memory exported by a driver's mmap method.


So I guess I'm left thinking that it seems at least plausible that
what we want is a new FOLL_ flag for __get_user_pages() that triggers
COW exactly on the pages that userspace might trigger COW on,
and avoids COW otherwise -- ie do FOLL_WRITE exactly for the
pages that have VM_WRITE in their mapping.

I don't think we want to do the "force" semantics or deal with the
VM_MAYWRITE possiblity -- the access we give the hardware on
behalf of userspace should just match the access that userspace
actually has. It seems that if we don't try to get pages for writing
when VM_WRITE isn't set, we don't need force anymore.

- R.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/