Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation

From: Andrew Morton
Date: Mon Apr 25 2005 - 17:18:18 EST


Roland Dreier <roland@xxxxxxxxxxx> wrote:
>
> Andrew> Do we care about that? A straightforward scenario under
> Andrew> which this can happen is:
>
> Andrew> a) app starts some read I/O in an asynchronous manner
> Andrew> b) app forks
> Andrew> c) child writes to one of the pages which is still under read I/O
> Andrew> d) the read I/O completes
> Andrew> e) the child is left with the old data plus the child's modification instead
> Andrew> of the new data
>
> Andrew> which is a very silly application which is giving itself
> Andrew> unpredictable memory contents anyway.
>
> Andrew> I assume there's a more sensible scenario?
>
> You're right, that is a silly scenario ;) In fact if we mark vmas
> with VM_DONTCOPY, then the child just crashes with a seg fault.
>
> The type of thing I'm worried about is something like, for example:
>
> a) app registers memory region with RDMA hardware -- in other words,
> loads the device's translation table for future I/O

Whoa, hang on.

The way we expect get_user_pages() to be used is that the kernel will use
get_user_pages() once per application I/O request.

Are you saying that RDMA clients will semi-permanently own pages which were
pinned by get_user_pages()? That those pages will be used for multiple
separate I/O operations?

If so, then that's a significant design departure and it would be good to
hear why it is necessary.

> b) app forks
> c) app writes to the registered memory region, and the kernel breaks
> the COW for the (now read-only) page by mapping a new page
> d) app starts an I/O that will do a DMA read from the region
> e) device reads using the wrong, old mapping

Sure. But such an app could be declared to be buggy...

> This can be pretty insiduous because for example fork() + immediate
> exec() or just using system() still leaves the parent with PTEs marked
> read-only. If an application does overlapping memory registrations so
> get_user_pages() is called a lot, then as far as I can see
> can_share_swap_page() will always return 0 and the COW will happen
> even if the child process has thrown out its original vmas.
>
> Or if the counts are in the correct range, then there's a small window
> between fork() and exec() where the parent process can screw itself
> up, so most of the time the app works, until it doesn't.
>
> - R.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/