Re: [PATCH v2] ummunotify: Userspace support for MMU notifications

From: Andrew Morton
Date: Mon Jul 27 2009 - 19:53:41 EST


On Fri, 24 Jul 2009 15:56:17 -0700
Roland Dreier <rdreier@xxxxxxxxx> wrote:

> As discussed in <http://article.gmane.org/gmane.linux.drivers.openib/61925>
> and follow-up messages, libraries using RDMA would like to track
> precisely when application code changes memory mapping via free(),
> munmap(), etc. Current pure-userspace solutions using malloc hooks
> and other tricks are not robust, and the feeling among experts is that
> the issue is unfixable without kernel help.
>
> We solve this not by implementing the full API proposed in the email
> linked above but rather with a simpler and more generic interface,
> which may be useful in other contexts. Specifically, we implement a
> new character device driver, ummunotify, that creates a /dev/ummunotify
> node. A userspace process can open this node read-only and use the fd
> as follows:
>
> 1. ioctl() to register/unregister an address range to watch in the
> kernel (cf struct ummunotify_register_ioctl in <linux/ummunotify.h>).
>
> 2. read() to retrieve events generated when a mapping in a watched
> address range is invalidated (cf struct ummunotify_event in
> <linux/ummunotify.h>). select()/poll()/epoll() and SIGIO are
> handled for this IO.
>
> 3. mmap() one page at offset 0 to map a kernel page that contains a
> generation counter that is incremented each time an event is
> generated. This allows userspace to have a fast path that checks
> that no events have occurred without a system call.
>
> Thanks to Jason Gunthorpe <jgunthorpe@xxxxxxxxxxxxxxxxxxxx> for
> suggestions on the interface design. Also thanks to Jeff Squyres
> <jsquyres@xxxxxxxxx> for prototyping support for this in Open MPI, which
> helped find several bugs during development.
>
> ...
>
> +config UMMUNOTIFY
> +	tristate "Userspace MMU notifications"
> +	select MMU_NOTIFIER
> +	help
> +	  The ummunotify (userspace MMU notification) driver creates a
> +	  character device that can be used by userspace libraries to
> +	  get notifications when an application's memory mapping
> +	  changed. This is used, for example, by RDMA libraries to
> +	  improve the reliability of memory registration caching, since
> +	  the kernel's MMU notifications can be used to know precisely
> +	  when to shoot down a cached registration.

Does `select' do the right thing here if UMMUNOTIFY=m? I never trust it...

<searches in vain for ummunotify.txt>

Oh well :(

A little test app would be nice - I assume you have one. We could toss
it in the tree as a how-to-use example, and people could perhaps turn
it into a regression test - perhaps the LTP people would take it.

>
> ...
>
> +	if (test_bit(UMMUNOTIFY_FLAG_HINT, &reg->flags)) {
> +		clear_bit(UMMUNOTIFY_FLAG_HINT, &reg->flags);
> +	} else {
> +		set_bit(UMMUNOTIFY_FLAG_HINT, &reg->flags);

It's a shame that change_bit() didn't return the old (or new) value.
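
For illustration, the "toggle and report the old value" semantics can be sketched in userspace C11 with atomic_fetch_xor(), which returns the word as it was before the XOR. The helper name below is made up for this example and is not a kernel API. In the kernel, test_and_change_bit() toggles a bit and returns its old value; whether it fits here depends on what the elided branches of the hunk above do.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical userspace analogue, not the kernel bitops API:
 * atomically flip one bit in *word and report whether it was set before.
 */
static bool toggle_bit_return_old(atomic_ulong *word, unsigned int bit)
{
	unsigned long mask;

	assert(bit < sizeof(unsigned long) * 8);
	mask = 1UL << bit;
	/* atomic_fetch_xor() returns the value held before the XOR. */
	return (atomic_fetch_xor(word, mask) & mask) != 0;
}
```

With this shape the test and the update happen in one atomic step, so a caller can branch on the old value without a separate test_bit()-style read.
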



The overall userspace interface seems a bit klunky, but I can't really
suggest anything better. Netlink delivery?