Re: Review request: draft userfaultfd(2) manual page

From: Mike Rapoport
Date: Fri Apr 21 2017 - 07:07:17 EST


On Fri, Apr 21, 2017 at 08:30:55AM +0200, Michael Kerrisk (man-pages) wrote:
> Hello Mike,
>
> On 03/21/2017 03:01 PM, Mike Rapoport wrote:
> > Hello Michael,
> >
> > On Mon, Mar 20, 2017 at 09:08:05PM +0100, Michael Kerrisk (man-pages) wrote:
> >> Hello Andrea, Mike, and all,
> >>
> >> Mike: thanks for the page that you sent. I've reworked it
> >> a bit, and also added a lot of further information,
> >> and an example program. In the process, I split the page
> >> into two pieces, with one piece describing the userfaultfd()
> >> system call and the other describing the ioctl() operations.
> >>
> >> I'd like to get review input, especially from you and
> >> Andrea, but also anyone else, for the current version
> >> of this page, which includes a few FIXMEs to be sorted.
> >
> > Thanks for the update. I'm addressing the FIXME points you've mentioned
> > below.
>
> Thanks!
>
> > Otherwise, everything seems to be an accurate description of the current
> > upstream. 4.11 will have quite a few updates to userfaultfd and we'll need
> > to update this page and ioctl_userfaultfd(2) to address those updates. I am
> > planning to work on the man page update in the next few weeks.
> >
> >> I've shown the rendered version of the page below.
> >> The groff source is attached, and can also be found
> >> at the branch here:
> >
> >> https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/log/?h=draft_userfaultfd
> >>
> >> The new ioctl_userfaultfd(2) page follows this mail.
> >>
> >> Cheers,
> >>
> >> Michael
> >
> > --
> > Sincerely yours,
> > Mike.
> >
> >
> >> USERFAULTFD(2) Linux Programmer's Manual USERFAULTFD(2)
> >>
> >> ┌─────────────────────────────────────────────────────┐
> >> │FIXME                                                │
> >> ├─────────────────────────────────────────────────────┤
> >> │Need to describe close(2) semantics for userfaultfd  │
> >> │file descriptor: what happens when the userfaultfd   │
> >> │FD is closed?                                        │
> >> │                                                     │
> >> └─────────────────────────────────────────────────────┘
> >
> > When userfaultfd is closed, it unregisters all memory ranges that were
> > previously registered with it and flushes the outstanding page fault
> > events.
>
> Presumably, this is more precisely stated as, "when the last
> file descriptor referring to a userfaultfd object is closed..."?

You are right.

> I've made the text:
>
> When the last file descriptor referring to a userfaultfd object
> is closed, all memory ranges that were registered with the
> object are unregistered and unread page-fault events are
> flushed.
>
> [...]

Perfect.

> >> Reading from the userfaultfd structure
> >> ┌─────────────────────────────────────────────────────┐
> >> │FIXME                                                │
> >> ├─────────────────────────────────────────────────────┤
> >> │are the details below correct?                       │
> >> └─────────────────────────────────────────────────────┘
> >
> > Yes, at least for the current upstream version. 4.11 will have quite a few
> > updates to userfaultfd.
>
> Okay.
>
> >> Each read(2) from the userfaultfd file descriptor returns one
> >> or more uffd_msg structures, each of which describes a page-
> >> fault event:
> >>
> >>     struct uffd_msg {
> >>         __u8  event;            /* Type of event */
> >>         ...
> >>         union {
> >>             struct {
> >>                 __u64 flags;    /* Flags describing fault */
> >>                 __u64 address;  /* Faulting address */
> >>             } pagefault;
> >>             ...
> >>         } arg;
> >>
> >>         /* Padding fields omitted */
> >>     } __packed;
> >>
> >> If multiple events are available and the supplied buffer is
> >> large enough, read(2) returns as many events as will fit in the
> >> supplied buffer. If the buffer supplied to read(2) is smaller
> >> than the size of the uffd_msg structure, the read(2) fails with
> >> the error EINVAL.
> >>
> >> The fields set in the uffd_msg structure are as follows:
> >>
> >> event  The type of event. Currently, only one value can appear
> >>        in this field: UFFD_EVENT_PAGEFAULT, which indicates a
> >>        page-fault event.
> >>
> >> address
> >>        The address that triggered the page fault.
> >>
> >> flags  A bit mask of flags that describe the event. For
> >>        UFFD_EVENT_PAGEFAULT, the following flag may appear:
> >>
> >>        UFFD_PAGEFAULT_FLAG_WRITE
> >>               If the address is in a range that was registered
> >>               with the UFFDIO_REGISTER_MODE_MISSING flag (see
> >>               ioctl_userfaultfd(2)) and this flag is set, this
> >>               is a write fault; otherwise it is a read fault.
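
For illustration, here is a minimal sketch of the read path described
above. It assumes the descriptor has already been enabled with UFFDIO_API.
read_uffd_events() and count_write_faults() are hypothetical helper names,
not part of any kernel or libc API, and the read/write classification is
only meaningful for ranges registered with UFFDIO_REGISTER_MODE_MISSING.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>
#include <linux/userfaultfd.h>

/*
 * Hypothetical helper: read as many events as fit in buf, which has room
 * for `capacity` messages. Returns the number of messages read, or -1 if
 * read(2) fails (errno is left as set by read(2)). Per the draft text, a
 * buffer smaller than one struct uffd_msg makes read(2) fail with EINVAL.
 */
ssize_t read_uffd_events(int uffd, struct uffd_msg *buf, size_t capacity)
{
    ssize_t nread = read(uffd, buf, capacity * sizeof(*buf));

    if (nread < 0)
        return -1;
    return nread / (ssize_t)sizeof(*buf);
}

/*
 * Hypothetical helper: count the page-fault events in msgs[0..nmsgs) that
 * carry UFFD_PAGEFAULT_FLAG_WRITE. Events of any other type are skipped.
 */
size_t count_write_faults(const struct uffd_msg *msgs, size_t nmsgs)
{
    size_t writes = 0;

    assert(msgs != NULL || nmsgs == 0);

    for (size_t i = 0; i < nmsgs; i++) {
        if (msgs[i].event != UFFD_EVENT_PAGEFAULT)
            continue;
        if (msgs[i].arg.pagefault.flags & UFFD_PAGEFAULT_FLAG_WRITE)
            writes++;
    }
    return writes;
}
```

A caller would typically pass a buffer with room for several messages so
that a single read(2) can drain more than one pending event.
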
> >>
> >> A read(2) on a userfaultfd file descriptor can fail with the
> >> following errors:
> >>
> >> EINVAL The userfaultfd object has not yet been enabled using
> >>        the UFFDIO_API ioctl(2) operation.
> >>
> >> The userfaultfd file descriptor can be monitored with poll(2),
> >> select(2), and epoll(7). When events are available, the file
> >> descriptor indicates as readable.
> >>
> >>
> >> ┌─────────────────────────────────────────────────────┐
> >> │FIXME                                                │
> >> ├─────────────────────────────────────────────────────┤
> >> │But, it seems, the object must be created with       │
> >> │O_NONBLOCK. What is the rationale for this           │
> >> │requirement? Something needs to be said in this      │
> >> │manual page.                                         │
> >> └─────────────────────────────────────────────────────┘
> >
> > The object can be created without O_NONBLOCK, so probably the above
> > sentence can be rephrased as:
> >
> > When the userfaultfd file descriptor is opened in non-blocking mode, it can
> > be monitored with ...
>
> Yes, but why is there this requirement for poll() etc. with the
> O_NONBLOCK flag? I think something about that needs to be said in the
> man page. Sorry, my FIXME was not clear enough. I've reworded the text
> and the FIXME:
>
> If the O_NONBLOCK flag is enabled in the associated open file
> description, the userfaultfd file descriptor can be monitored
> with poll(2), select(2), and epoll(7). When events are
> available, the file descriptor indicates as readable. If the
> O_NONBLOCK flag is not enabled, then poll(2) (always) indicates
> the file as having a POLLERR condition, and select(2) indicates
> the file descriptor as both readable and writable.
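
A rough sketch of the polling behaviour described above, in case it helps
the discussion. open_uffd_nonblock() and wait_readable() are hypothetical
names. The sketch assumes the kernel headers provide SYS_userfaultfd and
that the caller is permitted to create a userfaultfd object. Treating
POLLERR as an error corresponds to the case of a descriptor created
without O_NONBLOCK.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <poll.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/userfaultfd.h>

/*
 * Hypothetical helper: create a userfaultfd object in non-blocking mode
 * and enable it with UFFDIO_API. Returns the descriptor, or -1 on failure.
 */
int open_uffd_nonblock(void)
{
    int uffd = (int)syscall(SYS_userfaultfd, O_CLOEXEC | O_NONBLOCK);

    if (uffd < 0)
        return -1;

    struct uffdio_api api = { .api = UFFD_API, .features = 0 };

    if (ioctl(uffd, UFFDIO_API, &api) < 0) {
        close(uffd);
        return -1;
    }
    return uffd;
}

/*
 * Hypothetical helper: wait up to timeout_ms for fd to become readable.
 * Returns 1 if readable, 0 on timeout, and -1 if poll(2) fails or reports
 * POLLERR (which, per the text above, is what a userfaultfd descriptor
 * without O_NONBLOCK would always report).
 */
int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int ready = poll(&pfd, 1, timeout_ms);

    if (ready < 0)
        return -1;
    if (ready == 0)
        return 0;
    if (pfd.revents & POLLERR)
        return -1;
    return (pfd.revents & POLLIN) ? 1 : 0;
}
```

wait_readable() works on any pollable descriptor, so it can be exercised
with a pipe on systems where userfaultfd is unavailable.
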
>
> ┌─────────────────────────────────────────────────────┐
> │FIXME                                                │
> ├─────────────────────────────────────────────────────┤
> │What is the reason for this seemingly odd behavior   │
> │with respect to the O_NONBLOCK flag? (see            │
> │userfaultfd_poll() in fs/userfaultfd.c). Something   │
> │needs to be said about this.                         │
> └─────────────────────────────────────────────────────┘

Andrea, can you please help with this one as well?

> [...]
>
> Thanks,
>
> Michael
>
> --
> Michael Kerrisk
> Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
> Linux/UNIX System Programming Training: http://man7.org/training/

--
Sincerely yours,
Mike.