Re: [RFC PATCH 09/15] epoll: introduce stand-alone helpers for polling from userspace

From: Roman Penyaev
Date: Thu Jan 10 2019 - 05:03:55 EST


On 2019-01-09 18:29, Linus Torvalds wrote:
> On Wed, Jan 9, 2019 at 8:40 AM Roman Penyaev <rpenyaev@xxxxxxx> wrote:
> >
> > ep_vrealloc*()
> > realloc user header, user index or bitmap memory
>
> What? No.
>
> This is wrong, it's much too complicated. And because your
> 'vrealloc()' doesn't follow the normal realloc rules, it looks both
> confusing and buggy, and people have to remember that "oh, vrealloc()
> isn't actually vrealloc(), it's really vdupalloc()".
>
> Your other patch to allow users to apparently also do mremap of these
> things seems entirely wrongheaded too. Especially when you then have
> magical rules for vm_pgoff, which is one of the things that unmapping
> parts of a mmap will touch.
>
> So I say no, no, no. This is all *much* too complicated, and the
> interfaces are mis-designed to be overly generous to people doing odd
> and pointless things.
>
> If you can't have a fixed-size user buffer that stays in one place,
> don't even bother.

I agree that the set of "rules" for this interface is indeed complicated.
The goal was to handle a constantly changing set of items (which can
grow or shrink from another thread) without adding new ctl calls or
imposing any limitations.

Fixing the size of the user buffer seems easy to do. One way is to
still support expansion with, say, an epoll_ctl(EPOLL_CTL_EXPAND) call,
where the user has to react explicitly to ENOSPC from
epoll_ctl(EPOLL_CTL_ADD). Reallocation thus still happens, but only at
the user's request.
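
To make the caller-side pattern concrete, here is a rough userspace
sketch. It is only a model: EPOLL_CTL_EXPAND does not exist, and
struct mock_uepoll, mock_init(), mock_add(), mock_expand() and
add_with_expand() are hypothetical names that merely imitate a
fixed-size buffer which grows only when asked to. Doubling the
capacity is likewise just an assumption for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a user-polled epoll set; not a kernel API. */
struct mock_uepoll {
	int    *fds;	/* registered descriptors */
	size_t  nr;	/* slots in use */
	size_t  max;	/* current fixed capacity */
};

int mock_init(struct mock_uepoll *ep, size_t max)
{
	ep->fds = calloc(max, sizeof(*ep->fds));
	if (!ep->fds)
		return -ENOMEM;
	ep->nr = 0;
	ep->max = max;
	return 0;
}

/* Models epoll_ctl(EPOLL_CTL_ADD): never grows the buffer on its own. */
int mock_add(struct mock_uepoll *ep, int fd)
{
	if (ep->nr == ep->max)
		return -ENOSPC;
	ep->fds[ep->nr++] = fd;
	return 0;
}

/* Models the proposed EPOLL_CTL_EXPAND: grows only on explicit request. */
int mock_expand(struct mock_uepoll *ep, size_t new_max)
{
	int *fds;

	if (new_max <= ep->max)
		return -EINVAL;
	fds = realloc(ep->fds, new_max * sizeof(*fds));
	if (!fds)
		return -ENOMEM;
	ep->fds = fds;
	ep->max = new_max;
	return 0;
}

/* Caller-side pattern: react to ENOSPC explicitly, then retry once. */
int add_with_expand(struct mock_uepoll *ep, int fd)
{
	int err = mock_add(ep, fd);

	if (err != -ENOSPC)
		return err;
	err = mock_expand(ep, ep->max * 2);
	if (err)
		return err;
	return mock_add(ep, fd);
}
```

The point of the model is that mock_add() never reallocates behind the
caller's back; growth is visible only in add_with_expand(), i.e. on the
user's side.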

Another way seems much simpler but has a limitation: the user has to
specify the expected maximum by passing the value to a new epoll_create
syscall, e.g. epoll_create2(EPOLL_USERPOLL, 1000). A further attempt to
add a 1001st descriptor will fail with ENOSPC. Period. No magic under the
hood. The 1001st descriptor can instead be added to a new epoll, which
can then be nested (this is forbidden for "polled from user" descriptors
in the current implementation, but should not be difficult to allow).
Then yes, no remapping / reallocating. But this epoll nesting thing ...
I personally do not like it.
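
For reference, nesting with regular epoll descriptors already looks
roughly like the sketch below. It uses only the existing epoll API
(epoll_create1, epoll_ctl, epoll_wait); neither epoll_create2() nor
EPOLL_USERPOLL appears, since both are only proposals here. The
function name nested_epoll_demo() is made up for the example.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Returns 1 if an event on a pipe watched by an inner epoll instance is
 * visible through an outer epoll instance, 0 if it is not, and -1 on a
 * setup error.
 */
int nested_epoll_demo(void)
{
	struct epoll_event ev = { .events = EPOLLIN }, got;
	int pfd[2], inner, outer, ret = -1;

	if (pipe(pfd) < 0)
		return -1;

	inner = epoll_create1(0);
	outer = epoll_create1(0);
	if (inner < 0 || outer < 0)
		goto cleanup;

	/* The inner set watches the read end of the pipe. */
	ev.data.fd = pfd[0];
	if (epoll_ctl(inner, EPOLL_CTL_ADD, pfd[0], &ev) < 0)
		goto cleanup;

	/* The outer set watches the inner epoll descriptor itself. */
	ev.data.fd = inner;
	if (epoll_ctl(outer, EPOLL_CTL_ADD, inner, &ev) < 0)
		goto cleanup;

	/* Make the pipe readable so the inner set has a pending event. */
	if (write(pfd[1], "x", 1) != 1)
		goto cleanup;

	/* Zero timeout: only report what is already pending. */
	ret = epoll_wait(outer, &got, 1, 0);
	if (ret == 1)
		ret = (got.data.fd == inner && (got.events & EPOLLIN)) ? 1 : 0;

cleanup:
	if (inner >= 0)
		close(inner);
	if (outer >= 0)
		close(outer);
	close(pfd[0]);
	close(pfd[1]);
	return ret;
}
```

With the fixed-limit variant, an overflow set would be attached to the
first one in the same way, so the user still waits on a single
descriptor but has to walk two levels to find the ready items.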

What do you think?

--
Roman