On 5/16/19 2:57 AM, Roman Penyaev wrote:
Hi all,
This is v3 which introduces pollable epoll from userspace.
v3:
- Measurements made, represented below.
- Fix alignment for epoll_uitem structure on all 64-bit archs except
x86-64. epoll_uitem should always be 16 bytes; a proper BUILD_BUG_ON
is added. (Linus)
- Check pollflags explicitly for 0 inside the work callback, and do nothing
if 0.
v2:
- No reallocations, the max number of items (thus size of the user ring)
is specified by the caller.
- Interface is simplified: -ENOSPC is returned on an attempt to add a new
epoll item once the number of items has reached the max, nothing more.
- Allocated pages are accounted using user->locked_vm and limited to
RLIMIT_MEMLOCK value.
- EPOLLONESHOT is handled.
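To illustrate the epoll_uitem size check mentioned in the v3 notes, here is a minimal userspace sketch. Only the name epoll_uitem and the BUILD_BUG_ON come from the changelog; the struct name epoll_uitem_sketch, its field names, and the assumption that the intended size is 16 bytes (two 32-bit event masks plus a 64-bit payload) are hypothetical, and C11 static_assert stands in for the kernel's BUILD_BUG_ON.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical layout, NOT taken from the patch: two 32-bit event masks
 * followed by a 64-bit user payload. With this field order no padding is
 * needed, so the size is the same regardless of how the ABI aligns u64.
 */
struct epoll_uitem_sketch {
	uint32_t ready_events;	/* assumed: events reported as ready */
	uint32_t events;	/* assumed: events the user asked for */
	uint64_t data;		/* assumed: opaque user data */
};

/* Userspace equivalent of a BUILD_BUG_ON() on the structure size. */
static_assert(sizeof(struct epoll_uitem_sketch) == 16,
	      "epoll_uitem_sketch must be exactly 16 bytes");
static_assert(offsetof(struct epoll_uitem_sketch, data) == 8,
	      "data must start at offset 8");
```

A compile-time check like this fails the build rather than letting a differently padded structure leak into a memory layout shared with userspace.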
This series introduces pollable epoll from userspace, i.e. user creates
epfd with a new EPOLL_USERPOLL flag, mmaps epoll descriptor, gets header
and ring pointers and then consumes ready events from a ring, avoiding
epoll_wait() call. When ring is empty, user has to call epoll_wait()
in order to wait for new events. epoll_wait() returns -ESTALE if user
ring has events in the ring (kind of indication, that user has to consume
events from the user ring first, I could not invent anything better than
returning -ESTALE).
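The consume-then-wait flow described above can be sketched as a single-threaded simulation. Everything here is hypothetical and not taken from the patch: uring_sketch, uevent, ring_push(), ring_drain() and wait_sketch() are illustrative stand-ins, the ring is a plain array instead of an mmap of the epoll descriptor, and the memory barriers a real shared ring needs are omitted. Only the -ESTALE behaviour for a non-empty ring follows the description.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 8u	/* hypothetical capacity, fixed at creation */

struct uevent {
	uint32_t events;
	uint64_t data;
};

struct uring_sketch {
	uint32_t head;	/* consumer index, advanced by the user */
	uint32_t tail;	/* producer index, advanced by the "kernel" */
	struct uevent slots[RING_SIZE];
};

/* Stand-in for the kernel side publishing a ready event. */
static bool ring_push(struct uring_sketch *r, uint32_t events, uint64_t data)
{
	if (r->tail - r->head == RING_SIZE)
		return false;	/* ring full */
	r->slots[r->tail % RING_SIZE] =
		(struct uevent){ .events = events, .data = data };
	r->tail++;
	return true;
}

/* User side: copy out up to max ready events without any syscall. */
static size_t ring_drain(struct uring_sketch *r, struct uevent *out, size_t max)
{
	size_t n = 0;

	while (n < max && r->head != r->tail) {
		out[n++] = r->slots[r->head % RING_SIZE];
		r->head++;
	}
	return n;
}

/*
 * Stand-in for epoll_wait() on a user-pollable epfd: refuse to wait
 * while unconsumed events are still sitting in the ring.
 */
static int wait_sketch(const struct uring_sketch *r)
{
	if (r->head != r->tail)
		return -ESTALE;
	return 0;	/* the real call would block here until an event arrives */
}
```

In this model the caller drains the ring first and only falls back to the wait call once ring_drain() returns 0; a -ESTALE return means "go back and drain".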
For user header and user ring allocation I used vmalloc_user(). I found
that it is much easier to reuse remap_vmalloc_range_partial() instead of
dealing with page cache (like aio.c does). What is also nice is that
virtual address is properly aligned on SHMLBA, thus there should not be
any d-cache aliasing problems on archs with VIVT or VIPT caches.
Why aren't we just adding support to io_uring for this instead? Then we
don't need yet another entirely new ring, that's just a little
different from what we have.
I haven't looked into the details of your implementation, just curious
if there's anything that makes using io_uring a non-starter for this
purpose?