Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

From: Davide Libenzi
Date: Sat Mar 03 2007 - 13:47:26 EST


On Sat, 3 Mar 2007, Evgeniy Polyakov wrote:

> > You'll have to excuse me if my memory is bad, but IIRC the whole discussion
> > and long benchmark feast began with you throwing a benchmark at Ingo
> > (with kevent showing a 1.9x performance boost WRT epoll), not with you
> > making any other point.
>
> So, how does it sound?
> "Threadlets are bad for IO because kevent is 2 times faster than epoll?"
>
> I said threadlets are bad for IO (and we agreed that both approaches
> should be used for maximum performance) because of rescheduling overhead -
> tasks are quite heavy structures to move around - even the pt_regs copy
> takes more than an event structure - but not because there is something in
> one galaxy which might work faster than something else in another galaxy.
> It was stupid even to think that.

Evgeniy, other folks on this thread have read what you said, so let's not
drag this out.



> > And if you really feel raw about the single O(nready) loop that epoll
> > currently does, a new epoll_wait2 (or whatever) API could be used to
> > deliver the event directly into a userspace buffer [1], directly from the
> > poll callback, w/out extra delivery loops (IRQ/event->epoll_callback->event_buffer).
>
> > [1] From the epoll callback, we cannot sleep, so it's gonna be either an
> > mlocked userspace buffer, or some kernel pages mapped to userspace.
>
> Callbacks never sleep - they add the event to a list just like the current
> implementation (maybe some lock must be changed from a mutex to a spinlock,
> I do not remember); the main problem is binding to the file structure,
> which is heavy.

I was referring to dropping an event directly into a userspace buffer, from
the poll callback. If the pages are not there you might sleep, and you can't,
since the wakeup function holds a spinlock on the waitqueue head while
looping through the waiters to issue the wakeup. Also, you don't know what
context the poll wakeup is called from.
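
To make that concrete, here is roughly what the userspace side of such an
interface could look like. Everything below (epoll_wait2, struct
epoll_uevent, the ring mapping on the epoll fd) is hypothetical, sketched
only to show the pinned-buffer idea, not an existing API:

	/* Purely hypothetical sketch - none of this API exists. */
	struct epoll_uevent {
		uint32_t events;	/* EPOLLIN, EPOLLOUT, ... */
		uint64_t data;		/* user cookie, as in struct epoll_event */
	};

	/*
	 * The kernel maps a pinned ring of epoll_uevent slots into the
	 * process, so the poll callback can write events into it directly:
	 * it runs under the waitqueue spinlock and must never fault or sleep.
	 */
	struct epoll_uevent *ring = mmap(NULL, NSLOTS * sizeof(*ring),
					 PROT_READ, MAP_SHARED, epfd, 0);

	/* epoll_wait2() would only report how many ring slots are ready. */
	int n = epoll_wait2(epfd, NSLOTS, timeout_ms);
	for (int i = 0; i < n; i++)
		consume(ring[i].data, ring[i].events);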
File binding heavy? The first, and by *far* the biggest, source of events
inside an event collector, for anyone who cares about scalability, is
sockets. And those are already files. Second would be AIO, and that (if the
performance figures agree) can be hosted inside syslets/threadlets.
Then you fall into the no-care category, where the extra 100 bytes do not
make a case against being able to use it with the existing POSIX
infrastructure (poll/select).
BTW, Linus sketched some signalfd code a while ago, to deliver signals to an
fd. The code just sat there and nobody cared. Question: was it because
1) it had file bindings, or 2) because nobody really cared about delivering
signals to an event collector?
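
Just to illustrate the idea (this is only a minimal userspace sketch of
"signals delivered through an fd", written against the signalfd(2)
interface as it was later merged, not Linus' original code): once the
signal is an fd, the existing collector loop handles it like any socket.

	#include <sys/signalfd.h>
	#include <sys/epoll.h>
	#include <signal.h>
	#include <unistd.h>

	int main(void)
	{
		sigset_t mask;
		struct signalfd_siginfo si;
		struct epoll_event ev = { .events = EPOLLIN };
		int sfd, epfd;

		/* Block SIGINT so it is only delivered through the fd. */
		sigemptyset(&mask);
		sigaddset(&mask, SIGINT);
		sigprocmask(SIG_BLOCK, &mask, NULL);

		sfd = signalfd(-1, &mask, 0);
		epfd = epoll_create(1);

		ev.data.fd = sfd;
		epoll_ctl(epfd, EPOLL_CTL_ADD, sfd, &ev);

		/* A SIGINT now shows up as a plain readable fd event. */
		epoll_wait(epfd, &ev, 1, -1);
		read(sfd, &si, sizeof(si));	/* si.ssi_signo == SIGINT */

		return 0;
	}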
And *if* later requirements come, you don't need to change the API by
adding an XXEVENT_SIGNAL_ADD or an XXEVENT_TIMER_ADD, or by creating a new
XXEVENT-only submission structure. You make the new abstraction work with
POSIX poll/select, and you get epoll support for free, without changing a
single bit in the epoll API.
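
A minimal kernel-side sketch of that point (the "foo" names are made up;
only the f_op->poll/poll_wait() mechanism is real): wiring up the poll
method is all it takes for select(2), poll(2) and epoll to work against
the new event source, with no epoll-specific hooks anywhere.

	#include <linux/module.h>
	#include <linux/fs.h>
	#include <linux/poll.h>
	#include <linux/wait.h>

	/* Hypothetical "foo" event source. */
	static DECLARE_WAIT_QUEUE_HEAD(foo_wait);
	static int foo_ready;

	static unsigned int foo_poll(struct file *file, poll_table *wait)
	{
		/* Register on the waitqueue; we get woken when events arrive. */
		poll_wait(file, &foo_wait, wait);
		return foo_ready ? (POLLIN | POLLRDNORM) : 0;
	}

	static const struct file_operations foo_fops = {
		.owner	= THIS_MODULE,
		.poll	= foo_poll,
		/* .read, .release, ... */
	};

	/*
	 * The producer side only does:
	 *	foo_ready = 1;
	 *	wake_up_interruptible(&foo_wait);
	 * and every poll/select/epoll waiter sees POLLIN - epoll's callback
	 * hooks into this very same waitqueue.
	 */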



- Davide

