Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

From: Evgeniy Polyakov
Date: Sat Mar 03 2007 - 05:09:53 EST


On Fri, Mar 02, 2007 at 09:13:40AM -0800, Davide Libenzi (davidel@xxxxxxxxxxxxxxx) wrote:
> On Fri, 2 Mar 2007, Evgeniy Polyakov wrote:
>
> > On Thu, Mar 01, 2007 at 11:31:14AM -0800, Davide Libenzi (davidel@xxxxxxxxxxxxxxx) wrote:
> > > On Thu, 1 Mar 2007, Evgeniy Polyakov wrote:
> > >
> > > > Ingo, do you really think I will send mails with faked benchmarks? :))
> > >
> > > I don't think he ever implied that. He was only suggesting that when you
> > > post benchmarks, and even more when you make claims based on benchmarks,
> > > you need to be extra careful about what you measure. Otherwise the
> > > external view that you give to others does not look good.
> > > Kevent can be really faster than epoll, but if you post broken benchmarks
> > > (for example, unreliable HTTP loaders, broken server implementations,
> > > etc.) and make claims based on that, the only effect that you have is to
> > > lose your point.
> >
> > So, I only said that kevent is superior to epoll because (and
> > this is the _main_ issue) of its ability to handle essentially any kind
> > of event with very small overhead (the same as epoll has in struct file -
> > a list and a spinlock) and without the significant price of binding
> > struct file to the event.
>
> You'll have to excuse me if my memory is bad, but IIRC the whole discussion
> and long benchmark feast was born with you throwing a benchmark at Ingo
> (with kevent showing a 1.9x performance boost WRT epoll), not with you
> making any other point.

So, how does it sound?
"Threadlets are bad for IO because kevent is 2 times faster than epoll?"

I said threadlets are bad for IO (and we agreed that both approaches
should be used for maximum performance) because of rescheduling overhead:
tasks are quite heavy structures to move around, and even a pt_regs copy
takes more than an event structure. I did not say it because something in
one galaxy might work faster than something else in another galaxy.
It would be stupid even to think that.

> As far as epoll not being able to handle other events. Said who? Of
> course, with zero modifications, you can handle zero additional events.
> With modifications, you can handle other events. But let's talk about those
> other events. The *only* kind of event that ppl (and being the epoll
> maintainer I tend to receive those requests) missed in epoll was AIO
> events. That's the *only* thing that was missed by real life application
> developers. And if something like threadlets/syslets will prove effective,
> the gap is closed WRT that requirement.
> Epoll already handles the whole class of pollable devices inside the
> kernel, and if you exclude block AIO, that's a pretty wide class already.
> The *existing* f_op->poll subsystem can be used to deliver events at the
> poll-head wakeup time (by using the "key" member of the poll callback), so
> that you don't even need the extra f_op->poll call to fetch events.
> And if you really feel raw about the single O(nready) loop that epoll
> currently does, a new epoll_wait2 (or whatever) API could be used to
> deliver the event directly into a userspace buffer [1], directly from the
> poll callback, w/out extra delivery loops (IRQ/event->epoll_callback->event_buffer).

I was asked to add signals, futexes, timers and userspace events to
kevent; so far only futexes are missing, because I was asked to freeze
development so that other hackers could review the project.

>
> [1] From the epoll callback, we cannot sleep, so it's gonna be either an
> mlocked userspace buffer, or some kernel pages mapped to userspace.

Callbacks never sleep - they add the event to a list, just like the current
implementation (maybe some lock must be changed from a mutex to a spinlock,
I do not remember). The main problem is the binding to the file structure,
which is heavy.
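
To make the shape of such a callback concrete, here is a rough userspace
sketch. It is not kevent or epoll code: the names (ready_event, ready_queue,
rq_callback, rq_drain) are made up for illustration, and a C11 atomic_flag
stands in for a kernel spinlock. It only shows the idea that the callback
links an event into a ready list under a short busy-wait lock, while the
consumer collects events later.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical event record; NOT the real kevent or epoll layout. */
struct ready_event {
    int id;                     /* illustrative identifier */
    unsigned int mask;          /* illustrative "what happened" bits */
    struct ready_event *next;
};

/* Ready list guarded by a spin-style lock (stand-in for a kernel spinlock). */
struct ready_queue {
    atomic_flag lock;
    struct ready_event *head;
    struct ready_event *tail;
};

void rq_init(struct ready_queue *q)
{
    atomic_flag_clear(&q->lock);
    q->head = NULL;
    q->tail = NULL;
}

static void rq_lock(struct ready_queue *q)
{
    /* Busy-wait instead of sleeping, as a callback context would require. */
    while (atomic_flag_test_and_set_explicit(&q->lock, memory_order_acquire))
        ;
}

static void rq_unlock(struct ready_queue *q)
{
    atomic_flag_clear_explicit(&q->lock, memory_order_release);
}

/*
 * Callback side: record the event bits and append to the ready list.
 * Assumes the caller guarantees ev is not already queued.
 */
void rq_callback(struct ready_queue *q, struct ready_event *ev,
                 unsigned int mask)
{
    assert(q != NULL && ev != NULL);
    rq_lock(q);
    ev->mask = mask;
    ev->next = NULL;
    if (q->tail)
        q->tail->next = ev;
    else
        q->head = ev;
    q->tail = ev;
    rq_unlock(q);
}

/* Consumer side: pop up to max ready events, in arrival order. */
size_t rq_drain(struct ready_queue *q, int *ids, size_t max)
{
    size_t n = 0;

    rq_lock(q);
    while (q->head && n < max) {
        struct ready_event *ev = q->head;

        q->head = ev->next;
        if (!q->head)
            q->tail = NULL;
        ev->next = NULL;
        ids[n++] = ev->id;
    }
    rq_unlock(q);
    return n;
}
```

A caller would run rq_init() once, call rq_callback() from the event
source, and call rq_drain() from the waiting side. Nothing in the callback
path can block, which is the property discussed above.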

> - Davide
>

--
Evgeniy Polyakov