Re: [PATCH 2 of 4] Introduce i386 fibril scheduling

From: Kent Overstreet
Date: Tue Feb 06 2007 - 17:46:11 EST

Next message: Zachary Amsden: "Re: [PATCH 2/11] Sched clock paravirt op"
Previous message: Randy Dunlap: "Re: [patch] MTD: fix DOC2000/2001/2001PLUS build error"
In reply to: Zach Brown: "Re: [PATCH 2 of 4] Introduce i386 fibril scheduling"
Next in thread: Linus Torvalds: "Re: [PATCH 2 of 4] Introduce i386 fibril scheduling"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On 2/6/07, Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:

On Mon, 5 Feb 2007, Kent Overstreet wrote:
>
> struct asys_ret {
> int ret;
> struct thread *p;
> };
>
> struct asys_ret r;
> r.p = me;
>
> async_read(fd, buf, nbytes, &r);

That's horrible. It means that "r" cannot have automatic linkage (since
the stack will be *gone* by the time we need to fill in "ret"), so now you
need to track *two* pointers: "me" and "&r".

You'd only allocate r on the stack if that stack is going to be around
later; i.e. if you're using user threads. Otherwise, you just allocate
it in some struct containing your aiocb or whatever.

And for user space, it means that we pass the _one_ thing around that we
need for both identifying the async operation to the kernel (the "cookie")
for wait or cancel, and the place where we expect the return value to be
found (which in turn can _easily_ represent a whole "struct aiocb *",
since the return value obviously has to be embedded in there anyway).

Linus

The "struct aiocb" isn't something you have to or necessarily want to
keep around. It's the way the current aio interface works (which I've
coded to), but I don't really see the point. All it really contains is
the syscall arguments, but once the syscall's in progress there's no
reason the kernel has to refer back to it; similarly for userspace,
it's just another struct that userspace has to keep track of and free
at some later time.

In fact, that's the only sane way you can have a ring for submitted
system calls, as otherwise elements of the ring are getting freed in
essentially random order.

I don't see the point in having a ring for completed events, since
it's at most two pointers per completion; quite a bit less data being
sent back than for submissions.

-----

The trouble with differentiating between calls that block and calls
that don't is you completely loose the ability to batch syscalls
together; this is potentially a major win of an asynchronous
interface.

An app can have a bunch of cheap, fast user space threads servicing
whatever; as they run, they can push their system calls onto a global
stack. When no more can run, it does a giant asys_submit (something
similar to io_submit), then the io_getevents equivilant, running the
user threads that had their syscalls complete.

This doesn't mean you can't run synchronously the syscalls that
wouldn't block, or that you have to allocate a fibril for every
syscall - but for servers that care more about throughput than
latency, this is potentially a big win, in cache effects if nothing
else.

(And this doesn't prevent you from having a different syscall that
submits an asynchronous syscall, but runs it right away if it was able
to without blocking).
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Zachary Amsden: "Re: [PATCH 2/11] Sched clock paravirt op"
Previous message: Randy Dunlap: "Re: [patch] MTD: fix DOC2000/2001/2001PLUS build error"
In reply to: Zach Brown: "Re: [PATCH 2 of 4] Introduce i386 fibril scheduling"
Next in thread: Linus Torvalds: "Re: [PATCH 2 of 4] Introduce i386 fibril scheduling"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]