Re: [patch 00/11] ANNOUNCE: "Syslets", generic asynchronous system call support

From: Ingo Molnar
Date: Tue Feb 13 2007 - 11:46:38 EST



* Alan <alan@xxxxxxxxxxxxxxxxxxx> wrote:

> > A syslet is executed opportunistically: i.e. the syslet subsystem
> > assumes that the syslet will not block, and it will switch to a
> > cachemiss kernel thread from the scheduler. This means that even a
>
> How is scheduler fairness maintained ? and what is done for resource
> accounting here ?

the async threads are as if the user created user-space threads - and
it's accounted (and scheduled) accordingly.

> > that the kernel fills and user-space clears. Waiting is done via the
> > sys_async_wait() system call. Completion can be supressed on a
> > per-atom
>
> They should be selectable as well iff possible.

basically arbitrary notification interfaces are supported. For example
if you add a sys_kill() call as the last syslet atom then this will
notify any waiter in sigwait().

or if you want to select(), just do it in the fds that you are
interested in, and the write that the syslet does triggers select()
completion.

but the fastest one will be by using syslets: to just check the
notification ring pointer in user-space, and then call into
sys_async_wait() if the ring is empty.

I just noticed a small bug here: sys_async_wait() should also take the
ring index userspace checked as a second parameter, and fix up the
number of events it waits for with the delta between the ring index the
kernel maintains and the ring index user-space has. The patch below
fixes this bug.

> > Open issues:
>
> Let me add some more
>
> sys_setuid/gid/etc need to be synchronous only and not occur
> while other async syscalls are running in parallel to meet current
> kernel assumptions.

these should probably be taken out of the 'async syscall table', along
with fork and the async syscalls themselves.

> sys_exec and other security boundaries must be synchronous
> only and not allow async "spill over" (consider setuid async binary
> patching)

i've tested sys_exec() and it seems to work, but i might have missed
some corner-cases. (And what you raise is not academic, it might even
make sense to do it, in the vfork() way.)

> > - sys_fork() and sys_async_exec() should be filtered out from the
> > syscalls that are allowed - first one only makes sense with ptregs,
>
> clone and vfork. async_vfork is a real mindbender actually.

yeah. Also, create_module() perhaps. I'm starting to lean towards an
async_syscall_table[]. At which point we could reduce the max syslet
parameter count to 4, and do those few 5 and 6 parameter syscalls (of
which only splice() and futex() truly matter i suspect) via wrappers.
This would fit a syslet atom into 32 bytes on x86. Hm?

> > second one is a nice kernel recursion thing :) I didnt want to
> > duplicate the sys_call_table though - maybe others have a better
> > idea.
>
> What are the semantics of async sys_async_wait and async sys_async ?

agreed, that should be forbidden too.

Ingo

---------------------->
---
kernel/async.c | 12 +++++++++---
kernel/async.h | 2 +-
2 files changed, 10 insertions(+), 4 deletions(-)

Index: linux/kernel/async.c
===================================================================
--- linux.orig/kernel/async.c
+++ linux/kernel/async.c
@@ -721,7 +721,8 @@ static void refill_cachemiss_pool(struct
* to finish or for all async processing to finish (whichever
* comes first).
*/
-asmlinkage long sys_async_wait(unsigned long min_wait_events)
+asmlinkage long
+sys_async_wait(unsigned long min_wait_events, unsigned long user_curr_ring_idx)
{
struct async_head *ah = current->ah;

@@ -730,12 +731,17 @@ asmlinkage long sys_async_wait(unsigned

if (min_wait_events) {
spin_lock(&ah->lock);
- ah->events_left = min_wait_events;
+ /*
+ * Account any completions that happened since user-space
+ * checked the ring:
+ */
+ ah->events_left = min_wait_events -
+ (ah->curr_ring_idx - user_curr_ring_idx);
spin_unlock(&ah->lock);
}

return wait_event_interruptible(ah->wait,
- list_empty(&ah->busy_async_threads) || !ah->events_left);
+ list_empty(&ah->busy_async_threads) || ah->events_left > 0);
}

/**
Index: linux/kernel/async.h
===================================================================
--- linux.orig/kernel/async.h
+++ linux/kernel/async.h
@@ -26,7 +26,7 @@ struct async_head {
struct list_head ready_async_threads;
struct list_head busy_async_threads;

- unsigned long events_left;
+ long events_left;
wait_queue_head_t wait;

struct async_head_user __user *uah;
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/