Re: Kernel patches enabling better POSIX AIO (Was Re: [3/4] kevent: AIO, aio_sendfile)

From: Suparna Bhattacharya
Date: Mon Aug 14 2006 - 02:59:03 EST

On Sat, Aug 12, 2006 at 12:10:35PM -0700, Ulrich Drepper wrote:
> Suparna Bhattacharya wrote:
> > I am wondering about that too. IIRC, the IO_NOTIFY_* constants are not
> > part of the ABI, but only internal to the kernel implementation. I think
> > Zach had suggested inferring THREAD_ID notification if the pid specified
> > is not zero. But, I don't see why ->sigev_notify couldn't used directly
> > (just like the POSIX timers code does) thus doing away with the
> > new constants altogether. Sebestian/Laurent, do you recall?
> I suggest to model the implementation after the timer code which does
> exactly what we need.


> > I'm guessing they are being used for validation of permissions at the time
> > of sending the signal, but maybe saving the task pointer in the iocb instead
> > of the pid would suffice ?
> Why should any verification be necessary? The requests are generated in
> the same process which will receive the notification. Even if the POSIX
> process (aka, kernel process group) changes the IDs the notifications
> should be set. The key is that notifications cannot be sent to another
> POSIX process.

Is there a (remote) possibility that the thread could have died and its
pid got reused by a new thread in another process ? Or is there a mechanism
that prevents such a possibility from arising (not just in NPTL library,
but at the kernel level) ?

I think the timer code saves a reference to the task pointer instead of
the pid, which is what I was suggesting above (instead of the euid checks),
as way to avoid the above situation.

> Adding this as a feature just makes things so much more complicated.
> > So I think the
> > intended behaviour is as you describe it should be
> Then the documentation needs to be adjusted.


> > The way it works (and better ideas are welcome) is that, since the io_submit()
> > syscall already accepts an array of iocbs[], no new syscall was introduced.
> > To implement lio_listio, one has to set up such an array, with the first iocb
> > in the array having the special (new) grouping opcode of IOCB_CMD_GROUP which
> > specifies the sigev notification to be associated with group completion
> > (a NULL value of the sigev notification pointer would imply equivalent of
> > LIO_WAIT).
> OK, this seems OK. We have to construct the iocb arrays dynamically anyway.
> > My thought here was that it should be possible to include M as a parameter
> > to the IOCB_CMD_GROUP opcode iocb, and thus incorporated in the lio control
> > block ... then whatever semantics are agreed upon can be implemented.
> If you have room for the parameter this is fine. For the beginning we
> can enforce the number to be the same as the total number of requests.

Sounds good.

> > Let us know what you think about the listio interface ... hopefully the
> > other issues are mostly simple to resolve.
> It should be fine and I would support adding all this assuming the
> normal file support (as opposed to direct I/O only) is added, too.

OK. I updated my patchset against 2618-rc3 just after OLS.

> But I have one last question: sockets, pipes and the like are already
> supported, right? If this is not the case we have a problem with the
> currently proposed lio_listio interface.

AIO for pipes should not be a problem - Chris Mason had a patch, so we can
just bring it up to the current levels, possibly with some additional

I'm not sure what would be the right thing to do for the sockets case. While
we could put together a patch for basic aio_read/write (based on the same
model used for files), given the whole ongoing kevent effort, its not yet
clear to me what would make the most sense ...

Ben had a patch to do a fallback to kernel threads for AIO operations that
are not yet supported natively. I had some concerns about the approach, but
I guess he had intended it as an interim path for cases like this.

Suggestions would be much appreciated ? DaveM, Ingo, Andrew ?


> --
> â Ulrich Drepper â Red Hat, Inc. â 444 Castro St â Mountain View, CA â

Suparna Bhattacharya (suparna@xxxxxxxxxx)
Linux Technology Center
IBM Software Lab, India

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at
Please read the FAQ at