Re: [3/4] kevent: AIO, aio_sendfile() implementation.

From: Ulrich Drepper
Date: Fri Aug 11 2006 - 15:45:58 EST

SÃbastien Duguà wrote:
> aio completion notification

I looked over this now but I don't think I understand everything. Or I
don't see how it all is integrated. And no, I'm not looking at the
proposed glibc code since would mean being tainted.

> Details:
> -------
> A struct sigevent *aio_sigeventp is added to struct iocb in
> include/linux/aio_abi.h
> An enum {IO_NOTIFY_SIGNAL = 0, IO_NOTIFY_THREAD_ID = 1} is added in
> include/linux/aio.h:
> - IO_NOTIFY_SIGNAL means that the signal is to be sent to the
> requesting thread
> - IO_NOTIFY_THREAD_ID means that the signal is to be sent to a
> specifi thread.

This has been proved to be sufficient in the timer code which basically
has the same problem. But why do you need separate constants? We have
the various SIGEV_* constants, among them SIGEV_THREAD_ID. Just use
these constants for the values of ki_notify.

> The following fields are added to struct kiocb in include/linux/aio.h:
> - pid_t ki_pid: target of the signal
> - __u16 ki_signo: signal number
> - __u16 ki_notify: kind of notification, IO_NOTIFY_SIGNAL or
> - uid_t ki_uid, ki_euid: filled with the submitter credentials

These two fields aren't needed for the POSIX interfaces. Where does the
requirement come from? I don't say they should be removed, they might
be useful, but if the costs are non-negligible then they could go away.

> - check whether the submitting thread wants to be notified directly
> (sigevent->sigev_notify_thread_id is 0) or wants the signal to be sent
> to another thread.
> In the latter case a check is made to assert that the target thread
> is in the same thread group

Is this really how it's implemented? This is not how it should be.
Either a signal is sent to a specific thread in the same process (this
is what SIGEV_THREAD_ID is for) or the signal is sent to a calling
process. Sending a signal to the process means that from the kernel's
POV any thread which doesn't have the signal blocked can receive it.
The final decision is made by the kernel. There is no mechanism to send
the signal to another process.

So, for the purpose of the POSIX AIO code the ki_pid value is only
needed when the SIGEV_THREAD_ID bit is set.

It could be an extension and I don't mind it being introduced. But
again, it's not necessary and if it adds costs then it could be left
out. It is something which could easily be introduced later if the need

> listio support

I really don't understand the kernel interface for this feature.

> Details:
> -------
> An IOCB_CMD_GROUP is added to the IOCB_CMD enum in include/linux/aio_abi.h
> A struct lio_event is added in include/linux/aio.h
> A struct lio_event *ki_lio is added to struct iocb in include/linux/aio.h

So you have a pointer in the structure for the individual requests. I
assume you use the atomic counter to trigger the final delivery. I
further assume that if lio_wait is set the calling thread is suspended
until all requests are handled and that the final notification in this
case means that thread gets woken.

This is all fine.

But how do you pass the requests to the kernel? If you have a new
lio_listio-like syscall it'll be easy. But I haven't seen anything like
this mentioned.

The alternative is to pass the requests one-by-one in which case I don't
see how you create the reference to the lio_listio control block. This
approach seems to be slower.

If all requests are passed at once, do you have the equivalent of
LIO_NOP entries?

How can we support the extension where we wait for a number of requests
which need not be all of them. I.e., I submit N requests and want to be
notified when at least M (M <= N) notified. I am not yet clear about
the actual semantics we should implement (e.g., do we send another
notification after the first one?) but it's something which IMO should
be taken into account in the design.

Finally, and this is very important, does you code send out the
individual requests notification and then in the end the lio_listio
completion? I think Suparna wrote this is the case but I want to make sure.

Overall, this looks much better than the old code. If the answers to my
questions show that the behavior is compatible with the POSIX AIO code
I'm certainly very much in favor of adding the kernel code.

â Ulrich Drepper â Red Hat, Inc. â 444 Castro St â Mountain View, CA â

Attachment: signature.asc
Description: OpenPGP digital signature