Re: [PATCH] sendfile removal

From: H. Peter Anvin
Date: Fri Jun 01 2007 - 12:59:16 EST


Linus Torvalds wrote:
> And the thing is, neither poll nor select work on regular files. And no,
> that is _not_ just an implementation issue. It's very fundamental: neither
> poll nor select get the file offset to wait for!
>
> And that file offset is _critical_ for a regular file, in a way it
> obviously is _not_ for a socket, pipe, or other special file. Because
> without knowing the file offset, you cannot know which page you should be
> waiting for!
>
> And no, the file offset is not "f_pos". sendfile(), along with
> pread/pwrite, uses a totally separate file offset, so if select/poll were
> to base their decision on f_pos, they'd be _wrong_.

This is obviously correct, although at the time those interfaces were
designed, I don't believe either pread/pwrite nor sendfile() existed,
and they still couldn't wait on real files. That there isn't a suitable
way to wait for a file at an offset is probably a result of that past
history.

Waiting at f_pos is still a possible interface, of course; it would mean
that pread/pwrite/sendfile users would have to seek before waiting.
However, implementing waiting on files in select/poll is prohibited by
POSIX, so it would at least need some sort of Linux-specific flag anyway.

It seems that being able to do nonblocking I/O on files would be a
useful thing. This really *does* require proper nonblocking I/O and not
just the ability to wait, since you can never know when the kernel
decides to recycle the page you are just about to want from the cache.

> So there's a few things to take away from this:
>
> - regular file access MUST NOT return EAGAIN just because a page isn't
> in the cache. Doing so is simply a bug. No ifs, buts or maybe's about
> it!
>
> Busy-looping is NOT ACCEPTABLE!
>
> - you *could* make some alternative conventions:
>
> (a) you could make O_NONBLOCK mean that you'll at least
> guarantee that you *start* the IO, and while you never return
> EAGAIN, you migth validly return a _partial_ result!
>
> (b) variation on (a): it's ok to return EAGAIN if _you_ were the
> one who started the IO during this particular time aroudn the
> loop. But if you find a page that isn't up-to-date yet, and
> you didn't start the IO, you *must* wait for it, so that you
> end up returning EAGAIN atmost once! Exactly because
> busy-looping is simply not acceptable behaviour!

(b) seems really ugly. (a) is at least well-defined. Either seems
wrong, though.

-hpa
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/