Re: [patch] pipe: add support for shrinking and growing pipes

From: Jens Axboe
Date: Sun May 23 2010 - 13:47:14 EST


On Sun, May 23 2010, Michael Kerrisk wrote:
> On Sun, May 23, 2010 at 9:09 AM, Jens Axboe <jens.axboe@xxxxxxxxxx> wrote:
> > On Sun, May 23 2010, Michael Kerrisk wrote:
> >> On Sun, May 23, 2010 at 4:38 AM, Andrew Morton
> >> <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
> >> > On Sun, 23 May 2010 07:30:01 +0200 Michael Kerrisk <mtk.manpages@xxxxxxxxx> wrote:
> >> >
> >> >> Hi all,
> >> >>
> >> >> I see that this patch has hit Linus's git tree, so I have some questions.
> >> >>
> >> >> On Wed, May 19, 2010 at 6:49 PM, Linus Torvalds
> >> >> <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> >> >> >
> >> >> >
> >> >> > On Wed, 19 May 2010, Miklos Szeredi wrote:
> >> >> >>
> >> >> >> One issue I see is that it's possible to grow pipes indefinitely.
> >> >> >> Should this be restricted to privileged users?
> >> >> >
> >> >> > Yes. But perhaps only if it grows past the default (or perhaps "default*2"
> >> >> > or similar). That way a normal user could shrink the pipe buffers, and
> >> >> > then grow them again if he wants to.
> >> >> >
> >> >> > Oh, and I think you need to also require that there be at least two
> >> >> > buffers. Otherwise we can't guarantee POSIX behavior, I think.
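A rough C sketch of the policy Linus describes: unprivileged users may shrink a pipe and grow it back up to the default, growing past the default needs privilege, and the pipe never drops below two buffers. The function name and both limits are hypothetical. The default of 16 buffers is an assumption about the era's PIPE_BUFFERS value, and the minimum of two reflects Linus's hedged POSIX concern.

```c
#include <stdbool.h>

/* Hypothetical limits, for illustration only. */
#define SKETCH_DEFAULT_BUFS 16u /* assumed default number of pipe buffers */
#define SKETCH_MIN_BUFS 2u      /* "at least two buffers" per the POSIX concern */

/* Return true if resizing a pipe to nr_bufs buffers would be permitted
 * under the policy sketched in the thread. */
static bool pipe_resize_allowed(unsigned int nr_bufs, bool privileged)
{
    /* Never go below two buffers, regardless of privilege. */
    if (nr_bufs < SKETCH_MIN_BUFS)
        return false;

    /* Shrinking, or growing back up to the default, is open to everyone;
     * going past the default requires privilege. */
    if (nr_bufs > SKETCH_DEFAULT_BUFS && !privileged)
        return false;

    return true;
}
```

Linus also floats "default*2" as the unprivileged ceiling. That variant would only change the second comparison.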
> >> >>
> >> >> Is there any documentation (e.g., a man-pages patch) for these changes?
> >> >>
> >> >> The argument of the fcntl() operations is expressed in pages. I take
> >> >> it that this means that the semantics of the argument will vary
> >> >> depending on the system page size? So for example, 2 on x86 will mean
> >> >> 8192 bytes, but will mean 32768 bytes on ia64? That seems very weird. (And
> >> >> what about architectures where the page size is switchable?) Such
> >> >> changes in semantics should not be silent for the user, IMO.
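To make Michael's point concrete, here is a minimal sketch of what a page-count argument means in bytes on the running system. The helper name is hypothetical. The only real call is sysconf(_SC_PAGESIZE), which reports the page size chosen by the architecture or kernel configuration.

```c
#include <unistd.h>

/* Hypothetical helper: convert a page count (the argument unit under
 * discussion) to bytes on this system. The same count yields different
 * byte sizes on systems with different page sizes. Returns -1 if the
 * page size cannot be determined. */
static long pipe_pages_to_bytes(long pages)
{
    long page_size = sysconf(_SC_PAGESIZE);
    if (page_size <= 0)
        return -1;
    return pages * page_size;
}
```

With 4096-byte pages, pipe_pages_to_bytes(2) gives 8192. With 16 KB pages it gives 32768, which is exactly the silent change in meaning described above.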
> >> >
> >> > Well, there is getpagesize().  But I agree - this interface is just
> >> > asking (x86) people to write non-portable code.
> >> >
> >> > otoh, if the arg was in bytes, they'd just hard-code "8192".  They're
> >> > clever like that.
> >> >
> >> > But we have gone to some lengths to avoid exposing things like
> >> > PAGE_SIZE and HZ in procfs, so it makes sense to take the same approach
> >> > to syscalls.
> >>
> >> Quite. All of the other memory-related APIs that I can think of
> >> require the user to express the info in bytes. (mlock(),
> >> remap_file_pages(), mmap(), mremap(), mprotect(), shmget(), and so
> >> on). Not doing the same for this interface is needlessly inconsistent.
> >> And while there will be the silly users you mention above, smart users
> >> will know how to do the right thing with a consistently designed
> >> interface.
> >
> > We can easily make F_GETPIPE_SZ return bytes, but I don't think passing
> > in bytes to F_SETPIPE_SZ makes a lot of sense. The pipe array must be a
> > power of 2 in pages. So the question is whether it makes the API cleaner
> > to pass in a number of pages but return bytes. Or we could pass in bytes
> > all around, and have F_SETPIPE_SZ round up to a power-of-2 number of
> > pages if need be. Then it would return a size at least as large as what
> > was passed in, or an error.
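A sketch of the rounding Jens proposes for a byte-based F_SETPIPE_SZ: turn the request into whole pages, then round the page count up to a power of 2, so the result is never smaller than the request. The function is hypothetical and not the kernel's implementation. The page size is a parameter so the arithmetic can be checked independently of the host.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: round a request in bytes up to a power-of-2
 * number of pages. page_size would normally come from
 * sysconf(_SC_PAGESIZE). Returns 0 for a zero request or if the
 * result would overflow size_t. */
static size_t pipe_round_up_bytes(size_t bytes, size_t page_size)
{
    if (bytes == 0 || page_size == 0)
        return 0;

    /* Whole pages needed to hold the request (rounded up, overflow-safe). */
    size_t pages = bytes / page_size + (bytes % page_size != 0);

    /* Smallest power of 2 that is >= pages. */
    size_t pow2 = 1;
    while (pow2 < pages) {
        if (pow2 > SIZE_MAX / 2)
            return 0;
        pow2 <<= 1;
    }

    if (pow2 > SIZE_MAX / page_size)
        return 0;
    return pow2 * page_size;
}
```

With 4096-byte pages, a 120 KB request needs 30 pages, which rounds up to 32 pages (128 KB). That is the case Jens describes later in the thread.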
>
> I'd recommend this: Pass it in and out in bytes. Don't round to a
> power of 2. Require the user to know what they are doing. Give an
> error if the user doesn't supply a power-of-2 * page-size for
> F_SETPIPE_SZ. (Again, consider the case of architectures with
> switchable page sizes.)
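Michael's stricter alternative would reject anything that is not page_size times a power of 2, instead of rounding. A minimal sketch of that check follows, with a hypothetical function name. The power-of-2 test uses the usual `n & (n - 1)` trick.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: accept only sizes that are page_size multiplied
 * by a power of 2; anything else would be an EINVAL-style error. */
static bool pipe_size_is_valid(size_t bytes, size_t page_size)
{
    if (page_size == 0 || bytes == 0 || bytes % page_size != 0)
        return false;

    size_t pages = bytes / page_size;
    /* A nonzero value is a power of 2 iff it has exactly one bit set. */
    return (pages & (pages - 1)) == 0;
}
```

With 4096-byte pages, 8192 and 131072 pass, while 12288 (3 pages) and 122880 (30 pages) fail. The same byte value can pass on one page size and fail on another, which is the portability burden Jens questions in his reply.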

But is there much point in erroring on an incorrect size? If the
application says "I need at least 120 KB of space in there", the kernel
returns "OK, you got 128 KB". Would returning -1/EINVAL for that case
really make a better API? Doesn't seem like it to me.
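For reference, fcntl(2) on current Linux documents the behaviour argued for here. F_SETPIPE_SZ takes a byte count, the kernel may grant a larger capacity, and the call returns the size actually used. F_GETPIPE_SZ reports the capacity in bytes. The sketch below assumes a Linux system whose <fcntl.h> exposes both constants under _GNU_SOURCE. The helper name is hypothetical.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: create a pipe with capacity of at least
 * min_bytes. Returns the capacity the kernel actually granted (in
 * bytes), or -1 on error (in which case both ends are closed). */
static int pipe_with_capacity(int fds[2], int min_bytes)
{
    if (pipe(fds) == -1)
        return -1;

    /* The kernel may round the request up; the return value is the
     * real capacity. */
    int got = fcntl(fds[1], F_SETPIPE_SZ, min_bytes);
    if (got == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    return got;
}
```

Asking for 120 KB this way should yield at least 120 KB, and F_GETPIPE_SZ on either end of the pipe reports the same granted size. Unprivileged requests above /proc/sys/fs/pipe-max-size fail with EPERM.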

--
Jens Axboe
