Re: [PATCH][RFC] splice support
From: Linus Torvalds
Date: Wed Mar 29 2006 - 19:48:06 EST
On Wed, 29 Mar 2006, Andrew Morton wrote:
>
> - splice() take a size_t length. Should it be taking a 64-bit length?
No. You can't splice more than the kernel buffers anyway (ie currently
PIPE_BUFFERS pages, ie ~64kB, although in theory somebody could use large
pages for it), so 64-bit would be total overkill.
> - splice() doesn't check for (len < 0), like read() and write() do.
> Should it?
Umm. More likely better to just do rw_verify_area() instead, which limits
it to MAX_INT. Although it probably doesn't matter, for the above obvious
reason anyway (ie we end up doing everything on a page-granular area
anyway).
> - Please don't call it `len'. VFS has to deal with "lengths" which can
> be in units of PAGE_CACHE_SIZE, fs blocksize, 512-bytes sectors or bytes,
> and it gets confusing. Our liking for variable names like `len' and
> `count' just makes it worse.
I don't see that being problematic. I agree that if "len" is in pages, we
should call it something else (like 'npages'), but the fact that we
commonly use "len" for byte-lengths just sounds sane.
> - why is the `flags' arg to sys_splice() unsigned long? Can it be `int'?
flags are always unsigned long, haven't you noticed? Besides, they should
never be signed, if you do bitmasks and shifting on them: "int" is
strictly worse than "unsigned" when we're talking flags.
> - what does `flags' do, anyway? The whole thing is undocumented and
> almost uncommented.
Right now "flags" doesn't do anything at all, and you should just pass in
zero.
But if we ever do a "move" vs "copy" hint, we'll want something.
> - the tmp_page trick in anon_pipe_buf_release() seems to be unrelated to
> the splice() work. It should be a separate patch and any peformance
> testing (needed, please) should be decoupled from that change.
It's not unrelated. Note the new "page_count() == 1" test.
> - All the operations do foo(in, out, ...). It's a bit more conventional
> to do foo(out, in, ...).
You think? What convention do we have? We don't say "cp dst src", we say
"cp src dst".
The "destination first" convention is insane. It only makes sense for
assignments, and these aren't assignments.
> - The logic in do_splice() hurts my brain. "if `in' is a pipe then
> splice from `in-as-a-pipe' to `out' else if `out' is a pipe then splice
> from `in' to 'out-as-a-pipe'. Make sense, I guess, but I do wonder "what
> would happen if those tests were reversed?". Nothing, I guess.
Why would it matter? If both are pipes, then one is as good as the other.
You just want to pick the version that is potentially more efficient, if
there is any difference (and there is).
However, I don't think Jens actually did the pipe->pipe case at all (ie
pipes don't have the "splice_read()" function yet).
>
> - In pipe_to_file():
>
> - Shouldn't it be using GFP_HIGHUSER()?
Some filesystems may not like having highpages.
I suspect it should be "mapping_gfp_mask(mapping)".
> What does the feature do? How would one use it in an application? Is it
> intended that it be generalised to other kinds of address_spaces? If so,
> which ones, and what implementation problems might we expect?
You'd use it instead of "sendfile()".
Think of it as a hell of a lot more capable than "sendfile()" can ever be.
It can take input from anything that just has "splice_read()", and move it
to anything else, without doing any extra copies.
So imagine a streaming media thing, with a DVB card. Imaging wanting to
push all that data around the system without doing a "read()" into user
space and then a "write()" out to a file.
Also, imagine doing a "tee()" system call that can duplicate the pages
from one pipe into two other pipes. By just incrementing the page count.
So that your media streaming application can stream to both a file _and_
to live video (or whatever else) without doubling the buffering and doing
memcpy.
Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/