sendfile() expert advice sought

From: Patrick J. LoPresti
Date: Tue Feb 16 2010 - 14:53:31 EST


Executive summary: Can I get the benefits of sendfile() for anonymous pages?

I have an application that generates hundreds of gigabytes of data per
hour. I want to push that data out over a TCP socket. (The network
connection will be fast; multiple bonded GigE lines or 10GigE.)

I gather that sendfile() is pretty efficient, so I would like to use
it. But I do not want to write all of my data to disk first. So I am
considering an approach like this:

int fd = shm_open("/foo", O_RDWR|O_CREAT|O_TRUNC, 0600);
ftruncate(fd, length);
void *p = mmap(0, length, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
// (fill memory block at p with some data)
off_t offset = 0;
sendfile(sock, fd, &offset, length);  // out_fd first, then in_fd
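
For concreteness, here is a fuller sketch of the same idea with error checks and a loop for short writes. The helper name send_shm_block() and its parameters are made up for illustration, and it assumes Linux with glibc. It only shows the call sequence I have in mind; it does not settle whether the transfer is zero-copy.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not from any library): stage `length` bytes in a
 * POSIX shared-memory object called `name`, then push them to the
 * connected socket `sock` with sendfile().
 * Returns 0 on success, -1 on error. */
int send_shm_block(int sock, const char *name, const void *data, size_t length)
{
    int rc = -1;
    void *p;
    off_t offset = 0;
    size_t left = length;
    int fd;

    assert(name != NULL && data != NULL);

    /* shm_open() takes three arguments; O_CREAT creates the object. */
    fd = shm_open(name, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    shm_unlink(name);   /* the name is not needed once fd is open */

    if (ftruncate(fd, (off_t)length) < 0)
        goto out_close;

    p = mmap(NULL, length, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        goto out_close;

    /* Stand-in for "fill memory block at p with some data". */
    memcpy(p, data, length);

    /* sendfile(out_fd, in_fd, &offset, count) may send fewer bytes than
     * requested, so loop until everything has been handed to the socket. */
    while (left > 0) {
        ssize_t n = sendfile(sock, fd, &offset, left);
        if (n <= 0)
            goto out_unmap;
        left -= (size_t)n;
    }
    rc = 0;

out_unmap:
    munmap(p, length);
out_close:
    close(fd);
    return rc;
}
```

A caller would pass a connected TCP socket as sock. On older glibc versions, linking may require -lrt for shm_open().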

Questions:

1) Will this work at all? (Some on-line sources suggest sendfile()
does not work with tmpfs files. But I think this was fixed at some
point...)

2) Will it provide zero-copy behavior, or does the fact that the pages
are mapped in my process cause sendfile() to copy them?

3) If it is zero-copy, what happens if I overwrite the memory block
after sendfile() returns? Do I risk corrupting my data? (In
particular, suppose I have TCP_CORK set on the socket. Will
sendfile() return before all of the data has actually been sent,
giving me a window to corrupt my data? If so, how do I know when it
is "safe" to re-use the memory?)

4) If sendfile() is not zero-copy in this example, would I expect a
performance boost anyway, because sendfile() does not need to crawl
page tables or something?

Any responses or references will be appreciated.

Thanks!

- Pat

P.S. I know I could also try mmap()'ing "/dev/zero" and using
vmsplice(). Same set of questions, though.