Executive summary: Can I get the benefits of sendfile() for anonymous pages?
I have an application that generates hundreds of gigabytes of data per
hour. I want to push that data out over a TCP socket. (The network
connection will be fast; multiple bonded GigE lines or 10GigE.)
I gather that sendfile() is pretty efficient, so I would like to use
it. But I do not want to write all of my data to disk first. So I am
considering an approach like this:
int fd = shm_open("/foo", O_CREAT|O_RDWR|O_TRUNC, 0600);
ftruncate(fd, length);
void *p = mmap(0, length, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
// (fill memory block at p with some data)
off_t offset = 0;
sendfile(sock, fd, &offset, length);  // out_fd first, then in_fd
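To make the idea concrete, here is a fuller sketch of the same approach with error checking and a loop for short sendfile() returns. The names send_whole_fd(), shm_sendfile_roundtrip() and "/sendfile_demo" are made up for illustration, and the socketpair() is only a stand-in for the real TCP socket so that the snippet can check its own output. It says nothing about whether the transfer is zero-copy.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: push the first `length` bytes of in_fd into sock.
 * sendfile() may transfer fewer bytes than requested, so loop until done.
 * The kernel advances `offset` for us because we pass a non-NULL pointer. */
static int send_whole_fd(int sock, int in_fd, size_t length)
{
    off_t offset = 0;
    while ((size_t)offset < length) {
        ssize_t n = sendfile(sock, in_fd, &offset, length - (size_t)offset);
        if (n < 0 && errno == EINTR)
            continue;            /* interrupted by a signal: retry */
        if (n <= 0)
            return -1;           /* real error, or unexpected end of file */
    }
    return 0;
}

/* Hypothetical demo: fill a shm object through a mapping, sendfile() it into
 * one end of a socketpair, read it back from the other end and compare.
 * Returns 0 on success. The length is kept small so the socket buffer can
 * hold everything without a separate reader thread. */
static int shm_sendfile_roundtrip(size_t length)
{
    int rc = -1;
    int sv[2] = { -1, -1 };
    unsigned char *p = MAP_FAILED;
    unsigned char *back = NULL;
    size_t got = 0;

    assert(length > 0 && length <= 16384);

    int fd = shm_open("/sendfile_demo", O_CREAT | O_RDWR | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    shm_unlink("/sendfile_demo");   /* the fd keeps the object alive */

    if (ftruncate(fd, (off_t)length) != 0)
        goto out;
    p = mmap(NULL, length, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        goto out;
    memset(p, 0xAB, length);        /* stand-in for "fill with some data" */

    back = malloc(length);
    if (back == NULL || socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        goto out;
    if (send_whole_fd(sv[0], fd, length) != 0)
        goto out;

    while (got < length) {
        ssize_t n = read(sv[1], back + got, length - got);
        if (n <= 0)
            break;
        got += (size_t)n;
    }
    if (got == length && memcmp(p, back, length) == 0)
        rc = 0;

out:
    free(back);
    if (sv[0] >= 0) {
        close(sv[0]);
        close(sv[1]);
    }
    if (p != MAP_FAILED)
        munmap(p, length);
    close(fd);
    return rc;
}
```

Calling shm_sendfile_roundtrip(4096) from main() should return 0 if sendfile() accepts the tmpfs-backed descriptor on the running kernel. On older glibc versions this needs -lrt for shm_open().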
Questions:
1) Will this work at all? (Some online sources suggest sendfile()
does not work with tmpfs files. But I think this was fixed at some
point...)
2) Will it provide zero-copy behavior, or does the fact that the pages
are mapped in my process cause sendfile() to copy them?
3) If it is zero-copy, what happens if I overwrite the memory block
after sendfile() returns? Do I risk corrupting my data? (In
particular, suppose I have TCP_CORK set on the socket. Will
sendfile() return before all of the data has actually been sent,
giving me a window to corrupt my data? If so, how do I know when it
is "safe" to re-use the memory?)
4) If sendfile() is not zero-copy in this example, would I still expect
a performance boost over a plain write() from the mapping, for example
because sendfile() does not need to walk my page tables?
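For reference, the TCP_CORK scenario from question 3 would look roughly like the sketch below. The helper names set_cork(), get_cork() and corked_sendfile() are made up for illustration. The comment marks the point where sendfile() has returned but data may still be queued in the kernel, which is the window I am worried about.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: turn TCP_CORK on (1) or off (0) for a TCP socket. */
static int set_cork(int sock, int on)
{
    assert(on == 0 || on == 1);
    return setsockopt(sock, IPPROTO_TCP, TCP_CORK, &on, sizeof on);
}

/* Hypothetical helper: read back the TCP_CORK state; -1 on error. */
static int get_cork(int sock)
{
    int on = 0;
    socklen_t len = sizeof on;
    if (getsockopt(sock, IPPROTO_TCP, TCP_CORK, &on, &len) != 0)
        return -1;
    return on != 0;
}

/* Hypothetical helper: cork, sendfile() the first `length` bytes of in_fd,
 * then uncork so that any queued partial frame is flushed. */
static int corked_sendfile(int sock, int in_fd, size_t length)
{
    off_t offset = 0;

    if (set_cork(sock, 1) != 0)
        return -1;
    while ((size_t)offset < length) {
        ssize_t n = sendfile(sock, in_fd, &offset, length - (size_t)offset);
        if (n <= 0) {
            set_cork(sock, 0);
            return -1;
        }
    }
    /* sendfile() has returned, but with the cork still in place some of the
     * data may be sitting in the socket's send queue. If that queue refers
     * to my pages rather than to copies, overwriting the mapping here is
     * exactly the case question 3 asks about. */
    return set_cork(sock, 0);
}
```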
Any responses or references will be appreciated.