Re: tcp_sendpage and page allocation lifetime vs. iscsi

From: David S. Miller
Date: Mon Apr 25 2005 - 17:19:55 EST


On Tue, 26 Apr 2005 00:06:03 +0200
Olivier Galibert <galibert@xxxxxxxxx> wrote:

> Do you think possible to extent the sendpage api to add some kind of
> "don't get the pages, copy them if you need them" flag?

No, not really.

Do you happen to run the scsi->done() function from iscsi
as soon as the write over the TCP socket completes returns
success? That is likely what is causing the problem.

When you call scsi->done(), the buffer is effectively released
and the scsi/st.c driver can legally reuse it once you've done
that.

tcp_sendpages() is really meant to be invoked for page cache
pages, or temporary pages cons'd up specifically for that
send call. Just look at what TCP sendmsg does, for example.
It carves up a per-socket cached PAGE to put the user's data
into.

You could do something similar in iSCSI and for now I highly
suggest that is what you do.

You could also:

1) set TCP_CORK to 1
2) tcp_sendmsg() the scsi tape data
3) tcp_sendpage() to remaining pages
4) set TCP_CORK to 0

so that tcp_sendmsg() does all the data copying for you.

Finally, you could also use "SIOCOUTQ" ioctl to watch the
write buffer get released. Call it once before you do the send,
save that value, then after your send wait for it to hit
or pass the old value you saved.

In short, you're using an API in a way it was never designed
to be used. We don't lock pages, and that is a deliberate
design decision. When we send pages over the wire using
TCP sendpages out of the page cache, the file contents _CAN_
change mid-send, but that's OK because the card calculates
the packet checksums so no data corruption nor quality of
implementation issues arise as a result.

Again, this behavior and these mechanics were deliberately
made to function this way.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/