Re: Data corruption issue with splice() on 2.6.27.10

From: Ben Mansell
Date: Tue Jan 06 2009 - 12:56:55 EST


Hi,

I'm facing a data corruption problem with splice() between two
non-blocking TCP sockets on 2.6.27.10. I could finally write a
simpler proof of concept, and capture a snapshot of the issue
with the associated strace result.

My program does the following :
- accept an incoming connection
- connect to a remote server
- forward all data from the server to the client using splice()

The data count is always correct, but some parts are corrupted and
contain data which seem to come from random memory locations (this
raises a security concern BTW). It *sometimes* happens that a few
megabytes can be transferred without any problem, but most of the
time, corruption happens for a few hundreds of bytes every few
hundreds of kilobytes.

FWIW, I can easily reproduce this on a Linux 2.6.27-9 (Ubuntu kernel), using both forcedeth and tg3 network drivers. It's reassuring to hear of this network corruption as I have been puzzling over non-blocking splice() code recently!

The corruption does seem to be confined to the user data on the connections, as I have been able to run some benchmarks using my own splice()-enabled HTTP proxy to transfer lots of data. All the TCP connections 'work' fine (as in no broken TCP), the initial HTTP request & response headers get through OK, but the body data of the responses sometimes gets corrupted. The benchmark seems to work flawlessly until you look at the web page data!

I'm happy to run any test code on systems here or provide any debug information if it would help to track this down.


Ben
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/