I can reproduce the stall using the code posted
by Matthias Moeller on 2.2.2-ac5, compiled with
egcs 1.1.1 running on uniprocessor PPro 166 MHz.
I was not able to reproduce it on stock 2.2.1,
but I did not try very extensively.
I have done some research on the WRITE_SIZE,
READ_SIZE and number of loops (3200 in
the original code). It does not seem to
be a race - there is a clear pattern
that seems to be deterministic. Fiddling
with tcp_sack does not change anything.
Perhaps somebody familiar with the used
algorhitms can interpret the pattern. I am
willing to test any experimental patches.
1) Server: WRITE_SIZE 1585, LOOPS 3200
Client: varying READ_SIZE
The stall occurs when READ_SIZE <= 956
2) Client: READ_SIZE 956
Server: WRITE_SIZE 1585, varying LOOPS
The stall occurs when LOOPS >= 45
3) Client: READ_SIZE 956
Server: LOOPS 45, varying WRITE_SIZE
The stall occurs, if
WRITE_SIZE >= 1480 && <= 1505
or
WRITE_SIZE >= 1554 (+74) && <= 1585 (+80)
or
WRITE_SIZE >= 1636 (+82) && <= 1673 (+88)
or
WRITE_SIZE >= 1727 (+91) && <= 1771 (+98)
Some more numbers for stalling cases:
WRITE_SIZE LOOPS READ_SIZE bytes read Send-Q
succesfully
--- 1480 50 956 29600 39960 1481 50 956 29620 (+20) 39987 (+27) 1482 50 956 29640 40014 ... 1488 50 956 29780 40203 1490 50 956 29800 38740 1491 50 956 29820 38766 (+26) 1492 50 956 29840 38792 (+26) ... 1505 50 956 30100 39130--- 1554 50 956 29526 38850 1555 50 956 29545 (+19) 38875 (+25) 1556 50 956 29564 38900 ... 1585 50 956 30115 39625--- 1636 50 956 29448 39264 1637 50 956 29466 (+18) 39288 (+24) ... 1673 50 956 30114 40152--- 1727 50 956 29359 39721 1728 50 956 29376 (+17) 39744 (+23) ... 1771 50 956 30107 38962
Regards
-- Stano
- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/