On Thu, Apr 9, 2015 at 5:36 PM, Stefano Stabellini wrote:
On Thu, 9 Apr 2015, Eric Dumazet wrote:
On Thu, 2015-04-09 at 16:46 +0100, Stefano Stabellini wrote:
I found a performance regression when running netperf -t TCP_MAERTS from
an external host to a Xen VM on ARM64: v3.19 and v4.0-rc4 running in the
virtual machine are 30% slower than v3.18.
Through bisection I found that the perf regression is caused by the
presence of the following commit in the guest kernel:
Author: Eric Dumazet <edumazet@xxxxxxxxxx>
Date: Sun Dec 7 12:22:18 2014 -0800
tcp: refine TSO autosizing
This commit restored original TCP Small Queue behavior, which is the
first step to fight bufferbloat.
Some network drivers are known to be problematic because of a delayed TX
completion.
Try to tweak /proc/sys/net/ipv4/tcp_limit_output_bytes to see if it
makes a difference ?
A very big difference:
echo 262144 > /proc/sys/net/ipv4/tcp_limit_output_bytes
brings us much closer to the original performance, the slowdown is just
echo 1048576 > /proc/sys/net/ipv4/tcp_limit_output_bytes
fills the gap entirely, same performance as before "refine TSO
autosizing".
What would be the next step here? Should I just document this as an
important performance tweaking step for Xen, or is there something else
we can do?
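If the tweak does end up being documented, the step tried above could be scripted roughly as follows. This is only a sketch: the helper names read_limit and set_limit are made up for illustration, the only real interface used is the /proc/sys/net/ipv4/tcp_limit_output_bytes file named in this thread, and writing to it requires root in the guest. The path is a parameter so the helpers can be exercised against an ordinary file.

```python
from pathlib import Path

# Sysctl file named in the thread; writing to it requires root.
TSQ_LIMIT = Path("/proc/sys/net/ipv4/tcp_limit_output_bytes")


def read_limit(path=TSQ_LIMIT):
    """Return the current limit, in bytes, as an integer."""
    return int(Path(path).read_text().strip())


def set_limit(value, path=TSQ_LIMIT):
    """Write a new limit and return the previous one so it can be restored."""
    if value <= 0:
        raise ValueError("limit must be a positive byte count")
    previous = read_limit(path)
    # Equivalent to: echo <value> > /proc/sys/net/ipv4/tcp_limit_output_bytes
    Path(path).write_text(f"{value}\n")
    return previous
```

Calling set_limit(1048576) before a netperf run and set_limit(previous) afterwards would reproduce the second experiment above without leaving the guest permanently changed.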
Is the problem perhaps that netback/netfront delays TX completion?
Would it be better to see if that can be addressed properly, so that
the original purpose of the patch (fighting bufferbloat) can be
achieved while not degrading performance for Xen? Or at least, so
that people get decent performance out of the box without having to
tweak TCP parameters?