[adding Cavium folk and Timur]
On Wed, Mar 16, 2016 at 05:32:23PM +0800, Ganesh Mahendran wrote:
> Reverts commit 97303480753e ("arm64: Increase the max granular size").
>
> The commit 97303480753e ("arm64: Increase the max granular size") will
> degrade system performance on some CPUs.
>
> We tested wifi network throughput with iperf on a Qualcomm msm8996 CPU:
> ----------------
> run on host:
>   # iperf -s
> run on device:
>   # iperf -c <device-ip-addr> -t 100 -i 1
> ----------------
>
> Test result:
> ----------------
> with commit 97303480753e ("arm64: Increase the max granular size"):
>     172MBits/sec
>
> without commit 97303480753e ("arm64: Increase the max granular size"):
>     230MBits/sec
> ----------------
>
> Some modules like slab/net use L1_CACHE_SHIFT, so if we do not
> set the parameter correctly, it may affect system performance.
>
> So revert the commit.
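
For reference, the effect of L1_CACHE_SHIFT on cacheline-aligned objects
can be sketched in plain userspace C. This is only an illustration, not
kernel code: the struct names and objs_per_page() are hypothetical, and
the GCC/Clang aligned attribute stands in for ____cacheline_aligned. It
shows the footprint doubling when the alignment goes from 64 to 128
bytes; it does not by itself explain the iperf numbers above.

```c
#include <stddef.h>

/* Hypothetical 16-byte payload, standing in for a small per-object struct. */
struct payload {
	unsigned long rx;
	unsigned long tx;
};

/* Aligned as if L1_CACHE_SHIFT were 6 (64-byte lines). */
struct obj_shift6 {
	struct payload p;
} __attribute__((aligned(64)));

/* Aligned as if L1_CACHE_SHIFT were 7 (128-byte lines). */
struct obj_shift7 {
	struct payload p;
} __attribute__((aligned(128)));

/* How many objects of a given size fit in one 4K page. */
static size_t objs_per_page(size_t obj_size)
{
	return 4096 / obj_size;
}
```

With these definitions, sizeof(struct obj_shift6) is 64 and
sizeof(struct obj_shift7) is 128, so a 4K page holds half as many
aligned objects in the second case.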
Unfortunately, the original patch is required to support the 128-byte L1
cache lines of Cavium ThunderX, so we can't simply revert it like this.
Similarly, the desire for a single, multiplatform kernel image prevents
us from reasonably fixing this at compile time to anything other than
the expected maximum value.
Furthermore, Timur previously said that the change is also required
"on our [Qualcomm] silicon", but I'm not sure if this is msm8996 or not:
http://lkml.kernel.org/r/CAOZdJXUiRMAguDV+HEJqPg57MyBNqEcTyaH+ya=U93NHb-pdJA@xxxxxxxxxxxxxx
You could look into making ARCH_DMA_MINALIGN a runtime value, but that
looks like an uphill struggle to me. Alternatively, we could only warn
if the CWG is bigger than L1_CACHE_BYTES *and* we have a non-coherent
DMA master, but that doesn't solve any performance issues from having
things like locks sharing cachelines, not that I think we ever got any
data on that (afaik, we don't pad locks to cacheline boundaries anyway).
I'm also not sure what it would mean for PCI NoSnoop transactions.
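
To make the "only warn" alternative concrete, here is a rough, untested
userspace sketch of the check. It assumes the CWG field sits in bits
[27:24] of CTR_EL0 and encodes log2 of the granule in 4-byte words, and
it assumes L1_CACHE_SHIFT has gone back to 6. The have_noncoherent_dma
flag is a hypothetical placeholder; how the kernel would actually learn
that is the open question.

```c
#include <stdbool.h>
#include <stdint.h>

#define L1_CACHE_SHIFT	6			/* assumed: back to 64-byte lines */
#define L1_CACHE_BYTES	(1u << L1_CACHE_SHIFT)

#define CTR_CWG_SHIFT	24			/* CTR_EL0.CWG, bits [27:24] */
#define CTR_CWG_MASK	0xfu

/* Cache writeback granule in bytes; CWG == 0 means "not reported". */
static unsigned int cwg_bytes(uint32_t ctr)
{
	unsigned int cwg = (ctr >> CTR_CWG_SHIFT) & CTR_CWG_MASK;

	return cwg ? 4u << cwg : L1_CACHE_BYTES;
}

/*
 * Warn only when the granule exceeds the compile-time line size *and*
 * there is a non-coherent DMA master (hypothetical flag).
 */
static bool should_warn(uint32_t ctr, bool have_noncoherent_dma)
{
	return have_noncoherent_dma && cwg_bytes(ctr) > L1_CACHE_BYTES;
}
```

So a CPU reporting CWG == 5 (128 bytes) would only trigger the warning
if a non-coherent master is present, while CWG == 4 (64 bytes) never
would. As noted above, this does nothing for locks sharing cachelines.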