On 22/06/2022 13:46, Rajendra Nayak wrote:
On 6/1/2022 3:41 PM, Krzysztof Kozlowski wrote:
Add a device node for the CPU-memory BWMON (bandwidth monitoring) device
on SDM845, which measures bandwidth between the CPU (gladiator_noc) and
the Last Level Cache (memnoc). Using this BWMON allows removing the fixed
bandwidth votes from cpufreq (CPU nodes) and thus achieving high memory
throughput even at lower CPU frequencies.
Performance impact (SDM845-MTP RB3 board, linux next-20220422):
1. No noticeable impact when running with schedutil or performance
governors.
2. Compared to a customized kernel with synced interconnects and
without bandwidth votes from CPU freq, the sysbench memory tests
show significant improvement with bwmon for block sizes past the L3
cache. The results of this superficial comparison:
sysbench memory test, results in MB/s (higher is better)
 bs kB | type  |     V | V+no bw votes | bwmon | benefit %
     1 | W/seq | 14795 |          4816 |  4985 |      3.5%
    64 | W/seq | 41987 |         10334 | 10433 |      1.0%
  4096 | W/seq | 29768 |          8728 | 32007 |    266.7%
 65536 | W/seq | 17711 |          4846 | 18399 |    279.6%
262144 | W/seq | 16112 |          4538 | 17429 |    284.1%
    64 | R/seq | 61202 |         67092 | 66804 |     -0.4%
  4096 | R/seq | 23871 |          5458 | 24307 |    345.4%
 65536 | R/seq | 18554 |          4240 | 18685 |    340.7%
262144 | R/seq | 17524 |          4207 | 17774 |    322.4%
    64 | W/rnd |  2663 |          1098 |  1119 |      1.9%
 65536 | W/rnd |   600 |           316 |   610 |     92.7%
    64 | R/rnd |  4915 |          4784 |  4594 |     -4.0%
 65536 | R/rnd |   664 |           281 |   678 |    140.7%
Legend:
bs kB: block size in kB (a small block size means only the L1-3 caches
are used)
type: R - read, W - write, seq - sequential, rnd - random
V: vanilla (next-20220422)
V + no bw votes: vanilla without bandwidth votes from CPU freq
bwmon: bwmon without bandwidth votes from CPU freq
benefit %: difference between vanilla without bandwidth votes and bwmon
(higher is better)
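
The benefit % column can be reproduced from the "V+no bw votes" and
"bwmon" columns. A minimal sketch in Python, assuming the column is the
relative change of bwmon over the no-votes baseline; the function name
benefit_percent is made up for illustration, and the three rows are
copied from the table above:

```python
def benefit_percent(no_votes_mbps, bwmon_mbps):
    """Relative change of bwmon over the no-votes baseline, in percent."""
    return (bwmon_mbps - no_votes_mbps) / no_votes_mbps * 100.0


# (block size in kB, access type, "V+no bw votes" MB/s, "bwmon" MB/s),
# values taken from the sysbench table in this commit message.
rows = [
    (1, "W/seq", 4816, 4985),
    (4096, "W/seq", 8728, 32007),
    (64, "R/rnd", 4784, 4594),
]

for bs, kind, base, bwmon in rows:
    print(f"{bs:>6} kB {kind}: {benefit_percent(base, bwmon):+.1f}%")
```

A negative value, as in the 64 kB R/rnd row, means bwmon was slower than
the no-votes baseline for that test.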
Co-developed-by: Thara Gopinath <thara.gopinath@xxxxxxxxxx>
Signed-off-by: Thara Gopinath <thara.gopinath@xxxxxxxxxx>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@xxxxxxxxxx>
---
arch/arm64/boot/dts/qcom/sdm845.dtsi | 54 ++++++++++++++++++++++++++++
1 file changed, 54 insertions(+)
diff --git a/arch/arm64/boot/dts/qcom/sdm845.dtsi b/arch/arm64/boot/dts/qcom/sdm845.dtsi
index 83e8b63f0910..adffb9c70566 100644
--- a/arch/arm64/boot/dts/qcom/sdm845.dtsi
+++ b/arch/arm64/boot/dts/qcom/sdm845.dtsi
@@ -2026,6 +2026,60 @@ llcc: system-cache-controller@1100000 {
interrupts = <GIC_SPI 582 IRQ_TYPE_LEVEL_HIGH>;
};
+ pmu@1436400 {
+ compatible = "qcom,sdm845-cpu-bwmon";
+ reg = <0 0x01436400 0 0x600>;
+
+ interrupts = <GIC_SPI 581 IRQ_TYPE_LEVEL_HIGH>;
+
+ interconnects = <&gladiator_noc MASTER_APPSS_PROC 3 &mem_noc SLAVE_EBI1 3>,
+ <&osm_l3 MASTER_OSM_L3_APPS &osm_l3 SLAVE_OSM_L3>;
+ interconnect-names = "ddr", "l3c";
Is this the pmu/bwmon instance between the cpu and caches or the one between the caches and DDR?
To my understanding this is the one between CPU and caches.
Depending on which one it is, shouldn't we just be scaling one of the interconnect paths and not both?
The interconnects are the same as the ones used for the CPU nodes, so if
we want to scale both when scaling the CPU, then we also want to scale
both when seeing traffic between the CPU and the cache.
Maybe the assumption here is not correct, so basically the two
interconnects in CPU nodes are also not proper?
Best regards,
Krzysztof