Re: [PATCH] arm64: dts: allwinner: Add cache information to the SoC dtsi for H616

From: Chen-Yu Tsai
Date: Tue May 28 2024 - 12:09:08 EST


On Fri, May 3, 2024 at 5:09 PM Dragan Simic <dsimic@xxxxxxxxxxx> wrote:
>
> Add missing cache information to the Allwinner H616 SoC dtsi, to allow
> the userspace, which includes lscpu(1) that uses the virtual files provided
> by the kernel under the /sys/devices/system/cpu directory, to display the
> proper H616 cache information.
>
> Adding the cache information to the H616 SoC dtsi also makes the following
> warning message in the kernel log go away:
>
> cacheinfo: Unable to detect cache hierarchy for CPU 0
>
> Rather conspicuously, almost no cache-related information is available in
> the publicly available Allwinner H616 datasheet (version 1.0) and H616 user
> manual (version 1.0). Thus, the cache parameters for the H616 SoC dtsi were
> obtained and derived by hand from the cache size and layout specifications
> found in the following technical reference manual, and from the cache size
> and die revision hints available from the following community-provided data
> and memory subsystem benchmarks:
>
> - ARM Cortex-A53 revision r0p4 TRM, version J
> - Summary of the two available H616 die revisions and their differences
> in cache sizes observed from the CSSIDR_EL1 register readouts, provided
> by Andre Przywara [1][2]
> - Tinymembench benchmark results of the H616-based OrangePi Zero 2 SBC,
> provided by Thomas Kaiser [3]
>
> For future reference, here's a brief summary of the available documentation
> and the community-provided data and memory subsystem benchmarks:
>
> - All caches employ the 64-byte cache line length
> - Each Cortex-A53 core has 32 KB of L1 2-way, set-associative instruction
> cache and 32 KB of L1 4-way, set-associative data cache
> - The size of the L2 cache depends on the actual H616 die revision (there
> are two die revisions), so the entire SoC can have either 256 KB or 1 MB
> of unified L2 16-way, set-associative cache [1]
>
> Also for future reference, here's the relevant excerpt from the community-
> provided H616 memory subsystem benchmark, [3] which confirms that 32 KB and
> 256 KB are the L1 data and L2 cache sizes, respectively:
>
> block size : single random read / dual random read
> 1024 : 0.0 ns / 0.0 ns
> 2048 : 0.0 ns / 0.0 ns
> 4096 : 0.0 ns / 0.0 ns
> 8192 : 0.0 ns / 0.0 ns
> 16384 : 0.0 ns / 0.0 ns
> 32768 : 0.0 ns / 0.0 ns
> 65536 : 4.3 ns / 7.3 ns
> 131072 : 6.6 ns / 10.5 ns
> 262144 : 9.8 ns / 15.2 ns
> 524288 : 91.8 ns / 142.9 ns
> 1048576 : 138.6 ns / 188.3 ns
> 2097152 : 163.0 ns / 204.8 ns
> 4194304 : 178.8 ns / 213.5 ns
> 8388608 : 187.1 ns / 217.9 ns
> 16777216 : 192.2 ns / 220.9 ns
> 33554432 : 196.5 ns / 224.0 ns
> 67108864 : 215.7 ns / 259.5 ns
>
> The changes introduced to the H616 SoC dtsi by this patch specify 256 KB as
> the L2 cache size. As outlined by Andre Przywara, [2] a follow-up TF-A patch
> will perform runtime adjustment of the device tree data, making the correct
> L2 cache size of 1 MB present in the device tree for the boards based on the
> revision of H616 that actually provides 1 MB of L2 cache.
>
> [1] https://lore.kernel.org/linux-sunxi/20240430114627.0cfcd14a@xxxxxxxxxxxxxxxxxxxxxxxxxxx/
> [2] https://lore.kernel.org/linux-sunxi/20240501103059.10a8f7de@xxxxxxxxxxxxxxxxxxxxxxxxxxx/
> [3] https://raw.githubusercontent.com/ThomasKaiser/sbc-bench/master/results/4knM.txt
>
> Suggested-by: Andre Przywara <andre.przywara@xxxxxxx>
> Helped-by: Andre Przywara <andre.przywara@xxxxxxx>
> Signed-off-by: Dragan Simic <dsimic@xxxxxxxxxxx>
> ---
> .../arm64/boot/dts/allwinner/sun50i-h616.dtsi | 37 +++++++++++++++++++
> 1 file changed, 37 insertions(+)
>
> diff --git a/arch/arm64/boot/dts/allwinner/sun50i-h616.dtsi b/arch/arm64/boot/dts/allwinner/sun50i-h616.dtsi
> index b2e85e52d1a1..4faed88d8909 100644
> --- a/arch/arm64/boot/dts/allwinner/sun50i-h616.dtsi
> +++ b/arch/arm64/boot/dts/allwinner/sun50i-h616.dtsi
> @@ -26,30 +26,67 @@ cpu0: cpu@0 {
> reg = <0>;
> enable-method = "psci";
> clocks = <&ccu CLK_CPUX>;
> + i-cache-size = <0x8000>;
> + i-cache-line-size = <64>;
> + i-cache-sets = <256>;
> + d-cache-size = <0x8000>;
> + d-cache-line-size = <64>;
> + d-cache-sets = <128>;
> + next-level-cache = <&l2_cache>;

This no longer applies due to the CPU DVFS stuff getting merged.
Can you rebase and resend?


Thanks
ChenYu