Re: [PATCH v2] openrisc: Add cacheinfo support

From: Sahil Siddiq
Date: Mon Mar 17 2025 - 14:36:45 EST


Hello Stafford and Geert,

Thank you for the reviews.

On 3/17/25 1:55 PM, Geert Uytterhoeven wrote:
On Sun, 16 Mar 2025 at 07:59, Stafford Horne <shorne@xxxxxxxxx> wrote:
[...]
+struct cache_desc {
+ u32 size;
+ u32 sets;
+ u32 block_size;
+ u32 ways;

Considering the changes below to add cache-available checks, maybe we
want to add a field here, such as `bool present`. Or a flags field like
the one used in loongarch?

I assume cache_desc.size is zero when the cache is not present?

Yes, cache_desc.size will be zero when there's no cache component since
cpuinfo_or1k[NR_CPUS] is declared as a global variable in setup.c. The
cache attributes are stored in this struct.

On 3/17/25 5:30 PM, Stafford Horne wrote:
[...]
Yes, good point, that would be clean and would work too. I was not too
happy with using cache_desc.ways as is done below. Also, there ended up
being two different checks in use.

I am happy to use size too, but I think checking the SPR would be faster or just
as fast as using the struct. I am not too fussed either way.

There are a few places (e.g. cache_loop() in cache.c) where
cpuinfo_or1k[smp_processor_id()] is not currently referenced. It would have to
be referenced there to access cache_desc. In those cases, I think it would be
better to read from the SPR instead. For consistency, the SPR could also be
read where cpuinfo_or1k[smp_processor_id()] is already referenced.

On 3/16/25 12:28 PM, Stafford Horne wrote:
On Sun, Mar 16, 2025 at 02:09:37AM +0530, Sahil Siddiq wrote:
Add cacheinfo support for OpenRISC.
[...]
diff --git a/arch/openrisc/kernel/cacheinfo.c b/arch/openrisc/kernel/cacheinfo.c
new file mode 100644
index 000000000000..6bb81e246f7e
--- /dev/null
+++ b/arch/openrisc/kernel/cacheinfo.c
@@ -0,0 +1,106 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * OpenRISC cacheinfo support
+ *
+ * Based on work done for MIPS and LoongArch. All original copyrights
+ * apply as per the original source declaration.
+ *
+ * OpenRISC implementation:
+ * Copyright (C) 2025 Sahil Siddiq <sahilcdq@xxxxxxxxx>
+ */
+
+#include <linux/cacheinfo.h>
+#include <asm/cpuinfo.h>
+#include <asm/spr.h>
+#include <asm/spr_defs.h>
+
+static inline void ci_leaf_init(struct cacheinfo *this_leaf, enum cache_type type,
+ unsigned int level, struct cache_desc *cache, int cpu)
+{
+ this_leaf->type = type;
+ this_leaf->level = level;
+ this_leaf->coherency_line_size = cache->block_size;
+ this_leaf->number_of_sets = cache->sets;
+ this_leaf->ways_of_associativity = cache->ways;
+ this_leaf->size = cache->size;
+ cpumask_set_cpu(cpu, &this_leaf->shared_cpu_map);
+}
+
+int init_cache_level(unsigned int cpu)
+{
+ struct cpuinfo_or1k *cpuinfo = &cpuinfo_or1k[smp_processor_id()];
+ struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
+ int leaves = 0, levels = 0;
+ unsigned long upr = mfspr(SPR_UPR);
+ unsigned long iccfgr, dccfgr;
+
+ if (!(upr & SPR_UPR_UP)) {
+ printk(KERN_INFO
+ "-- no UPR register... unable to detect configuration\n");
+ return -ENOENT;
+ }
+
+ if (upr & SPR_UPR_DCP) {
+ dccfgr = mfspr(SPR_DCCFGR);
+ cpuinfo->dcache.ways = 1 << (dccfgr & SPR_DCCFGR_NCW);
+ cpuinfo->dcache.sets = 1 << ((dccfgr & SPR_DCCFGR_NCS) >> 3);
+ cpuinfo->dcache.block_size = 16 << ((dccfgr & SPR_DCCFGR_CBS) >> 7);
+ cpuinfo->dcache.size =
+ cpuinfo->dcache.sets * cpuinfo->dcache.ways * cpuinfo->dcache.block_size;
+ leaves += 1;
+ printk(KERN_INFO
+ "-- dcache: %d bytes total, %d bytes/line, %d set(s), %d way(s)\n",
+ cpuinfo->dcache.size, cpuinfo->dcache.block_size,
+ cpuinfo->dcache.sets,
+ cpuinfo->dcache.ways);

The indentation of sets looks a bit off here.

+ } else
+ printk(KERN_INFO "-- dcache disabled\n");
+
+ if (upr & SPR_UPR_ICP) {
+ iccfgr = mfspr(SPR_ICCFGR);
+ cpuinfo->icache.ways = 1 << (iccfgr & SPR_ICCFGR_NCW);
+ cpuinfo->icache.sets = 1 << ((iccfgr & SPR_ICCFGR_NCS) >> 3);
+ cpuinfo->icache.block_size = 16 << ((iccfgr & SPR_ICCFGR_CBS) >> 7);
+ cpuinfo->icache.size =
+ cpuinfo->icache.sets * cpuinfo->icache.ways * cpuinfo->icache.block_size;
+ leaves += 1;
+ printk(KERN_INFO
+ "-- icache: %d bytes total, %d bytes/line, %d set(s), %d way(s)\n",
+ cpuinfo->icache.size, cpuinfo->icache.block_size,
+ cpuinfo->icache.sets,
+ cpuinfo->icache.ways);

The indentation of sets looks a bit off here. Maybe it's the others that are
out of line, but can you fix it? Also, I think sets and ways could be on the
same line.

Sorry, I must have missed this. I'll fix it.

[...]
diff --git a/arch/openrisc/kernel/dma.c b/arch/openrisc/kernel/dma.c
index b3edbb33b621..ffb161e41e9d 100644
--- a/arch/openrisc/kernel/dma.c
+++ b/arch/openrisc/kernel/dma.c
@@ -36,8 +36,10 @@ page_set_nocache(pte_t *pte, unsigned long addr,
flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
/* Flush page out of dcache */
- for (cl = __pa(addr); cl < __pa(next); cl += cpuinfo->dcache_block_size)
- mtspr(SPR_DCBFR, cl);
+ if (cpuinfo->dcache.ways) {
+ for (cl = __pa(addr); cl < __pa(next); cl += cpuinfo->dcache.block_size)
+ mtspr(SPR_DCBFR, cl);
+ }

I think it would be better to move this to cacheflush.h as a function like
flush_dcache_range() or local_dcache_range_flush().

return 0;
}
@@ -104,15 +106,19 @@ void arch_sync_dma_for_device(phys_addr_t addr, size_t size,
switch (dir) {
case DMA_TO_DEVICE:
/* Flush the dcache for the requested range */
- for (cl = addr; cl < addr + size;
- cl += cpuinfo->dcache_block_size)
- mtspr(SPR_DCBFR, cl);
+ if (cpuinfo->dcache.ways) {
+ for (cl = addr; cl < addr + size;
+ cl += cpuinfo->dcache.block_size)
+ mtspr(SPR_DCBFR, cl);
+ }

Also here, I think it would be better to move this to cacheflush.h as a
function like flush_dcache_range().

Or, local_dcache_range_flush(), which seems to be the convention we use in
cacheflush.h/cache.c.


break;
case DMA_FROM_DEVICE:
/* Invalidate the dcache for the requested range */
- for (cl = addr; cl < addr + size;
- cl += cpuinfo->dcache_block_size)
- mtspr(SPR_DCBIR, cl);
+ if (cpuinfo->dcache.ways) {
+ for (cl = addr; cl < addr + size;
+ cl += cpuinfo->dcache.block_size)
+ mtspr(SPR_DCBIR, cl);
+ }

This one could be invalidate_dcache_range(). Note, this will also be useful
for the kexec patches that I am working on.

Or, local_dcache_range_inv(), which seems to be the convention we use in
cacheflush.h/cache.c.

Understood.

[...]
@@ -29,13 +34,13 @@ static __always_inline void cache_loop(struct page *page, const unsigned int reg
void local_dcache_page_flush(struct page *page)
{
- cache_loop(page, SPR_DCBFR);
+ cache_loop(page, SPR_DCBFR, SPR_UPR_DCP);
}
EXPORT_SYMBOL(local_dcache_page_flush);
void local_icache_page_inv(struct page *page)
{
- cache_loop(page, SPR_ICBIR);
+ cache_loop(page, SPR_ICBIR, SPR_UPR_ICP);
}
EXPORT_SYMBOL(local_icache_page_inv);

OK, if we move the range flush and invalidate here we will need to add to this
cache_loop a bit more.

Right. I'll see what can be done to keep it concise.

diff --git a/arch/openrisc/mm/init.c b/arch/openrisc/mm/init.c
index d0cb1a0126f9..bbe16546c5b9 100644
--- a/arch/openrisc/mm/init.c
+++ b/arch/openrisc/mm/init.c
@@ -124,6 +124,7 @@ static void __init map_ram(void)
void __init paging_init(void)
{
int i;
+ unsigned long upr;
printk(KERN_INFO "Setting up paging and PTEs.\n");
@@ -176,8 +177,11 @@ void __init paging_init(void)
barrier();
/* Invalidate instruction caches after code modification */
- mtspr(SPR_ICBIR, 0x900);
- mtspr(SPR_ICBIR, 0xa00);
+ upr = mfspr(SPR_UPR);
+ if (upr & SPR_UPR_UP & SPR_UPR_ICP) {
+ mtspr(SPR_ICBIR, 0x900);
+ mtspr(SPR_ICBIR, 0xa00);
+ }

Here we could use new utilities such as local_icache_range_inv(0x900,
L1_CACHE_BYTES);

Or something like local_icache_block_inv(0x900). This only needs to flush a
single block, as the code it is invalidating is just 2 instructions (8 bytes):

.org 0x900
l.j boot_dtlb_miss_handler
l.nop

.org 0xa00
l.j boot_itlb_miss_handler
l.nop

Given that there'll be generic local_(i|d)cache_range_inv(start, stop) utility
functions, would it make sense to simply have a macro defined as:

#define local_icache_block_inv(addr) local_icache_range_inv(addr, L1_CACHE_BYTES)

instead of having a separate function for invalidating a single cache line?
This would still use cache_loop() under the hood. The alternative would be to
call local_icache_range_inv(addr, L1_CACHE_BYTES) directly, but the macro
might be more readable.

Thanks,
Sahil