Re: [PATCH] RISC-V: Remove IORESOURCE_BUSY flag for no-map reserved memory

From: Xianting Tian
Date: Wed May 11 2022 - 22:50:19 EST



在 2022/5/12 上午10:32, Nick Kossifidis 写道:
Hello Xianting,

---
 arch/riscv/kernel/setup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c
index 834eb652a7b9..71f2966b1474 100644
--- a/arch/riscv/kernel/setup.c
+++ b/arch/riscv/kernel/setup.c
@@ -214,7 +214,7 @@ static void __init init_resources(void)

         if (unlikely(memblock_is_nomap(region))) {
             res->name = "Reserved";
-            res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;
+            res->flags = IORESOURCE_MEM;
         } else {
             res->name = "System RAM";
             res->flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;

The short story:

This makes sense but we should at least mark them as IORESOURCE_EXCLUSIVE, and also remove IORESOURCE_BUSY from line 192 above (https://elixir.bootlin.com/linux/v5.18-rc6/source/arch/riscv/kernel/setup.c#L192).
Thanks Nick for the detailed reply, I will sent V2 patch soon according to your suggestion.

The long story:

The spec says about no-map:

"
Indicates the operating system must not create a virtual mapping
of the region as part of its standard mapping of system memory,
nor permit speculative access to it under any circumstances other
than under the control of the device driver using the region.
"

It basically says that only the driver that uses the region should be able to create mappings for it and access it, and even that is not enough to prevent speculative access to the region by someone other than the driver. The thing is we can't implement this in a simple way because -to begin with- we don't have a way to pin those regions to specific devices/drivers, the memory-region binding doesn't say anything about that. When using devm_ioremap_resource() the first driver to claim the resource will mark it as busy so other drivers using the same api won't be able to use it, however the region can still be mapped in other ways (e.g. through ioremap directly) so using the resource tree to track/protect the regions marked with no-map isn't enough. They can even be accessed from userspace through /dev/mem unless we add them to the resource tree as IORESOURCE_MEM and enable/set CONFIG_IO_STRICT_DEVMEM/iomem=strict, but even then in case the corresponding ioresource isn't busy (e.g. hasn't been claimed by a driver yet through devm_ioremap_resource) we still have to mark them as IORESOURCE_EXCLUSIVE for iomem_is_exclusive() to do the trick (https://elixir.bootlin.com/linux/v5.18-rc6/source/kernel/resource.c#L1739) and prevent access through /dev/mem.

Finally the definition of no-map and the definition of MEMBLOCK_NOMAP are not the same, all MEMBLOCK_NOMAP says is "don't add to kernel direct mapping" so ioremap that uses vmalloc can still be used by everyone, in general it's a mess. It becomes worse, if you mark a reserved-memory region with no-map and that region overlaps with /memory, early_init_dt_reserve_memory_arch() will isolate it, mark it as MEMBLOCK_NOMAP and won't add it to memblock.reserved, if it doesn't overlap it will just ignore it and still not add it to memblock.reserved. So if we want to add a reserved-memory region that doesn't overlap with /memory (a valid scenario allowed by the binding), there is no way to mark it with no-map.

When I wrote that code I was looking for a way to prevent access to reserved regions through /dev/mem and by other drivers, regardless of being part of /memory or not, and since I couldn't mark them with no-map to track them because of early_init_dt_reserve_memory_arch(), I marked them as busy and then used them from the driver with ioremap directly. It was a temporary measure until I had a better approach (e.g. override ioremap / devmem_is_allowed like other archs do) but I never got the time to finish it, sorry for the mess !

Regards,
Nick