Re: [PATCHv4 2/2] arm64/io: Add a header for mmio access instrumentation

From: Sai Prakash Ranjan
Date: Mon Nov 22 2021 - 08:35:50 EST

On 11/19/2021 9:36 AM, Sai Prakash Ranjan wrote:
Hi Arnd,

On 11/18/2021 8:54 PM, Arnd Bergmann wrote:
On Mon, Nov 15, 2021 at 12:33 PM Sai Prakash Ranjan
<quic_saipraka@xxxxxxxxxxx> wrote:
   * Generic IO read/write.  These perform native-endian accesses.
-#define __raw_writeb __raw_writeb
-static inline void __raw_writeb(u8 val, volatile void __iomem *addr)
+static inline void arch_raw_writeb(u8 val, volatile void __iomem *addr)
         asm volatile("strb %w0, [%1]" : : "rZ" (val), "r" (addr));
Wouldn't removing the #define here break the logic in
making it fall back to the pointer-dereference version for the actual access?

The #defines for these are added in the mmio-instrumented.h header, which is included in
arm64/asm/io.h, so the logic won't break and fall back to the pointer-dereference version.
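
To illustrate the fallback question, here is a minimal userspace sketch of the #ifndef guard pattern. It is not the kernel code: demo_raw_writeb stands in for __raw_writeb, the two counters are hypothetical, and the "generic" part only imitates how a generic header can supply a pointer-dereference fallback when no macro of that name has been defined.

```c
#include <stdint.h>

/* Hypothetical counters, only there to show which version ran. */
static int arch_calls;
static int generic_calls;

/* Stand-in for the arch accessor (the real one uses inline asm). */
static inline void arch_raw_writeb(uint8_t val, volatile void *addr)
{
	arch_calls++;
	*(volatile uint8_t *)addr = val;
}

/* Stand-in for what an instrumentation header would define. */
#define demo_raw_writeb(v, a) arch_raw_writeb((v), (a))

/*
 * Stand-in for the generic header: the fallback is compiled only
 * when nothing has defined the name before this point.
 */
#ifndef demo_raw_writeb
#define demo_raw_writeb demo_raw_writeb
static inline void demo_raw_writeb(uint8_t val, volatile void *addr)
{
	generic_calls++;
	*(volatile uint8_t *)addr = val;
}
#endif
```

Because the macro is defined before the guard is evaluated, a call to demo_raw_writeb() reaches arch_raw_writeb() and the fallback is never compiled. If the macro were missing, the guard would quietly select the pointer-dereference version instead.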

+void log_write_mmio(const char *width, volatile void __iomem *addr);
+void log_read_mmio(const char *width, const volatile void __iomem *addr);
+#define __raw_write(v, a, _l) ({                              \
+       volatile void __iomem *_a = (a);                        \
+       if (tracepoint_enabled(rwmmio_write))                   \
+               log_write_mmio(__stringify(write##_l), _a);     \
+       arch_raw_write##_l((v), _a);                            \
+       })
This feels like it's getting too big to be inlined. Have you considered
integrating this with the lib/logic_iomem.c infrastructure instead?

That already provides a way to override MMIO areas, and it lets you do
the logging from a single place rather than having it duplicated in every
single caller. It also provides a way of filtering it based on the ioremap()

Thanks for the suggestion, will look at the logic_iomem.c and see if it fits our

So I looked at logic_iomem.c, which seems to be useful for emulated IO for virtio drivers,
but our use case only needs to log the MMIO operations and nothing more, similar to
the logging of x86 MSR register accesses via tracepoints (arch/x86/include/asm/msr-trace.h).
Also, the raw read/write macros in logic_iomem.c go through callbacks, which seem costlier
than inlining or a direct function call, given that they have to be called for every register
read and write, and there are going to be thousands of those in our case. In their use case, the
read and write callbacks are just PCI config space reads and writes, which may not be called that
frequently, so the latency might not be visible, but in our case I think it would be visible with
such a callback. I know this is a debug feature and high performance isn't expected, but that
doesn't mean we shouldn't have a debug feature that performs better, right?
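
A rough userspace sketch of the two shapes being compared, under stated assumptions: trace_enabled stands in for tracepoint_enabled(), log_write() for log_write_mmio(), and struct io_ops and demo_ops are hypothetical names that only imitate a callback-based design. This is not the logic_iomem.c API and says nothing about actual cost.

```c
#include <stdbool.h>
#include <stdint.h>

static bool trace_enabled;	/* stand-in for tracepoint_enabled() */
static unsigned int logged;	/* counts log calls for the demo */

/* Out-of-line logging helper, reached only when tracing is on. */
static void log_write(const char *width, volatile void *addr)
{
	(void)width;
	(void)addr;
	logged++;
}

/* Stand-in for the arch accessor. */
static inline void arch_write32(uint32_t val, volatile void *addr)
{
	*(volatile uint32_t *)addr = val;
}

/* Shape A: inline check, then the direct access. */
static inline void traced_write32(uint32_t val, volatile void *addr)
{
	if (trace_enabled)
		log_write("writel", addr);
	arch_write32(val, addr);
}

/* Shape B: every access goes through a function pointer. */
struct io_ops {
	void (*write32)(uint32_t val, volatile void *addr);
};

static void cb_write32(uint32_t val, volatile void *addr)
{
	log_write("writel", addr);
	arch_write32(val, addr);
}

static const struct io_ops demo_ops = { .write32 = cb_write32 };

static inline void callback_write32(const struct io_ops *ops,
				    uint32_t val, volatile void *addr)
{
	ops->write32(val, addr);
}
```

With tracing off, shape A adds one branch in front of the store, while shape B always makes an indirect call, whether or not anything is being logged.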

On the second point, filtering by ioremap isn't very useful for our use case, since an ioremapped
region can have hundreds of registers and we are interested in the exact register read/write that
would cause any of the issues mentioned in the description of this patchset.

So I feel that the current approach, where we consolidate the instrumentation in
mmio-instrumented.h, is better than adding tracing to an emulated iomem library.