Re: [PATCH 4/9] soc: apple: Add SART driver

From: Rob Herring
Date: Mon Apr 04 2022 - 10:58:49 EST


On Sat, Apr 02, 2022 at 09:07:17PM +0200, Arnd Bergmann wrote:
> On Sat, Apr 2, 2022 at 2:38 PM Sven Peter <sven@xxxxxxxxxxxxx> wrote:
> > On Mon, Mar 21, 2022, at 18:07, Arnd Bergmann wrote:
> > > On Mon, Mar 21, 2022 at 5:50 PM Sven Peter <sven@xxxxxxxxxxxxx> wrote:
> > >> The NVMe co-processor on the Apple M1 uses a DMA address filter called
> > >> SART for some DMA transactions. This adds a simple driver used to
> > >> configure the memory regions from which DMA transactions are allowed.
> > >>
> > >> Co-developed-by: Hector Martin <marcan@xxxxxxxxx>
> > >> Signed-off-by: Hector Martin <marcan@xxxxxxxxx>
> > >> Signed-off-by: Sven Peter <sven@xxxxxxxxxxxxx>
> > >
> > > Can you add some explanation about why this uses a custom interface
> > > instead of hooking into the dma_map_ops?
> >
> > Sure.
> > In a perfect world this would just be an IOMMU implementation but since
> > SART can't create any real IOVA space using pagetables it doesn't fit
> > inside that subsystem.
> >
> > In a slightly less perfect world I could just implement dma_map_ops here
> > but that won't work either because not all DMA buffers of the NVMe
> > device have to go through SART and those allocations happen
> > inside the same device and would use the same dma_map_ops.
> >
> > The NVMe controller has two separate DMA filters:
> >
> > - NVMMU, which must be set up for any command that uses PRPs and
> > ensures that the DMA transactions only touch the pages listed
> > inside the PRP structure. NVMMU itself is tightly coupled
> > to the NVMe controller: The list of allowed pages is configured
> > based on the command's tag id and even commands that require no DMA
> > transactions must be listed inside NVMMU before they are started.
> > - SART, which must be set up for some shared memory buffers (e.g.
> > log messages from the NVMe firmware) and for some NVMe debug
> > commands that don't use PRPs.
> > SART is only loosely coupled to the NVMe controller and could
> > also be used together with other devices. It's also the only
> > thing that changed between M1 and M1 Pro/Max/Ultra and that's
> > why I decided to separate it from the NVMe driver.
> >
> > I'll add this explanation to the commit message.
>
> Ok, thanks.
>
> > >> +static void sart2_get_entry(struct apple_sart *sart, int index, u8 *flags,
> > >> + phys_addr_t *paddr, size_t *size)
> > >> +{
> > >> + u32 cfg = readl_relaxed(sart->regs + APPLE_SART2_CONFIG(index));
> > >> + u32 paddr_ = readl_relaxed(sart->regs + APPLE_SART2_PADDR(index));
> > >
> > > Why do you use the _relaxed() accessors here and elsewhere in the driver?
> >
> > This device itself doesn't do any DMA transactions so it needs no memory
> > synchronization barriers. Only the consumer (i.e. rtkit and nvme) read/write
> > from/to these buffers (multiple times) and they have the required barriers
> > in place whenever they are used.
> >
> > These buffers so far are only allocated at probe time though so even using
> > the normal writel/readl here won't hurt performance at all. I can just use
> > those if you prefer or alternatively add a comment why _relaxed is fine here.
> >
> > This is a bit similar to the discussion for the pinctrl series last year [1].
>
> I think it's better to only use the _relaxed version where it actually helps,
> with a comment about it, and use the normal version elsewhere, in
> particular in functions that you have copied from the normal nvme driver.
> I had tried to compare some of your code with the other version and
> was rather confused by that.

Oh good, I tell folks the opposite (and others do too). We don't accept
random explicit barriers without explanation, but implicit ones are
okay? The resulting code on arm32 is also pretty horrible with the L2x0
and OMAP sync hooks, not that that matters here.

I don't really care too much which way we go, but we should document one
rule and follow that.

Rob
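
For readers following the readl() vs. readl_relaxed() question above, here
is a minimal userspace sketch of the distinction being argued about. It is
not the kernel's implementation. The mock_* accessors, struct fake_sart and
its two-register layout are hypothetical stand-ins invented for
illustration, and the C11 fences only approximate the ordering the real
accessors provide against DMA memory.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for readl_relaxed(): a plain volatile load,
 * with no ordering against other (non-MMIO) memory accesses. */
static inline uint32_t mock_readl_relaxed(const volatile uint32_t *addr)
{
	return *addr;
}

/* Hypothetical stand-in for readl(): the same load followed by a fence,
 * so later reads of a shared buffer cannot be hoisted above it. */
static inline uint32_t mock_readl(const volatile uint32_t *addr)
{
	uint32_t val = *addr;

	atomic_thread_fence(memory_order_acquire);
	return val;
}

/* Hypothetical stand-in for writel(): a fence and then the store, so
 * earlier writes to a shared buffer are visible before the register write. */
static inline void mock_writel(uint32_t val, volatile uint32_t *addr)
{
	atomic_thread_fence(memory_order_release);
	*addr = val;
}

/* Invented register block; the real SART layout is not modelled here. */
struct fake_sart {
	volatile uint32_t config;
	volatile uint32_t paddr;
};

/*
 * The kind of comment being asked for in the thread: _relaxed is used
 * because this block does no DMA itself; the consumers of the buffers
 * are assumed to provide their own barriers.
 */
static void fake_get_entry(const struct fake_sart *sart,
			   uint32_t *config, uint32_t *paddr)
{
	*config = mock_readl_relaxed(&sart->config);
	*paddr = mock_readl_relaxed(&sart->paddr);
}

/* Small self-check: both accessor flavours return the same value; they
 * differ only in the ordering guarantees around the access. */
static int fake_sart_selfcheck(void)
{
	struct fake_sart sart = { 0, 0 };
	uint32_t config, paddr;

	mock_writel(0xffu, &sart.config);
	mock_writel(0x8000u, &sart.paddr);
	fake_get_entry(&sart, &config, &paddr);
	assert(config == mock_readl(&sart.config));
	return config == 0xffu && paddr == 0x8000u;
}
```

Whichever rule gets documented, the value read is the same either way. The
choice only affects which side has to justify itself in a comment: the
implicit barrier in the normal accessor, or the missing one in the relaxed
variant.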