Re: [PATCH 4/9] soc: apple: Add SART driver

From: Sven Peter
Date: Tue Apr 05 2022 - 18:41:37 EST


On Mon, Apr 4, 2022, at 16:58, Rob Herring wrote:
> On Sat, Apr 02, 2022 at 09:07:17PM +0200, Arnd Bergmann wrote:
>> On Sat, Apr 2, 2022 at 2:38 PM Sven Peter <sven@xxxxxxxxxxxxx> wrote:
>> > On Mon, Mar 21, 2022, at 18:07, Arnd Bergmann wrote:
>> > > On Mon, Mar 21, 2022 at 5:50 PM Sven Peter <sven@xxxxxxxxxxxxx> wrote:
>> > >> The NVMe co-processor on the Apple M1 uses a DMA address filter called
>> > >> SART for some DMA transactions. This adds a simple driver used to
>> > >> configure the memory regions from which DMA transactions are allowed.
>> > >>
>> > >> Co-developed-by: Hector Martin <marcan@xxxxxxxxx>
>> > >> Signed-off-by: Hector Martin <marcan@xxxxxxxxx>
>> > >> Signed-off-by: Sven Peter <sven@xxxxxxxxxxxxx>
>> > >
>> > > Can you add some explanation about why this uses a custom interface
>> > > instead of hooking into the dma_map_ops?
>> >
>> > Sure.
>> > In a perfect world this would just be an IOMMU implementation but since
>> > SART can't create any real IOVA space using pagetables it doesn't fit
>> > inside that subsystem.
>> >
>> > In a slightly less perfect world I could just implement dma_map_ops here
>> > but that won't work either because not all DMA buffers of the NVMe
>> > device have to go through SART and those allocations happen
>> > inside the same device and would use the same dma_map_ops.
>> >
>> > The NVMe controller has two separate DMA filters:
>> >
>> > - NVMMU, which must be set up for any command that uses PRPs and
>> > ensures that the DMA transactions only touch the pages listed
>> > inside the PRP structure. NVMMU itself is tightly coupled
>> > to the NVMe controller: The list of allowed pages is configured
>> > based on the command's tag id and even commands that require no DMA
>> > transactions must be listed inside NVMMU before they are started.
>> > - SART, which must be set up for some shared memory buffers (e.g.
>> > log messages from the NVMe firmware) and for some NVMe debug
>> > commands that don't use PRPs.
>> > SART is only loosely coupled to the NVMe controller and could
>> > also be used together with other devices. It's also the only
>> > thing that changed between M1 and M1 Pro/Max/Ultra and that's
>> > why I decided to separate it from the NVMe driver.
>> >
>> > I'll add this explanation to the commit message.
>>
>> Ok, thanks.
>>
>> > >> +static void sart2_get_entry(struct apple_sart *sart, int index, u8 *flags,
>> > >> + phys_addr_t *paddr, size_t *size)
>> > >> +{
>> > >> + u32 cfg = readl_relaxed(sart->regs + APPLE_SART2_CONFIG(index));
>> > >> + u32 paddr_ = readl_relaxed(sart->regs + APPLE_SART2_PADDR(index));
>> > >
>> > > Why do you use the _relaxed() accessors here and elsewhere in the driver?
>> >
>> > This device itself doesn't do any DMA transactions so it needs no memory
>> > synchronization barriers. Only the consumers (i.e. rtkit and nvme) read from
>> > and write to these buffers (multiple times), and they have the required
>> > barriers in place whenever the buffers are used.
>> >
>> > These buffers so far are only allocated at probe time though so even using
>> > the normal writel/readl here won't hurt performance at all. I can just use
>> > those if you prefer or alternatively add a comment why _relaxed is fine here.
>> >
>> > This is a bit similar to the discussion for the pinctrl series last year [1].
>>
>> I think it's better to only use the _relaxed version where it actually helps,
>> with a comment about it, and use the normal version elsewhere, in
>> particular in functions that you have copied from the normal nvme driver.
>> I had tried to compare some of your code with the other version and
>> was rather confused by that.
>
> Oh good, I tell folks the opposite (and others do too). We don't accept
> random explicit barriers without explanation, but implicit ones are
> okay? The resulting code on arm32 is also pretty horrible with the L2x0
> and OMAP sync hooks not that that matters here.
>
> I don't really care too much which way we go, but we should document one
> rule and follow that.

I don't have a strong opinion either. Arnd's approach is currently documented
in Documentation/driver-api/device-io.rst fwiw:

    On architectures that require an expensive barrier for serializing against
    DMA, these "relaxed" versions of the MMIO accessors only serialize against
    each other, but contain a less expensive barrier operation. A device driver
    might use these in a particularly performance sensitive fast path, with a
    comment that explains why the usage in a specific location is safe without
    the extra barriers.
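
To make that rule concrete, here is a rough userspace model of the convention (not kernel code, and not what the SART driver does). All demo_* names and the two register offsets are made up for illustration, and the acquire fence is only a stand-in for whatever barrier an architecture's real readl() implies:

```c
/* Userspace model only; every demo_* name here is hypothetical. */
#include <assert.h>
#include <stdint.h>

/* Made-up register offsets, not the real SART layout. */
#define DEMO_REG_CONFIG 0x00
#define DEMO_REG_PADDR  0x04

struct demo_entry {
	uint32_t cfg;
	uint32_t paddr;
};

/* Models a relaxed accessor: a plain volatile load, no extra barrier. */
static inline uint32_t demo_readl_relaxed(const volatile void *addr)
{
	return *(const volatile uint32_t *)addr;
}

/*
 * Models the normal accessor: the same load followed by a barrier.
 * The acquire fence is an assumption standing in for the
 * architecture-specific barrier behind the kernel's readl().
 */
static inline uint32_t demo_readl(const volatile void *addr)
{
	uint32_t val = demo_readl_relaxed(addr);

	__atomic_thread_fence(__ATOMIC_ACQUIRE);
	return val;
}

static inline struct demo_entry demo_get_entry(const volatile uint8_t *regs)
{
	struct demo_entry e;

	/*
	 * Relaxed is safe here: we only read two registers back to back
	 * and never look at DMA buffer contents afterwards, so ordering
	 * against DMA is not needed in this function.
	 */
	e.cfg = demo_readl_relaxed(regs + DEMO_REG_CONFIG);
	e.paddr = demo_readl_relaxed(regs + DEMO_REG_PADDR);
	return e;
}
```

The point is only the shape: normal accessors by default, and a comment next to each _relaxed use that says why the missing barrier doesn't matter at that spot.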


Sven