Re: [PATCH] arm64: dts: qcom: sc8280xp: fix PCIe DMA coherency

From: Manivannan Sadhasivam
Date: Fri Nov 25 2022 - 09:53:50 EST

Next message: Greg Kroah-Hartman: "Re: [PATCH 5.10 000/149] 5.10.156-rc1 review"
Previous message: Chen, Hu1: "Re: [PATCH bpf v2] selftests/bpf: Fix "missing ENDBR" BUG for destructor kfunc"
In reply to: Johan Hovold: "Re: [PATCH] arm64: dts: qcom: sc8280xp: fix PCIe DMA coherency"
Next in thread: Johan Hovold: "Re: [PATCH] arm64: dts: qcom: sc8280xp: fix PCIe DMA coherency"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Fri, Nov 25, 2022 at 03:43:59PM +0100, Johan Hovold wrote:
> On Fri, Nov 25, 2022 at 07:56:25PM +0530, Manivannan Sadhasivam wrote:
> > On Thu, Nov 24, 2022 at 03:25:01PM +0100, Johan Hovold wrote:
> > > The devices on the SC8280XP PCIe buses are cache coherent and must be
> > > marked as such to avoid data corruption.
> > >
> > > A coherent device can, for example, end up snooping stale data from the
> > > caches instead of using data written by the CPU through the
> > > non-cacheable mapping which is used for consistent DMA buffers for
> > > non-coherent devices.
> > >
> >
> > Also, the device may write into the L2 cache (or whichever cache is
> > accessible) if there is an entry, and the CPU may invalidate it before reading
> > from the DMA buffer. This will result in data loss.
>
> I mentioned the above as an example, but clearly it can also affect the
> other direction (e.g. as described below).
>
> > > Note that this is much more likely to happen since commit c44094eee32f
> > > ("arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()")
> > > that was added in 6.1 and which removed the cache invalidation when
> > > setting up the non-cacheable mapping.
> > >
> > > Marking the PCIe devices as coherent specifically fixes the intermittent
> > > NVMe probe failures observed on the ThinkPad X13s, which were due to
> > > corruption of the submission and completion queues. This was typically
> > > observed as corruption of the admin submission queue (with well-formed
> > > completion):
> > >
> > > could not locate request for tag 0x0
> > > nvme nvme0: invalid id 0 completed on queue 0
> > >
> > > or corruption of the admin or I/O completion queues (malformed
> > > completion):
> > >
> > > could not locate request for tag 0x45f
> > > nvme nvme0: invalid id 25695 completed on queue 25965
> > >
> > > presumably as these queues are small enough to not be allocated using
> > > CMA which in turn makes them more likely to be cached (e.g. due to
> > > accesses to nearby pages through the cacheable linear map). Increasing
> > > the buffer sizes to two pages to force CMA allocation also appears to
> > > make the problem go away.
> > >
> >
> > I don't think the problem will go away if the allocation happens from the CMA
> > region. It may just decrease the chances of a cache hit, but one could always
> > happen due to the existence of the linear mapping with the cacheable attribute.
>
> I never claimed it would fix the problem; I explicitly wrote that it
> made it less likely to occur (to the point where my reproducer no longer
> triggers).
>

> Increasing the buffer sizes to two pages to force CMA allocation also appears
> to make the problem go away.

The "go away" part sounded like a claim to me, hence my comment.
But no worries :)

Thanks,
Mani

> Johan

--
மணிவண்ணன் சதாசிவம்
