Re: [PATCH] PCI: rcar-gen4: Limit Max_Read_Request_Size and Max_Payload_Size to 256 Bytes

From: Koichiro Den

Date: Mon May 11 2026 - 10:23:02 EST

On Mon, May 04, 2026 at 01:54:01AM +0200, Marek Vasut wrote:
> On 4/28/26 9:00 AM, Koichiro Den wrote:
>
> Hello Den-san,
>
> > The patch makes sense to me. Let me ask two questions:
> >
> > 1. Could r8a779f0 (R-Car S4-8) be handled as well, perhaps by adding a separate
> > .additional_common_init() implementation for it?
> >
> > As far as I can see, the r8a779f0 match data currently does not use
> > rcar_gen4_pcie_additional_common_init().

Hi Marek,

Thank you for the detailed analysis. I'm sorry for the late reply.

>
> I will address this one in V2, thank you for pointing that out.

Thanks!

>
> > 2. Did you also happen to test V4H/V4M in endpoint (EP) mode, with the local
> > eDMA engine issuing MRd requests toward host memory?
>
> I was not able to test this configuration.
>
> Is it possible to perform this test with a single device, by having the eDMA
> do local-memory-read-to-local-memory-write transfers, maybe using
> PIPE_LOOPBACK/LOOPBACK_ENABLE bits, or do I need two devices with an NTB
> connection between them?
>
> If it is the latter, could you please briefly describe the S4 NTB setup
> you use, so I can try to replicate it locally?

My setup used two boards:

S4 Spider as RC <-> S4 Spider as EP, connected with OCuLink.

It is unfortunately not a small standalone reproducer. The setup was based on
the following RFC v4 series:

[RFC PATCH v4 00/38] NTB transport backed by PCI EP embedded DMA
https://lore.kernel.org/all/20260118135440.1958279-1-den@xxxxxxxxxxxxx/

In particular, the workaround patch I used in the RFC series was:

[RFC PATCH v4 31/38] NTB: epf: Add per-SoC quirk to cap MRRS for DWC eDMA (128B for R-Car)
https://lore.kernel.org/all/20260118135440.1958279-32-den@xxxxxxxxxxxxx/

Note that in that workaround I only capped MRRS (i.e. I did not add an MPS cap).
At least in that setup, avoiding 256B MRd requests was enough to make the
visible corruption disappear.

At a high level, the EP side exposes the vNTB endpoint function, and the RC side
uses the NTB data path, which is backed by the EP-local eDMA through that vNTB
function. For the RC-to-EP data path, the EP-local eDMA acts as the requester:
it issues MRd requests toward remote RC memory, receives the CplD payloads, and
writes the data into EP-side memory. In other words, this is a DMA read transfer
from the point of view of the EP-local eDMA.
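To make the MRRS cap concrete, here is a minimal C sketch of how one such DMA
read could be chopped into MRd requests. It assumes only two rules: each request
is at most MRRS bytes, and (per the PCIe spec) a memory request must not cross a
4 KiB address boundary. count_mrd_requests() is a hypothetical helper, not part
of the DWC eDMA driver, and it does not model how the eDMA hardware actually
splits transfers.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: count the MRd requests a requester would issue to
 * read `len` bytes starting at bus address `addr`, assuming each request
 * is capped at `mrrs` bytes and never crosses a 4 KiB boundary.
 * `mrrs` must be non-zero. Illustration only; this is not the real DWC
 * eDMA splitting logic.
 */
static size_t count_mrd_requests(uint64_t addr, size_t len, size_t mrrs)
{
	size_t n = 0;

	while (len > 0) {
		/* Bytes left before the next 4 KiB boundary. */
		size_t to_4k = 4096 - (size_t)(addr & 4095);
		size_t chunk = len;

		if (chunk > mrrs)
			chunk = mrrs;
		if (chunk > to_4k)
			chunk = to_4k;

		addr += chunk;
		len -= chunk;
		n++;
	}
	return n;
}
```

Under this model a 4 KiB read becomes 16 MRd requests with MRRS=256B but 32 with
the 128B cap, so the workaround roughly doubles the request count for the same
payload.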

I have not tried PIPE_LOOPBACK/LOOPBACK_ENABLE. Given how heavy the setup
described above is, I am not asking you to reproduce the whole thing just for
this patch. Also, I do not want this NTB/eDMA observation to block your v2. For
now, please treat it as a separate observation from the RC/NVMe issue. I will
continue the investigation on my side and let you know if I can narrow down
where the corruption occurs.

>
> > Your commit message
> > describes an NVMe device as the requester, but I'm wondering whether the same
> > 256B limit was also verified for the R-Car EP DMA requester path.
>
> I currently cannot answer this part, I'm sorry.
>
> ...
>
> I made the following two observations in the meantime.
>
> First, on another system (iMX8M Plus, ARM64, also with a DWC PCIe
> controller), I wrote 4 GiB of random data to two SSDs, a Crucial P5 Plus
> SSD without HMPRE (without host memory buffer) and an XPG GAMMIX P55 with
> HMPRE (with host memory buffer). Then I read the data back and compared
> it; the written and read-back data matched.
>
> Then I plugged both SSDs into V4H Sparrow Hawk _without_ this patch, and I
> read the data back:
>
> - Crucial P5 Plus SSD without HMPRE (without host memory buffer)
> -> Data read back matches data written on iMX8M Plus, OK
> - XPG GAMMIX P55 with HMPRE (with host memory buffer)
> -> Data read back matches data written on iMX8M Plus, OK
>
> Then I wrote 512 bytes of data into the Crucial P5 Plus SSD without HMPRE
> on V4H Sparrow Hawk and read it back again.
> -> Data read back does NOT match data written, NG
>
> That would indicate that:
> - WRITE transfers from SSD to DRAM are OK
> - READ transfers from DRAM to SSD are corrupted at a 256-byte boundary
>
> This suggests that we need _at_least_ the 256-byte limit, likely on both
> MPS and MRRS.
>
> Second, I got a report of another SSD for which this patch is not
> sufficient. I currently do not have access to that SSD, but I will ask for
> access and investigate. That may shed some light on the 128-byte limit
> below.

Thank you for sharing these observations.
Interesting. That second point may help determine whether the 128B limit I
observed earlier is related to the same underlying issue, or is purely
eDMA/NTB-specific.

>
> > (*) The background for my question 2:
> >
> > I only have access to S4 Spider boards. In my RC <-> EP setup, where the EP
> > side uses the local eDMA engine to issue MRd requests toward the RC, 256-byte
> > MRd requests still appear to corrupt the transferred data.
>
> Is the corruption deterministic in some way, i.e. are the same bytes of the
> transferred data corrupted every time, or is the corruption "random" ?

The exact corrupted values were not deterministic, but the offsets where the
corruption occurred were fairly consistent.

Let me quote from my earlier RFC patch:
(https://lore.kernel.org/all/20260118135440.1958279-32-den@xxxxxxxxxxxxx/)

[...]
* On some R-Car platforms using the Synopsys DWC PCIe + eDMA we
* observe data corruption on RC->EP Remote DMA Read paths whenever
* the EP issues large MRd requests. The corruption consistently
* hits the tail of each 256-byte segment (e.g. offsets
* 0x00E0..0x00FF within a 256B block, and again at 0x01E0..0x01FF
* for larger transfers).
[...]
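Because the offsets were consistent even though the values were not, a
per-offset histogram is a convenient way to check each run. The sketch below
bins every mismatching byte by its offset within a 256-byte block.
bin_mismatches() and mismatches_only_in() are hypothetical helpers for
illustration, and the sketch assumes the destination buffer starts 256B-aligned,
so that buffer offsets line up with offsets inside each 256-byte block.

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE 256

/*
 * Hypothetical checker: compare `actual` against `expected` and count,
 * per offset within a 256-byte block, how many bytes mismatched.
 * Assumes the buffer starts 256B-aligned. Returns the total mismatch count.
 */
static size_t bin_mismatches(const uint8_t *expected, const uint8_t *actual,
			     size_t len, unsigned int hits[BLOCK_SIZE])
{
	size_t total = 0;

	for (size_t i = 0; i < BLOCK_SIZE; i++)
		hits[i] = 0;

	for (size_t i = 0; i < len; i++) {
		if (expected[i] != actual[i]) {
			hits[i % BLOCK_SIZE]++;
			total++;
		}
	}
	return total;
}

/* Returns 1 if every recorded mismatch lies within offsets [lo, hi]. */
static int mismatches_only_in(const unsigned int hits[BLOCK_SIZE],
			      size_t lo, size_t hi)
{
	for (size_t i = 0; i < BLOCK_SIZE; i++)
		if (hits[i] && (i < lo || i > hi))
			return 0;
	return 1;
}
```

With the pattern quoted above, mismatches_only_in(hits, 0xE0, 0xFF) would be
expected to hold for every corrupted run.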

>
> Does the corruption happen even on a single MRd transfer, or does it happen
> only when a lot of traffic is sent across the NTB link? I wonder if this
> corruption might be DRAM bandwidth related, i.e. whether the DMA could
> saturate the DRAM controller with write requests and make the system run
> out of DRAM bandwidth.

It occurred even with a single eDMA read transfer. It was not a symptom only
observable under high load.

>
> > With the following
> > change on top of your patch, my DMA-read tests become stable:
>
> [...]
>
> > One detail which might be important is that limiting only MPS does not appear
> > to be sufficient in my setup. MPS=128B with MRRS=256B still seems broken,
> > while MPS=128B with MRRS=128B works fine. I wonder whether this is because
> > the "MPS" term in the min(MRRS, MPS) limit for DMA read transfers may
> > effectively be tied to the DMA read buffer segment size / MPSS rather than
> > only to DevCtl.MPS. I'm not sure about this yet though.
>
> I think setting MPS=128B MRRS=256B only leads to the transfer being split
> into 2 x 128B TLPs sent across the PCIe link, but in the end, 2 x 128 Bytes
> of data are received (in some order) into the read segment buffer and
> reordered, and 1 x 256 Bytes are written from read segment buffer into the
> memory as a single write.
>
> In case of MPS=256B MRRS=256B, only one 256B TLP is sent across the link, 1
> x 256 Bytes of data are received into the read segment buffer with no
> reordering necessary, and 1 x 256 Bytes are still written from read segment
> buffer into the memory as a single write.
>
> => For MPS=128B/MPS=256B and MRRS=256B, there is a difference in the
> transfer format between PCIe and DMA, but there is no difference
> between DMA and DRAM.
>
> But in case of MRRS=128B and a transfer of 256 Bytes, 2 x 128 Bytes of data
> are received into (separate? (*)) entries in the read segment buffer, and
> 2 x 128 Bytes are written from (separate?) entries in the read segment
> buffer into the memory as two separate writes. Could this different memory
> write pattern be responsible for the (lack of) corruption?
>
> Do you know whether the data are corrupted on the PCIe-to-DMA side (when the
> data are received from the PCIe side and written into the read buffer
> segment) or on the DMA-to-DRAM side (on read from read segment buffer or on
> write into DRAM) ?

Unfortunately I cannot distinguish these from software alone. I only observed
the final destination buffer contents after the eDMA read transfer completed.
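The TLP-level part of the comparison above can be written down as a simplified
model. The sketch assumes the requester issues MRd requests of at most MRRS
bytes, that the completer returns each one as CplD TLPs of at most MPS bytes,
and that the start address is aligned; it ignores RCB-based completion
splitting, which a real completer may also do. model_read() and
struct read_tlps are hypothetical names.

```c
#include <stddef.h>

/* TLP counts for one modelled DMA read. */
struct read_tlps {
	size_t mrd;	/* memory read requests issued */
	size_t cpld;	/* completion-with-data TLPs returned */
};

static size_t div_round_up(size_t n, size_t d)
{
	return (n + d - 1) / d;
}

/*
 * Simplified model: MRd requests of at most `mrrs` bytes, each completed
 * by CplD TLPs of at most `mps` bytes. Assumes an aligned start address,
 * ignores RCB splitting. `mrrs` and `mps` must be non-zero.
 */
static struct read_tlps model_read(size_t len, size_t mrrs, size_t mps)
{
	struct read_tlps t = { 0, 0 };

	while (len > 0) {
		size_t req = len < mrrs ? len : mrrs;

		t.mrd++;
		t.cpld += div_round_up(req, mps);
		len -= req;
	}
	return t;
}
```

For a 256-byte read this gives 1 MRd / 2 CplD for MPS=128B MRRS=256B,
1 MRd / 1 CplD for MPS=256B MRRS=256B, and 2 MRd / 2 CplD for MRRS=128B. The
first and third cases carry the same CplD payload sizes on the link, so the
model alone cannot explain why only the first appears broken in the S4 setup,
which fits with looking past the link (read segment buffer, DRAM write
pattern), as suggested above.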

>
> (*) Since the read segment buffer has 16 x 256 Byte segments, with 16 DMA
> tags and never more than 16 MRd requests in flight, I think it is likely
> that the data for each MRd lands in a separate read segment buffer segment.
> But this information comes from another datasheet, not the V4H one.
>
> > One more thing I noticed in the manuals:
> >
> > R-Car S4 R19UH0161EJ0130 Rev.1.30 Jun. 16, 2025:
> > Type00 MPSS initial = 256B, PCI R, Internal R/W
> > Type01 MPSS initial = 128B, PCI R, Internal R
> >
> > R-Car V4H R19UH0186EJ0130 Rev.1.30 Apr. 21, 2025
> > Type00 MPSS initial = 256B, PCI R, Internal R
> > Type01 MPSS initial = 128B, PCI R, Internal R/W
> >
> > I'm still unsure, but this difference might be relevant. In particular, in
> > V4H/V4M RC mode your patch programs DevCtl.MPS to 256B, but does not change
> > Type01 MPSS. I wonder if the Type01 MPSS should also be updated to 256B first
> > on SoCs where the manual says it is writable from the internal bus, or if I'm
> > missing something here.
>
> This is a very good point.
>
> The R-Car S4 RM Rev.1.20 lists Type00 MPSS as Internal R and Type01 MPSS as
> Internal R/W. This was updated in RM Rev.1.30 to Type 00 Internal R/W and
> Type 01 Internal R. It is possible this change is going to be added into the
> V4H RM in the future too. That would likely imply that Type01 MPSS is not
> programmable.
>
> I don't think Type1 affects RC operation, but does it affect NTB ?

I have no evidence that Type1 affects NTB either. It was just a speculative idea
based on the difference I saw in the manuals.

Your inference, i.e. that the S4 RM Rev.1.30 may reflect the intended access
attributes and the V4H RM may later get a similar correction, sounds reasonable
to me.

I had not checked the S4 RM Rev.1.20, so I missed that change. Thanks for
pointing it out.
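For reference, the relationship between MPSS and the DevCtl fields discussed
above can be sketched as follows. The bit positions follow the PCIe Base
Specification (DevCap[2:0] = MPSS, DevCtl[7:5] = MPS, DevCtl[14:12] = MRRS,
code n meaning 128 << n bytes); the macro and function names are made up for
this sketch. Whether the R-Car controller actually relies on Type01 MPSS this
way is still the open question, so this only illustrates the encoding, not the
driver.

```c
#include <stdint.h>

/* Field layout per the PCIe Base Specification; names are sketch-local. */
#define DEVCAP_MPSS_MASK	0x00000007u	/* DevCap[2:0] */
#define DEVCTL_MPS_MASK		0x00e0u		/* DevCtl[7:5] */
#define DEVCTL_MPS_SHIFT	5
#define DEVCTL_MRRS_MASK	0x7000u		/* DevCtl[14:12] */
#define DEVCTL_MRRS_SHIFT	12

/* Bytes -> 3-bit size code, rounding down (minimum 128B, maximum 4096B). */
static unsigned int size_to_code(unsigned int bytes)
{
	unsigned int code = 0;

	while (code < 5 && (128u << (code + 1)) <= bytes)
		code++;
	return code;
}

/*
 * Return `devctl` with MPS and MRRS capped at `cap_bytes`. MPS is also
 * limited to the MPSS advertised in `devcap`; MRRS is not, since MPSS
 * only bounds the payload size. Other DevCtl bits are preserved.
 */
static uint16_t cap_devctl(uint16_t devctl, uint32_t devcap,
			   unsigned int cap_bytes)
{
	unsigned int cap = size_to_code(cap_bytes);
	unsigned int mpss = devcap & DEVCAP_MPSS_MASK;
	unsigned int mps = (devctl & DEVCTL_MPS_MASK) >> DEVCTL_MPS_SHIFT;
	unsigned int mrrs = (devctl & DEVCTL_MRRS_MASK) >> DEVCTL_MRRS_SHIFT;

	if (mps > cap)
		mps = cap;
	if (mps > mpss)
		mps = mpss;
	if (mrrs > cap)
		mrrs = cap;

	devctl &= (uint16_t)~(DEVCTL_MPS_MASK | DEVCTL_MRRS_MASK);
	devctl |= (uint16_t)((mps << DEVCTL_MPS_SHIFT) |
			     (mrrs << DEVCTL_MRRS_SHIFT));
	return devctl;
}
```

For example, capping DevCtl 0x2040 (MPS=512B, MRRS=512B) at 256B gives 0x1020
when MPSS advertises 256B, but 0x1000 (MPS=128B, MRRS=256B) when MPSS
advertises only 128B.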

>
> [...]
>
> Thank you for your help!

Thank you for investigating this and for the very helpful analysis.
I will let you know if I find anything more.

Best regards,
Koichiro

>
> --
> Best regards,
> Marek Vasut