Re: [PATCH] dma-mapping: Relax warnings for per-device areas

From: Fredrik Noring
Date: Sat Jul 07 2018 - 02:32:55 EST


Hi Jürgen, Robin,

> Don't forget that the SIF DMA packets are limited and the kernel will
> block/reschedule when it is out of SIF DMA packets. The allocation is
> implemented inside the SBIOS. You may easily get a deadlock or a livelock
> when you just let it run without thinking about the design. When you use
> the old CDVD driver on IOP, the RPC code inside SBIOS tries to simulate
> the interface like the new CDVD driver. The problem is that this is done
> by a busy loop waiting for a free SIF DMA packet. This would block the
> complete Linux kernel for an unknown time.
>
> As I understand you, you wanted to move the SBIOS code inside the Linux
> kernel. I am not sure whether you already have done it. When you do this,
> it is easier to fix the CDVD problem, but you need to think about booting
> using the official RTE disc from Sony for Linux, because it loads
> different modules and a different SBIOS. As this is the official way to
> start Linux on the PS2 which is supported by Sony, we should also support
> this in the official Linux kernel. Kernelloader can partially simulate it,
> but you need the files from the RTE disc.

The kernel no longer needs or uses the SBIOS, partly due to the issues
with having binary blobs of code that do kernel services. SBIOS memory is
reclaimed, so the SBIOS does not even exist when the kernel is running.

DMA is therefore limited only by the hardware design, which supports both
multiple DMA controllers interconnected simultaneously via memory or FIFOs,
and chained (scatter-gather) transfers.

Robin, does the kernel DMA subsystem support interconnected DMA controllers?
That involves arbitration of hardware FIFO resources (for example the SIF).

The Kernelloader boot program is not needed either, for any service, because
the IOP is reset and initialised by the kernel itself. Booting the kernel is
much faster and more reliable without the Kernelloader, which frequently
crashes or refuses to load IOP modules.

The Kernelloader can still be used, if one wishes, but it's optional and not
a requirement.

> At least on some models I think you can desolder the RAM and replace it by
> a larger memory up to 4GB, because the 1394 Lead Vehicle Manual lists a
> feature:
>
> "Hardware generated response to received read or write requests in a
> designated 4GB address range without CPU involvement."
>
> The 1394 Lead Vehicle is not used in the PS2, but it is very similar to
> the IOP and it is the only manual we have about IOP. So I think the DMA
> mask for the device must be at least 32 Bits, because the device is able
> to access full 32 Bit. The EE where Linux is running may only be able to
> access a part of it directly. I think SIF DMA is always able to access it
> completely, as this is an official feature which is documented. The
> mapping at 0x1c000000-0x1c200000 seems just to be good luck, because it is
> not documented. As this is no official interface Sony is able to remove
> this mapping at any time in a new model. I don't know where the border of
> the mapping is, but in my experiments I have seen some hints that it can
> be different depending on which hardware or software is used. It looks
> like the more stuff is integrated into one single chip, the lower is the
> border, because I have seen strange behaviour and exceptions when
> accessing this memory on newer PS2 model. I limited the memory to 256KB
> for USB OHCI because of this strange behaviour on some models, but I
> wasn't able to figure out what was the real cause of the problem. I just
> recognized that it was stable with the 256KB limit.

Perhaps we need to invent a memory map zone within the IOP. I hope that we
can make full use of the DMA hardware, because DMA is by a wide margin the
most efficient kind of transfer.

> So the question is: What is the purpose of the DMA mask in Linux? Is it
> the area which can be accessed by the device? Or is it the area which can
> be accessed by the CPU? For the device it is 32 Bit. For the CPU it
> depends on the software and hardware and can be 0, because nothing may be
> shared with the CPU.

That's a good question. The DMA mapping updates that cause regressions need
some kind of mask, but I agree, it's unclear what that mask actually means.
Especially considering that the kernel cannot allocate normal memory for IOP
DMA anyway, so what is the purpose of the mask then?
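
For reference, as far as I can tell from the kernel's DMA API documentation,
the mask is meant to describe the addresses the device itself can generate,
not what the CPU can reach. A minimal user-space sketch of that reading
follows. dma_bit_mask() mirrors the kernel's DMA_BIT_MASK() macro, while
device_can_address() is a hypothetical helper of mine and not a kernel
function:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same idea as the kernel's DMA_BIT_MASK(n): the low n bits are set. */
uint64_t dma_bit_mask(unsigned int bits)
{
	assert(bits >= 1 && bits <= 64);
	return bits == 64 ? ~UINT64_C(0) : (UINT64_C(1) << bits) - 1;
}

/*
 * Hypothetical helper (not a kernel API): true if a device with the
 * given mask can address every byte of the buffer [addr, addr + size).
 */
bool device_can_address(uint64_t mask, uint64_t addr, size_t size)
{
	uint64_t end;

	if (size == 0)
		return false;

	end = addr + (size - 1);	/* address of the last byte */
	if (end < addr)
		return false;		/* the range wrapped around */

	return end <= mask;
}
```

Under this reading dma_bit_mask(32) is 0xffffffff, so any buffer ending at or
below that address passes the check, whereas a 21-bit mask (0x1fffff) would
restrict the device to the first 2 MiB of addresses.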

> Even with DMA mask 0, the SIF DMA is still able to access the full 32 Bit.
> Each memory access by the OHCI driver can't be done directly and needs
> first to be transferred to IOP memory via SIF DMA before it can be
> accessed by OHCI DMA. This is what you called linked DMA transfer.

Right. So the IOP has DMA controllers that are also capable of simultaneously
interconnected (linked) transfers such as device<->memory<->SIF? That would
be very fortunate. A slight complication is that the SIF eventually needs
arbitration to support simultaneous transfers for multiple devices such as
the OHCI, ATA, iLink, Ethernet, etc.

Are you aware of any documentation describing the IOP DMA controllers?

> As far as I remember the USB sub-system and the OHCI driver was not
> written to handle memory which can't be written by the CPU at all. So I
> assume that you first need to allocate some temporary memory which is used
> to copy the data to or from the IOP DMA memory.

Exactly, that OHCI design is still used. Just as important, we need to
investigate why OHCI interrupts are sometimes lost. Do you have any idea?
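
To make the copy step concrete, here is a rough user-space model of the
temporary (bounce) buffer scheme described above. It is only a sketch under
stated assumptions: iop_pool[] stands in for the IOP memory window, the
allocator is a trivial bump allocator, memcpy() stands in for whatever
actually moves the data (a mapped access or SIF DMA), and every name in it is
hypothetical rather than taken from the driver:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define IOP_POOL_SIZE (256 * 1024)	/* the 256KB limit mentioned above */

/* Stand-in for IOP memory; a real driver would use the device region. */
static uint8_t iop_pool[IOP_POOL_SIZE];
static size_t iop_pool_used;

/* Hypothetical bump allocator: returns NULL when the pool is exhausted. */
void *iop_bounce_alloc(size_t size)
{
	void *p;

	if (size == 0 || size > IOP_POOL_SIZE - iop_pool_used)
		return NULL;

	p = &iop_pool[iop_pool_used];
	iop_pool_used += size;
	return p;
}

/* Host to device: copy the caller's data into IOP memory first. */
void *iop_bounce_to_device(const void *src, size_t size)
{
	void *bounce;

	assert(src != NULL);
	bounce = iop_bounce_alloc(size);
	if (bounce)
		memcpy(bounce, src, size);
	return bounce;
}

/* Device to host: copy the result out of IOP memory afterwards. */
void iop_bounce_from_device(void *dst, const void *bounce, size_t size)
{
	memcpy(dst, bounce, size);
}
```

The device would only ever be handed addresses inside the pool, and the
ordinary kernel buffers never need to be reachable by OHCI DMA. The cost is
one extra copy per transfer in each direction.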

> Then I think you can increase the 256KB limit without getting an unstable
> system.

I would like to learn more about the source of this problem. I'm considering
making an IOP device driver, so that it can be examined more easily.

> I heard someone talking about problems in the SMMU which were fixed by
> increasing the DMA mask. This lets me believe that the DMA mask is
> something which is required by SMMU and therefore 32 Bit should be
> correct. But when a hardware designer tries to add an IOMMU to the PS2,
> there would be at least 3 different IOMMUs needed, because we have 3
> different buses (for EE, IOP and GS).

I'm not sure what you mean by SMMU, since this is not ARM hardware? Also,
the Graphics Synthesizer (GS) is a serial interface and not really a bus,
isn't it?

> The original approach for USB OHCI in Linux 2.4 was that IOP memory is
> handled as PCI memory. I.e. the driver was "thinking" that it allocates
> PCI memory, but indeed there was IOP memory allocated. All DMA ops were
> implemented in the PCI ops and so it was just working like a PCI card with
> device local memory.

Is PCI a good or valid model for the IOP?

Fredrik