Re: [PoC] arm: dma-mapping: direct: Apply dma_pfn_offset only when it is valid

From: Peter Ujfalusi
Date: Wed Feb 05 2020 - 05:19:37 EST




On 03/02/2020 19.08, Christoph Hellwig wrote:
> On Fri, Jan 31, 2020 at 04:00:20PM +0200, Peter Ujfalusi wrote:
>> I see. My PoC patch was not too off then ;)
>> So the plan is to have a generic implementation for all of the
>> architecture, right?
>
> I don't know of a concrete plan, but that's definitely what I'd like
> to see.
>
>>>> The dma_pfn_offset is _still_ applied to the mask we are trying to set
>>>> (and validate) via dma-direct.
>>>
>>> And for the general case that is exactly the right thing to do, we
>>> just need to deal with really odd ZONE_DMA placements like yours.
>>
>> I'm still not convinced: the point of the DMA mask, at least as I see
>> it, is to check that the DMA address can be handled by the device (DMA,
>> peripheral with built-in DMA, etc.); it is not a check against the
>> physical address.
>> Doing phys_to_dma() on the mask from dma_set_mask() is just wrong.
>
> We have a translation between the addresses that the device sees, and
> those that the CPU sees. The device can address N bits of address space
> as seen from the device. The addresses encoded in max_pfn,
> zone_dma_bits or the hardcoded 32 in the ZONE_DMA32 case are CPU addresses.
> So no, we can't blindly compare those.

Right, thanks for the explanation.

>>> But that will cause yet another regression in what we have just fixed
>>> with using the generic direct ops, at which point it turns into who
>>> screams louder.
>>
>> Hehe, I see.
>> I'm genuinely curious why the k2 platform worked just fine with LPAE (it
>> needs it), but people had issues with LPAE on dra7/am5.
>> The fix for dra7/am5 broke k2.
>> As far as I can see the main (only) difference is that k2 has
>> dma_pfn_offset = 0x780000, while dra7/am5 have it at 0 (really direct mapping).
>
> How much memory does the platform have?

The boards which are bootable in mainline have a maximum of 2G; there might
be custom boards with more RAM, but I'm not aware of them.

> Once you are above 32-bits worth
> of address space devices with a 32-bit DMA mask can't address all the
> memory. Now if k2 for example only had less than 4G of memory, but at
> addresses over 4G, and the offset compensates for the offset of the DRAM
> it works without bounce buffering and thus didn't need swiotlb. But any
> platform that has DRAM that is not addressable will need swiotlb.

I see. Since we have a maximum of 2G, which is mirrored at 0x80000000 for
devices, we never needed the assistance of swiotlb for bounce buffering,
and that's why the arm ops worked fine.
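
To put numbers on the above, here is a minimal user-space sketch of the
arithmetic, not the kernel implementation. It assumes 4 KiB pages
(PAGE_SHIFT = 12), and the k2_* helpers are hypothetical names used only
for illustration. With dma_pfn_offset = 0x780000 the byte offset is
0x780000000, so DRAM that devices see at 0x80000000 sits at CPU physical
address 0x800000000 under that assumption, and the last byte of 2G lands
exactly on a 32-bit DMA mask. The last check shows what happens when the
same offset is subtracted from the 32-bit mask value itself: the result
wraps around in 64-bit arithmetic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12 /* assumption: 4 KiB pages */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* dma_pfn_offset value quoted in this thread for k2. */
#define K2_DMA_PFN_OFFSET 0x780000ULL

/* Hypothetical helper: CPU physical address -> address seen by the device. */
static inline uint64_t k2_phys_to_dma(uint64_t paddr)
{
	return paddr - (K2_DMA_PFN_OFFSET << PAGE_SHIFT);
}

/* Hypothetical helper: address seen by the device -> CPU physical address. */
static inline uint64_t k2_dma_to_phys(uint64_t dma)
{
	return dma + (K2_DMA_PFN_OFFSET << PAGE_SHIFT);
}

/* True if the last byte of [paddr, paddr + size) is reachable with 'mask'. */
static inline bool k2_range_fits_mask(uint64_t paddr, uint64_t size,
				      uint64_t mask)
{
	return k2_phys_to_dma(paddr + size - 1) <= mask;
}

/* Sanity checks for the numbers discussed above. */
static inline void k2_example_checks(void)
{
	uint64_t dram_phys = k2_dma_to_phys(0x80000000ULL);

	/* Device-visible 0x80000000 corresponds to CPU address 0x800000000. */
	assert(dram_phys == 0x800000000ULL);

	/* 2G of DRAM ends at device address 0xffffffff: no bouncing needed. */
	assert(k2_range_fits_mask(dram_phys, 2ULL << 30, DMA_BIT_MASK(32)));

	/* Subtracting the offset from the mask value itself wraps around. */
	assert(k2_phys_to_dma(DMA_BIT_MASK(32)) > DMA_BIT_MASK(32));
}
```

Calling k2_example_checks() from a small main() is enough to run the
checks; one byte more than 2G would no longer fit the 32-bit mask.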

>
>>> 	u64 min_mask;
>>>
>>> +	if (mask >= DMA_BIT_MASK(32))
>>> +		return 1;
>>> +
>>
>> Right, so skipping phys_to_dma() for the mask and believing that it will
>> work..
>>
>> It does: audio and dmatest memcpy tests are just fine with this, MMC
>> also probed with ADMA enabled.
>>
>> As far as I can tell it works as well as falling back to the old arm ops
>> in case of LPAE && dma_pfn_offset != 0
>>
>> Fwiw:
>> Tested-by: Peter Ujfalusi <peter.ujfalusi@xxxxxx>
>>
>> Would you be comfortable sending this patch to mainline with
>> Fixes: ad3c7b18c5b3 ("arm: use swiotlb for bounce buffering on LPAE
>> configs")
>
> That is the big question. I don't feel overly comfortable as I've been
> trying to get this right, but so far it seems like the least bad option.
> I'll send out a proper patch with updated comments and will see what
> people think.

I understand, and thank you for the patch; it makes the k2 platform work
again!

- Péter

Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki.
Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki