Re: using DMA-API on ARM

From: Arend van Spriel
Date: Mon Dec 08 2014 - 11:22:53 EST


On 12/08/14 16:01, Arnd Bergmann wrote:
On Monday 08 December 2014 13:47:38 Hante Meuleman wrote:
Still using Outlook, but I will limit the line length; I hope that works for the
moment. Attached is a log with the requested information, it is a little
bit non-standard though. The dump code from mm was copied into
the driver and called from there, mapping the prints back to our local
printf, but it should produce the same output. I did this because I didn't
realize the table is static.

Some background on the test setup: I'm using a Broadcom reference
design AP platform with a BRCM 4708 host SoC.

I think you are using the wrong dtb file, the log says this is
a "Buffalo WZR-1750DHP", not the reference design.

That router is close enough to the reference design.

For the AP router
platform the open-source package OpenWRT was used. Some small
modifications were made to get it to work on our HW. Only one core
is enabled for the moment (no time to figure out how to enable the
other one). OpenWRT was configured to use kernel 3.18-rc2 and
the brcmfmac of the compat-wireless code was updated with our
latest code (minor patches, which have been submitted already).
The device used is a 43602 PCIe device. Some modifications to the build
system were made to enable PCIe. The test is to connect with a
client to the AP and run iperf (TCP). The test can run for many hours
without a problem, but sometimes fails very quickly.

The bcm4708 platform is maintained by Hauke Mehrtens, adding him to Cc.

Thanks. While going through the DTS files I intended to add him as well ;-)

In your log, I see this message:

[ 0.000000] PL310 OF: cache setting yield illegal associativity
[ 0.000000] PL310 OF: -1069781724 calculated, only 8 and 16 legal
[ 0.000000] L2C-310 enabling early BRESP for Cortex-A9
[ 0.000000] L2C-310 full line of zeros enabled for Cortex-A9
[ 0.000000] L2C-310 dynamic clock gating enabled, standby mode enabled
[ 0.000000] L2C-310 cache controller enabled, 16 ways, 256 kB
[ 0.000000] L2C-310: CACHE_ID 0x410000c8, AUX_CTRL 0x4e130001

Evidently the cache controller information in DT is incorrect and
the setup may be wrong as a consequence, which may explain cache
coherency problems.

While staring at the DTS files I suspect there are some parts still
missing. I have attached them for reference. Catalin pointed us to a
patch for the L2 cache [1]. We have not tried that yet.

Can you verify that the AUX_CTRL value is the same one you see
in a working kernel?
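
To compare against a working kernel, the AUX_CTRL value from the log
can be decoded by hand. A minimal sketch in C, assuming the L2C-310
Auxiliary Control Register layout (bit 16 selects 16-way associativity,
bits 19:17 encode the way size). The macro and function names are
illustrative and are not the kernel's own:

```c
#include <stdint.h>

/* Assumed L2C-310 AUX_CTRL field positions (names are hypothetical). */
#define AUX_ASSOC_16        (1u << 16)   /* 0 = 8 ways, 1 = 16 ways */
#define AUX_WAY_SIZE_SHIFT  17
#define AUX_WAY_SIZE_MASK   (7u << AUX_WAY_SIZE_SHIFT)

/* Number of ways selected by the associativity bit. */
unsigned int l310_ways(uint32_t aux)
{
    return (aux & AUX_ASSOC_16) ? 16 : 8;
}

/*
 * Way size in kB: encoding 1 means 16 kB, and each step doubles it.
 * Encodings 0 and 7 are reserved and not treated specially here.
 */
unsigned int l310_way_size_kb(uint32_t aux)
{
    unsigned int enc = (aux & AUX_WAY_SIZE_MASK) >> AUX_WAY_SIZE_SHIFT;

    return 1u << (enc + 3);
}

/* Total cache size in kB is ways times way size. */
unsigned int l310_cache_size_kb(uint32_t aux)
{
    return l310_ways(aux) * l310_way_size_kb(aux);
}
```

For 0x4e130001 this gives 16 ways of 16 kB each, i.e. 256 kB, which is
consistent with the "cache controller enabled, 16 ways, 256 kB" line in
the log, so the register itself looks sane even though the DT values
were rejected.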

The log: first the ring allocation info is printed. Starting at
16.124847, rings 2, 3 and 4 are the rings used for device to host. In this
log the failure is on a read of ring 3. Ring 3 has 1024 entries of
16 bytes each. The next thing printed is the kernel page tables. Then some
OpenWRT info and the logging of part of the connection setup. Then at
1780.130752 the logging of the failure starts. The sequence number is
modulo 253, and with a ring size of 1024 the value read matches an "old"
entry (read 40, expected 52). Then the different pointers are printed,
followed by the kernel page table. The code then does a cache invalidate
on the dma_handle, and on the next read the sequence number is correct.
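
The "old entry" observation can be checked with a little arithmetic.
A minimal sketch in C, assuming the sequence number advances by one per
ring entry written and wraps at 253 (that increment rule is an
assumption here; the constant and function names are illustrative):

```c
#define RING_ENTRIES 1024u  /* entries in ring 3, from the description */
#define SEQ_MODULO   253u   /* sequence numbers wrap at this value */

/*
 * Sequence number a slot still holds from the previous lap of the ring,
 * given the sequence number expected for the current lap.
 */
unsigned int stale_seq(unsigned int expected)
{
    /* One full lap advances the sequence by 1024 % 253 = 12. */
    unsigned int lap = RING_ENTRIES % SEQ_MODULO;

    return (expected + SEQ_MODULO - lap) % SEQ_MODULO;
}
```

With an expected value of 52 this yields 40, which is the value read in
the log, so the CPU appears to have seen the contents of the same slot
from exactly one lap earlier.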

How do you invalidate the cache? A dma_handle is of type dma_addr_t
and we don't define an operation for that, nor does it make sense
on an allocation from dma_alloc_coherent(). What happens if you
take out the invalidate?

We call dma_sync_single_for_cpu(, DMA_FROM_DEVICE), which ends up
invalidating the cache (or that is our suspicion).

Can you post the patch that you use (both platform and driver) relative
to the snapshot of the mainline kernel you are basing on?

Arnd


Regards,
Arend

[1] http://www.arm.linux.org.uk/developer/patches/viewpatch.php?id=6529/1

Attachment: bcm-dt-files.tar.bz2
Description: BZip2 compressed data