On Sunday 01 November 2015 13:50:53 Sinan Kaya wrote:
The issue is not writel_relaxed vs. writel. After I issue reset, I need
to wait for some time to confirm reset was done. I can use readl_polling
instead of mdelay if we don't like mdelay.
I meant that both _relaxed() and mdelay() are probably wrong here.
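To make the polling alternative concrete, here is a minimal userspace sketch of a bounded status poll, in the spirit of the kernel's iopoll helpers such as readl_poll_timeout(). The bit name FAKE_STATUS_RESET_DONE and the function poll_status() are hypothetical; the thread does not give the real register layout, and the plain volatile load only stands in for a real MMIO read.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical status bit; the real register layout is not given in the thread. */
#define FAKE_STATUS_RESET_DONE 0x1u

/*
 * Poll *status until (value & mask) == mask or max_polls reads have been made.
 * Returns true on success, false on timeout. The last value read is stored in
 * *last. A driver would read the register with readl() and sleep or delay
 * between reads; both are omitted in this model.
 */
static bool poll_status(const volatile uint32_t *status, uint32_t mask,
                        unsigned int max_polls, uint32_t *last)
{
    for (unsigned int i = 0; i < max_polls; i++) {
        *last = *status;
        if ((*last & mask) == mask)
            return true;
    }
    return false;
}
```

Compared with a fixed mdelay(), a bounded poll returns as soon as the hardware acknowledges the command and reports a timeout instead of silently continuing.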
You are right about redundant writel_relaxed + wmb. They are effectively
equal to writel.
Actually, writel() is wmb()+writel_relaxed(), not the other way round:
Agreed.
When sending a command to a device that can start a DMA transfer,
the barrier is required to ensure that the DMA happens after previously
written data has gone from the CPU write buffers into the memory that
is used as the source for the transfer.
A barrier after the writel() has no effect, as MMIO writes are posted
on the bus.
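The ordering described above can be modelled in userspace as a rough sketch. Everything here is a stand-in: fake_dma_buf and fake_doorbell replace real DMA memory and a real device register, and a C11 release fence takes the place of wmb(), which is architecture specific in the kernel. The only point is the position of the barrier: before the register store, after the buffer writes.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Stand-ins for a DMA source buffer and a device doorbell register. */
static uint32_t fake_dma_buf[4];
static volatile uint32_t fake_doorbell;

/* Model of writel_relaxed(): a plain register store with no barrier. */
static void model_writel_relaxed(uint32_t val, volatile uint32_t *reg)
{
    *reg = val;
}

/* Model of writel(): barrier first, then the store. */
static void model_writel(uint32_t val, volatile uint32_t *reg)
{
    atomic_thread_fence(memory_order_release); /* stand-in for wmb() */
    model_writel_relaxed(val, reg);
}

/* Fill the buffer, then ring the doorbell that would start the DMA. */
static void start_transfer(uint32_t seed)
{
    for (int i = 0; i < 4; i++)
        fake_dma_buf[i] = seed + (uint32_t)i;
    model_writel(1, &fake_doorbell);
}
```

A barrier placed after model_writel_relaxed() instead would not order the buffer writes ahead of the doorbell store, which is the case the text calls ineffective.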
However, after issuing the command, I still need to wait some amount of
time until hardware acknowledges the commands like reset/enable/disable.
These are relatively fast operations happening in microseconds. That's
why I have mdelay there.
I'll take a look at workqueues but it could turn out to be overkill
for a few microseconds.
Most devices are able to provide an interrupt for long-running commands.
Are you sure that yours is unable to do this? If so, is this a design
mistake or an implementation bug?
Reading the status probably requires a readl() rather than readl_relaxed()
to guarantee that the DMA data has arrived in memory by the time that the
register data is seen by the CPU. If using readl_relaxed() here is a valid
and required optimization, please add a comment to explain why it works
and how much you gain.
I will add some description. This is a high speed peripheral. I don't
like spreading barriers as candies inside the readl and writel unless I
have to.
According to the barriers video I watched on YouTube, this should be the
rule for ordering.
"if you do two relaxed reads and check the results of the returned
variables, ARM architecture guarantees that these two relaxed variables
will get observed during the check."
This is called implied ordering or something of that sort.
My point was a bit different: while it is guaranteed that the
results of readl_relaxed() are observed in order, this does not
guarantee that a DMA from device to memory that was started by
the device before the readl_relaxed() has arrived in memory
by the time that the readl_relaxed() result is visible to the
CPU and it starts accessing the memory.
I checked with the hardware designers. Hardware guarantees that by the
time the interrupt is observed, all data transactions in flight are
delivered to their respective places and are visible to the CPU. I'll
add a comment in the code about this.
I'm curious about this. Does that mean the device is not meant for
high-performance transfers and just synchronizes the bus before
triggering the interrupt?
I see.
There is a HW guarantee for ordering.
In other words, when the hardware sends you data followed by an
interrupt to tell you the data is there, your interrupt handler
can tell the driver that is waiting for this data that the DMA
is complete while the data itself is still in flight, e.g. waiting
for an IOMMU to fetch page table entries.
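As a rough userspace sketch of where the read barrier sits: on ARM, readl() is approximately readl_relaxed() followed by a read barrier, so the buffer access cannot be satisfied ahead of the status read. The names fake_status, fake_rx_buf and FAKE_STATUS_DONE are hypothetical, and a C11 acquire fence stands in for the kernel's read barrier. The model cannot reproduce DMA data that is still in flight; it only shows the order of operations on the CPU side.

```c
#include <stdatomic.h>
#include <stdint.h>

#define FAKE_STATUS_DONE 0x1u   /* hypothetical completion bit */

static volatile uint32_t fake_status;  /* stand-in for a status register */
static uint32_t fake_rx_buf[2];        /* stand-in for a DMA destination */

/* Model of readl(): the register load, then a read barrier. */
static uint32_t model_readl(const volatile uint32_t *reg)
{
    uint32_t val = *reg;
    atomic_thread_fence(memory_order_acquire); /* stand-in for rmb() */
    return val;
}

/* Return 0 and copy the first word out if done; return -1 otherwise. */
static int fetch_if_done(uint32_t *out)
{
    if (!(model_readl(&fake_status) & FAKE_STATUS_DONE))
        return -1;
    *out = fake_rx_buf[0];
    return 0;
}
```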
On demand paging for IOMMU is only supported for PCIe via PRI (Page
Request Interface) not for HIDMA. All other hardware instances work on
pinned DMA addresses. I'll add a note about this to the code as well.
I wasn't talking about paging, just fetching the IOTLB from the
preloaded page tables in RAM. This can take several uncached memory
accesses, so it would generally be slow.
Arnd