Re: [PATCH v2 1/8] mmc: sdhci: Get rid of finish_tasklet

From: Adrian Hunter
Date: Fri Mar 08 2019 - 08:37:41 EST

Next message: Arnaldo Carvalho de Melo: "Re: [PATCH v4 02/15] perf tools script: Support insn output for normal samples"
Previous message: Peter Zijlstra: "Re: [PATCH 1/5] x86/percpu: Differentiate this_cpu_{}() and __this_cpu_{}()"
Next in thread: Rizvi, Mohammad Faiz Abbas: "Re: [PATCH v2 1/8] mmc: sdhci: Get rid of finish_tasklet"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On 6/03/19 12:00 PM, Faiz Abbas wrote:
> Adrian,
>
> On 25/02/19 1:47 PM, Adrian Hunter wrote:
>> On 15/02/19 9:20 PM, Faiz Abbas wrote:
>>> sdhci.c has two bottom halves implemented. A threaded_irq for handling
>>> card insert/remove operations and a tasklet for finishing mmc requests.
>>> With the addition of external dma support, dmaengine APIs need to
>>> terminate in non-atomic context before unmapping the dma buffers.
>>>
>>> To facilitate this, remove the finish_tasklet and move the call of
>>> sdhci_request_done() to the threaded_irq() callback.
>>
>> The irq thread has a higher latency than the tasklet. The performance drop
>> is measurable on the system I tried:
>>
>> Before:
>>
>> # dd if=/dev/mmcblk1 of=/dev/null bs=1G count=1 &
>> 1+0 records in
>> 1+0 records out
>> 1073741824 bytes (1.1 GB) copied, 4.44502 s, 242 MB/s
>>
>> After:
>>
>> # dd if=/dev/mmcblk1 of=/dev/null bs=1G count=1 &
>> 1+0 records in
>> 1+0 records out
>> 1073741824 bytes (1.1 GB) copied, 4.50898 s, 238 MB/s
>>
>> So we only want to resort to the thread for the error case.
>>
>
> Sorry for the late response here, but this is about 1.6% decrease. I
> tried out the same commands on a dra7xx board here (with about 5
> consecutive dd of 1GB) and the average decrease was 0.3%. I believe you
> will also find a lesser percentage change if you average over multiple
> dd commands.
>
> Is this really so significant that we have to maintain two different
> bottom halves and keep having difficulty with adding APIs that can sleep?

It is a performance drop that can be avoided, so it might as well be.
Splitting the success path from the failure path is common for I/O drivers
for similar reasons as here: the success path can be optimized whereas the
failure path potentially needs to sleep.

>
> Also I am not sure how to implement only the error handling part in the
> threaded_irq. We need to enter sdhci_request_done() and get the current
> mrq before we can check for error conditions like I've done in patch 2:
>
> /* Terminate and synchronize dma in case of an error */
> if (data && (mrq->cmd->error || data->error) &&
> host->use_external_dma) {
> struct dma_chan *chan = sdhci_external_dma_channel(host, data);
> dmaengine_terminate_sync(chan);
> }
>
> On a related note, do we really need to protect everything in
> sdhci_request_done() with spinlocks?
> In patch 2 I have only removed lock
> for the terminate_sync() parts that I added but the whole
> dma_unmap/dma_sync parts should be left unprotected IMO.

As it is written, synchronization is needed to stop the same mrq being
finished twice.

I suggest doing the dmaengine_terminate_sync() before calling
sdhci_request_done(). Perhaps you could record the channel that needs to be
sync'd and then do:

struct dma_chan *chan = READ_ONCE(host->sync_chan);

if (chan) {
dmaengine_terminate_sync(chan);
host->sync_chan = NULL;
}

Next message: Arnaldo Carvalho de Melo: "Re: [PATCH v4 02/15] perf tools script: Support insn output for normal samples"
Previous message: Peter Zijlstra: "Re: [PATCH 1/5] x86/percpu: Differentiate this_cpu_{}() and __this_cpu_{}()"
Next in thread: Rizvi, Mohammad Faiz Abbas: "Re: [PATCH v2 1/8] mmc: sdhci: Get rid of finish_tasklet"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]