Re: [PATCH v8 2/3] CMDQ: Mediatek CMDQ driver

From: Horng-Shyang Liao
Date: Wed Jun 08 2016 - 08:25:42 EST


Hi Matthias,

On Wed, 2016-06-08 at 12:45 +0200, Matthias Brugger wrote:
>
> On 08/06/16 07:40, Horng-Shyang Liao wrote:
> > Hi Matthias,
> >
> > On Tue, 2016-06-07 at 18:59 +0200, Matthias Brugger wrote:
> >>
> >> On 03/06/16 15:11, Matthias Brugger wrote:
> >>>
> >>>
> >> [...]
> >>
> >>>>>>>>>>>> +
> >>>>>>>>>>>> + smp_mb(); /* modify jump before enable thread */
> >>>>>>>>>>>> + }
> >>>>>>>>>>>> +
> >>>>>>>>>>>> + cmdq_thread_writel(thread, task->pa_base +
> >>>>>>>>>>>> task->command_size,
> >>>>>>>>>>>> + CMDQ_THR_END_ADDR);
> >>>>>>>>>>>> + cmdq_thread_resume(thread);
> >>>>>>>>>>>> + }
> >>>>>>>>>>>> + list_move_tail(&task->list_entry, &thread->task_busy_list);
> >>>>>>>>>>>> + spin_unlock_irqrestore(&cmdq->exec_lock, flags);
> >>>>>>>>>>>> +}
> >>>>>>>>>>>> +
> >>>>>>>>>>>> +static void cmdq_handle_error_done(struct cmdq *cmdq,
> >>>>>>>>>>>> + struct cmdq_thread *thread, u32 irq_flag)
> >>>>>>>>>>>> +{
> >>>>>>>>>>>> + struct cmdq_task *task, *tmp, *curr_task = NULL;
> >>>>>>>>>>>> + u32 curr_pa;
> >>>>>>>>>>>> + struct cmdq_cb_data cmdq_cb_data;
> >>>>>>>>>>>> + bool err;
> >>>>>>>>>>>> +
> >>>>>>>>>>>> + if (irq_flag & CMDQ_THR_IRQ_ERROR)
> >>>>>>>>>>>> + err = true;
> >>>>>>>>>>>> + else if (irq_flag & CMDQ_THR_IRQ_DONE)
> >>>>>>>>>>>> + err = false;
> >>>>>>>>>>>> + else
> >>>>>>>>>>>> + return;
> >>>>>>>>>>>> +
> >>>>>>>>>>>> + curr_pa = cmdq_thread_readl(thread, CMDQ_THR_CURR_ADDR);
> >>>>>>>>>>>> +
> >>>>>>>>>>>> + list_for_each_entry_safe(task, tmp, &thread->task_busy_list,
> >>>>>>>>>>>> + list_entry) {
> >>>>>>>>>>>> + if (curr_pa >= task->pa_base &&
> >>>>>>>>>>>> + curr_pa < (task->pa_base + task->command_size))
> >>>>>>>>>>>
> >>>>>>>>>>> What are you checking here? It seems as if you make some implcit
> >>>>>>>>>>> assumptions about pa_base and the order of execution of
> >>>>>>>>>>> commands in the
> >>>>>>>>>>> thread. Is it save to do so? Does dma_alloc_coherent give any
> >>>>>>>>>>> guarantees
> >>>>>>>>>>> about dma_handle?
> >>>>>>>>>>
> >>>>>>>>>> 1. Check what is the current running task in this GCE thread.
> >>>>>>>>>> 2. Yes.
> >>>>>>>>>> 3. Yes, CMDQ doesn't use iommu, so physical address is continuous.
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Yes, physical addresses might be continous, but AFAIK there is no
> >>>>>>>>> guarantee that the dma_handle address is steadily growing, when
> >>>>>>>>> calling
> >>>>>>>>> dma_alloc_coherent. And if I understand the code correctly, you
> >>>>>>>>> use this
> >>>>>>>>> assumption to decide if the task picked from task_busy_list is
> >>>>>>>>> currently
> >>>>>>>>> executing. So I think this mecanism is not working.
> >>>>>>>>
> >>>>>>>> I don't use dma_handle address, and just use physical addresses.
> >>>>>>>> From CPU's point of view, tasks are linked by the busy list.
> >>>>>>>> From GCE's point of view, tasks are linked by the JUMP command.
> >>>>>>>>
> >>>>>>>>> In which cases does the HW thread raise an interrupt.
> >>>>>>>>> In case of error. When does CMDQ_THR_IRQ_DONE get raised?
> >>>>>>>>
> >>>>>>>> GCE will raise interrupt if any task is done or error.
> >>>>>>>> However, GCE is fast, so CPU may get multiple done tasks
> >>>>>>>> when it is running ISR.
> >>>>>>>>
> >>>>>>>> In case of error, that GCE thread will pause and raise interrupt.
> >>>>>>>> So, CPU may get multiple done tasks and one error task.
> >>>>>>>>
> >>>>>>>
> >>>>>>> I think we should reimplement the ISR mechanism. Can't we just read
> >>>>>>> CURR_IRQ_STATUS and THR_IRQ_STATUS in the handler and leave
> >>>>>>> cmdq_handle_error_done to the thread_fn? You will need to pass
> >>>>>>> information from the handler to thread_fn, but that shouldn't be an
> >>>>>>> issue. AFAIK interrupts are disabled in the handler, so we should stay
> >>>>>>> there as short as possible. Traversing task_busy_list is expensive, so
> >>>>>>> we need to do it in a thread context.
> >>>>>>
> >>>>>> Actually, our initial implementation is similar to your suggestion,
> >>>>>> but display needs CMDQ to return callback function very precisely,
> >>>>>> else display will drop frame.
> >>>>>> For display, CMDQ interrupt will be raised every 16 ~ 17 ms,
> >>>>>> and CMDQ needs to call callback function in ISR.
> >>>>>> If we defer callback to workqueue, the time interval may be larger than
> >>>>>> 32 ms.sometimes.
> >>>>>>
> >>>>>
> >>>>> I think the problem is, that you implemented the workqueue as a ordered
> >>>>> workqueue, so there is no parallel processing. I'm still not sure why
> >>>>> you need the workqueue to be ordered. Can you please explain.
> >>>>
> >>>> The order should be kept.
> >>>> Let me use mouse cursor as an example.
> >>>> If task 1 means move mouse cursor to point A, task 2 means point B,
> >>>> and task 3 means point C, our expected result is A -> B -> C.
> >>>> If the order is not kept, the result could become A -> C -> B.
> >>>>
> >>>
> >>> Got it, thanks for the clarification.
> >>>
> >>
> >> I think a way to get rid of the workqueue is to use a timer, which gets
> >> programmed to the time a timeout in the first task in the busy list
> >> would happen. Everytime we update the busy list (e.g. because of task
> >> got finished by the thread), we update the timer. When the timer
> >> triggers, which hopefully won't happen too often, we return timeout on
> >> the busy list elements, until the time is lower then the actual time.
> >>
> >> At least with this we can reduce the data structures in this driver and
> >> make it more lightweight.
> >
> > From my understanding, your proposed method can handle timeout case.
> >
> > However, the workqueue is also in charge of releasing tasks.
> > Do you take releasing tasks into consideration by using the proposed
> > timer method?
> > Furthermore, I think the code will become more complex if we also use
> > timer to implement releasing tasks.
> >
>
> Can't we call
> clk_disable_unprepare(cmdq->clock);
> cmdq_task_release(task);
> after invoking the callback?

Do you mean just call these two functions in ISR?
My major concern is dma_free_coherent() and kfree() in
cmdq_task_release(task).
Therefore, your suggestion is to use GFP_ATOMIC for both
dma_alloc_coherent() and kzalloc(). Right?
If so, I can try to implement timeout by timer, and discuss with you
if I have further questions.

> Regrading the clock, wouldn't it be easier to handle the clock
> enable/disable depending on the state of task_busy_list? I suppose we
> can't as we would need to check the task_busy_list of all threads, right?
>
> Regards,
> Matthias

Thanks,
HS