Hi Matthias,
On Tue, 2016-06-07 at 18:59 +0200, Matthias Brugger wrote:
On 03/06/16 15:11, Matthias Brugger wrote:
[...]
+
+ smp_mb(); /* modify jump before enable thread */
+ }
+
+ cmdq_thread_writel(thread, task->pa_base +
task->command_size,
+ CMDQ_THR_END_ADDR);
+ cmdq_thread_resume(thread);
+ }
+ list_move_tail(&task->list_entry, &thread->task_busy_list);
+ spin_unlock_irqrestore(&cmdq->exec_lock, flags);
+}
+
+static void cmdq_handle_error_done(struct cmdq *cmdq,
+ struct cmdq_thread *thread, u32 irq_flag)
+{
+ struct cmdq_task *task, *tmp, *curr_task = NULL;
+ u32 curr_pa;
+ struct cmdq_cb_data cmdq_cb_data;
+ bool err;
+
+ if (irq_flag & CMDQ_THR_IRQ_ERROR)
+ err = true;
+ else if (irq_flag & CMDQ_THR_IRQ_DONE)
+ err = false;
+ else
+ return;
+
+ curr_pa = cmdq_thread_readl(thread, CMDQ_THR_CURR_ADDR);
+
+ list_for_each_entry_safe(task, tmp, &thread->task_busy_list,
+ list_entry) {
+ if (curr_pa >= task->pa_base &&
+ curr_pa < (task->pa_base + task->command_size))
What are you checking here? It seems as if you make some implcit
assumptions about pa_base and the order of execution of
commands in the
thread. Is it save to do so? Does dma_alloc_coherent give any
guarantees
about dma_handle?
1. Check what is the current running task in this GCE thread.
2. Yes.
3. Yes, CMDQ doesn't use iommu, so physical address is continuous.
Yes, physical addresses might be continous, but AFAIK there is no
guarantee that the dma_handle address is steadily growing, when
calling
dma_alloc_coherent. And if I understand the code correctly, you
use this
assumption to decide if the task picked from task_busy_list is
currently
executing. So I think this mecanism is not working.
I don't use dma_handle address, and just use physical addresses.
From CPU's point of view, tasks are linked by the busy list.
From GCE's point of view, tasks are linked by the JUMP command.
In which cases does the HW thread raise an interrupt.
In case of error. When does CMDQ_THR_IRQ_DONE get raised?
GCE will raise interrupt if any task is done or error.
However, GCE is fast, so CPU may get multiple done tasks
when it is running ISR.
In case of error, that GCE thread will pause and raise interrupt.
So, CPU may get multiple done tasks and one error task.
I think we should reimplement the ISR mechanism. Can't we just read
CURR_IRQ_STATUS and THR_IRQ_STATUS in the handler and leave
cmdq_handle_error_done to the thread_fn? You will need to pass
information from the handler to thread_fn, but that shouldn't be an
issue. AFAIK interrupts are disabled in the handler, so we should stay
there as short as possible. Traversing task_busy_list is expensive, so
we need to do it in a thread context.
Actually, our initial implementation is similar to your suggestion,
but display needs CMDQ to return callback function very precisely,
else display will drop frame.
For display, CMDQ interrupt will be raised every 16 ~ 17 ms,
and CMDQ needs to call callback function in ISR.
If we defer callback to workqueue, the time interval may be larger than
32 ms.sometimes.
I think the problem is, that you implemented the workqueue as a ordered
workqueue, so there is no parallel processing. I'm still not sure why
you need the workqueue to be ordered. Can you please explain.
The order should be kept.
Let me use mouse cursor as an example.
If task 1 means move mouse cursor to point A, task 2 means point B,
and task 3 means point C, our expected result is A -> B -> C.
If the order is not kept, the result could become A -> C -> B.
Got it, thanks for the clarification.
I think a way to get rid of the workqueue is to use a timer, which gets
programmed to the time a timeout in the first task in the busy list
would happen. Everytime we update the busy list (e.g. because of task
got finished by the thread), we update the timer. When the timer
triggers, which hopefully won't happen too often, we return timeout on
the busy list elements, until the time is lower then the actual time.
At least with this we can reduce the data structures in this driver and
make it more lightweight.
From my understanding, your proposed method can handle timeout case.
However, the workqueue is also in charge of releasing tasks.
Do you take releasing tasks into consideration by using the proposed
timer method?
Furthermore, I think the code will become more complex if we also use
timer to implement releasing tasks.