On Thu, Jul 22, 2021 at 11:28:01AM +0200, Christian König wrote:
Am 22.07.21 um 11:08 schrieb Daniel Vetter:Oh I know, but since dma_fence is meant to be a wait_queue with hw
[SNIP]Wait a second with that, this is not broken. It's just different behavior
As far as I know wake_up_state() tries to run the thread on the CPU it wasThe first versions had used wait_queue, but iirc we had some issues with
scheduled last, while wait_event_* makes the thread run on the CPU who
issues the wake by default.
And yes I've also noticed this already and it was one of the reason why I
suggested to use a wait_queue instead of the hand wired dma_fence_wait
implementation.
the callbacks and stuff and that was the reasons for hand-rolling. Or
maybe it was the integration of the lockless fastpath for
dma_fence_is_signalled().
[SNIP]If there's something broken we should just fix it, not force everyone to
Well it would have been nicer if we used the existing infrastructure instead
of re-inventing stuff for dma_fence, but that chance is long gone.
And you don't need a dma_fence_context base class, but rather just a flag in
the dma_fence_ops if you want to change the behavior.
set a random flag. dma_fence work like special wait_queues, so if we
differ then we should go back to that.
and there are good arguments for both sides.
support, they really should work the same and have the same tuning.
If a wait is short you can have situations where you want to start theI think the defaults are different because usually if you wake up a wait
thread on the original CPU.
This is because you can assume that the caches on that CPU are still hot
and heating up the caches on the local CPU would take longer than an inter
CPU interrupt.
But if the wait is long it makes more sense to run the thread on the CPU
where you noticed the wake up event.
This is because you can assume that the caches are cold anyway and
starting the thread on the current CPU (most likely from an interrupt
handler) gives you the absolutely best latency.
In other words you usually return from the interrupt handler and just
directly switch to the now running thread.
I'm not sure if all drivers want the same behavior. Rob here seems to prefer
number 2, but we have used 1 for dma_fence for a rather long time now and it
could be that some people start to complain when we switch unconditionally.
queue, there's a 1:1 relationship between waker and waiter.
Otoh if you just wake a thread it's probably some kinda of service thread,
so N:1 relationship between waker and waiter. And in that case moving the
waiter is a really bad idea.
I think dma_fence is generally much closer to 1:1 (with the most common
one irq handler -> scheduler thread for that engine), so having the "same
cpu" wake behaviour really sounds like the right thing to do. And not
anything that is specifically an issue with how qualcom gpus work, and
hence should be msm specific.
If it turns out to be the wrong thing, well I guess we'll learn
something. And then maybe we have a different version of dma_fence_wait.
-Daniel