Re: [PATCH v8 09/13] [media] vb2: add in-fence support to QBUF

From: Hans Verkuil
Date: Thu Apr 26 2018 - 02:27:24 EST


On 04/25/2018 10:11 PM, Ezequiel Garcia wrote:
> On 14 March 2018 at 12:55, Hans Verkuil <hverkuil@xxxxxxxxx> wrote:
>> On 03/09/2018 09:49 AM, Gustavo Padovan wrote:
>>> From: Gustavo Padovan <gustavo.padovan@xxxxxxxxxxxxx>
>>>
>>> Receive in-fence from userspace and add support for waiting on them
>>> before queueing the buffer to the driver. Buffers can't be queued to the
>>> driver before its fences signal. And a buffer can't be queue to the driver
>>
>> queue -> queued
>>
>>> out of the order they were queued from userspace. That means that even if
>>> it fence signal it must wait all other buffers, ahead of it in the queue,
>>
>> it fence signal -> its fence signals
>> wait all -> wait for all
>>
>>> to signal first.
>>>
>>> If the fence for some buffer fails we do not queue it to the driver,
>>> instead we mark it as error and wait until the previous buffer is done
>>> to notify userspace of the error. We wait here to deliver the buffers back
>>> to userspace in order.
>>>
>>> v9: - rename fence to in_fence in many places
>>> - handle fences signalling with error better (Hans Verkuil)
>>>
>>> v8: - improve comments and docs (Hans Verkuil)
>>> - fix unlocking of vb->fence_cb_lock on vb2_core_qbuf (Hans Verkuil)
>>> - move in-fences code that was in the out-fences patch here (Alex)
>>>
>>> v8: - improve comments about fences with errors
>>
>> v9? Two v8 entries?
>>
>>>
>>> v7:
>>> - get rid of the fence array stuff for ordering and just use
>>> get_num_buffers_ready() (Hans)
>>> - fix issue of queuing the buffer twice (Hans)
>>> - avoid the dma_fence_wait() in core_qbuf() (Alex)
>>> - merge preparation commit in
>>>
>>> v6:
>>> - With fences always keep the order userspace queues the buffers.
>>> - Protect in_fence manipulation with a lock (Brian Starkey)
>>> - check if fences have the same context before adding a fence array
>>> - Fix last_fence ref unbalance in __set_in_fence() (Brian Starkey)
>>> - Clean up fence if __set_in_fence() fails (Brian Starkey)
>>> - treat -EINVAL from dma_fence_add_callback() (Brian Starkey)
>>>
>>> v5: - use fence_array to keep buffers ordered in vb2 core when
>>> needed (Brian Starkey)
>>> - keep backward compat on the reserved2 field (Brian Starkey)
>>> - protect fence callback removal with lock (Brian Starkey)
>>>
>>> v4:
>>> - Add a comment about dma_fence_add_callback() not returning a
>>> error (Hans)
>>> - Call dma_fence_put(vb->in_fence) if fence signaled (Hans)
>>> - select SYNC_FILE under config VIDEOBUF2_CORE (Hans)
>>> - Move dma_fence_is_signaled() check to __enqueue_in_driver() (Hans)
>>> - Remove list_for_each_entry() in __vb2_core_qbuf() (Hans)
>>> - Remove if (vb->state != VB2_BUF_STATE_QUEUED) from
>>> vb2_start_streaming() (Hans)
>>> - set IN_FENCE flags on __fill_v4l2_buffer (Hans)
>>> - Queue buffers to the driver as soon as they are ready (Hans)
>>> - call fill_user_buffer() after queuing the buffer (Hans)
>>> - add err: label to clean up fence
>>> - add dma_fence_wait() before calling vb2_start_streaming()
>>>
>>> v3: - document fence parameter
>>> - remove ternary if at vb2_qbuf() return (Mauro)
>>> - do not change if conditions behaviour (Mauro)
>>>
>>> v2:
>>> - fix vb2_queue_or_prepare_buf() ret check
>>> - remove check for VB2_MEMORY_DMABUF only (Javier)
>>> - check num of ready buffers to start streaming
>>> - when queueing, start from the first ready buffer
>>> - handle queue cancel
>>>
>>> Signed-off-by: Gustavo Padovan <gustavo.padovan@xxxxxxxxxxxxx>
>>> ---
>>> drivers/media/common/videobuf2/videobuf2-core.c | 197 ++++++++++++++++++++----
>>> drivers/media/common/videobuf2/videobuf2-v4l2.c | 34 +++-
>>> drivers/media/v4l2-core/Kconfig | 33 ++++
>>> include/media/videobuf2-core.h | 14 +-
>>> 4 files changed, 248 insertions(+), 30 deletions(-)
>>>
>>> diff --git a/drivers/media/common/videobuf2/videobuf2-core.c b/drivers/media/common/videobuf2/videobuf2-core.c
>>> index d3f7bb33a54d..5de5e35cfc40 100644
>>> --- a/drivers/media/common/videobuf2/videobuf2-core.c
>>> +++ b/drivers/media/common/videobuf2/videobuf2-core.c
>>> @@ -352,6 +352,7 @@ static int __vb2_queue_alloc(struct vb2_queue *q, enum vb2_memory memory,
>>> vb->index = q->num_buffers + buffer;
>>> vb->type = q->type;
>>> vb->memory = memory;
>>> + spin_lock_init(&vb->fence_cb_lock);
>>> for (plane = 0; plane < num_planes; ++plane) {
>>> vb->planes[plane].length = plane_sizes[plane];
>>> vb->planes[plane].min_length = plane_sizes[plane];
>>> @@ -891,20 +892,12 @@ void *vb2_plane_cookie(struct vb2_buffer *vb, unsigned int plane_no)
>>> }
>>> EXPORT_SYMBOL_GPL(vb2_plane_cookie);
>>>
>>> -void vb2_buffer_done(struct vb2_buffer *vb, enum vb2_buffer_state state)
>>> +static void vb2_process_buffer_done(struct vb2_buffer *vb, enum vb2_buffer_state state)
>>> {
>>> struct vb2_queue *q = vb->vb2_queue;
>>> unsigned long flags;
>>> unsigned int plane;
>>>
>>> - if (WARN_ON(vb->state != VB2_BUF_STATE_ACTIVE))
>>> - return;
>>> -
>>> - if (WARN_ON(state != VB2_BUF_STATE_DONE &&
>>> - state != VB2_BUF_STATE_ERROR &&
>>> - state != VB2_BUF_STATE_QUEUED &&
>>> - state != VB2_BUF_STATE_REQUEUEING))
>>> - state = VB2_BUF_STATE_ERROR;
>>>
>>> #ifdef CONFIG_VIDEO_ADV_DEBUG
>>> /*
>>> @@ -921,6 +914,9 @@ void vb2_buffer_done(struct vb2_buffer *vb, enum vb2_buffer_state state)
>>> call_void_memop(vb, finish, vb->planes[plane].mem_priv);
>>>
>>> spin_lock_irqsave(&q->done_lock, flags);
>>> + if (vb->state == VB2_BUF_STATE_ACTIVE)
>>> + atomic_dec(&q->owned_by_drv_count);
>>> +
>>> if (state == VB2_BUF_STATE_QUEUED ||
>>> state == VB2_BUF_STATE_REQUEUEING) {
>>> vb->state = VB2_BUF_STATE_QUEUED;
>>> @@ -929,7 +925,7 @@ void vb2_buffer_done(struct vb2_buffer *vb, enum vb2_buffer_state state)
>>> list_add_tail(&vb->done_entry, &q->done_list);
>>> vb->state = state;
>>> }
>>> - atomic_dec(&q->owned_by_drv_count);
>>> +
>>> spin_unlock_irqrestore(&q->done_lock, flags);
>>>
>>> trace_vb2_buf_done(q, vb);
>>> @@ -946,6 +942,36 @@ void vb2_buffer_done(struct vb2_buffer *vb, enum vb2_buffer_state state)
>>> wake_up(&q->done_wq);
>>> break;
>>> }
>>> +
>>> +}
>>> +
>>> +void vb2_buffer_done(struct vb2_buffer *vb, enum vb2_buffer_state state)
>>> +{
>>> + if (WARN_ON(vb->state != VB2_BUF_STATE_ACTIVE))
>>> + return;
>>> +
>>> + if (WARN_ON(state != VB2_BUF_STATE_DONE &&
>>> + state != VB2_BUF_STATE_ERROR &&
>>> + state != VB2_BUF_STATE_QUEUED &&
>>> + state != VB2_BUF_STATE_REQUEUEING))
>>> + state = VB2_BUF_STATE_ERROR;
>>> +
>>> + vb2_process_buffer_done(vb, state);
>>> +
>>> + /*
>>> + * Check if there is any buffer with error in the next position of the queue,
>>> + * buffers whose in-fence signaled with error are not queued to the driver
>>> + * and kept on the queue until the buffer before them is done, so to not
>>> + * delivery buffers back to userspace in the wrong order. Here we process
>>
>> delivery -> deliver
>>
>>> + * any existing buffers with errors and wake up userspace.
>>> + */
>>> + for (;;) {
>>> + vb = list_next_entry(vb, queued_entry);
>>> + if (!vb || vb->state != VB2_BUF_STATE_ERROR)
>>> + break;
>>> +
>>> + vb2_process_buffer_done(vb, VB2_BUF_STATE_ERROR);
>>> + }
>>> }
>>> EXPORT_SYMBOL_GPL(vb2_buffer_done);
>>>
>>> @@ -1230,6 +1256,9 @@ static void __enqueue_in_driver(struct vb2_buffer *vb)
>>> {
>>> struct vb2_queue *q = vb->vb2_queue;
>>>
>>> + if (vb->in_fence && !dma_fence_is_signaled(vb->in_fence))
>>> + return;
>>> +
>>> vb->state = VB2_BUF_STATE_ACTIVE;
>>> atomic_inc(&q->owned_by_drv_count);
>>>
>>> @@ -1281,6 +1310,24 @@ static int __buf_prepare(struct vb2_buffer *vb, const void *pb)
>>> return 0;
>>> }
>>>
>>> +static int __get_num_ready_buffers(struct vb2_queue *q)
>>> +{
>>> + struct vb2_buffer *vb;
>>> + int ready_count = 0;
>>> + unsigned long flags;
>>> +
>>> + /* count num of buffers ready in front of the queued_list */
>>> + list_for_each_entry(vb, &q->queued_list, queued_entry) {
>>> + spin_lock_irqsave(&vb->fence_cb_lock, flags);
>>> + if (vb->in_fence && !dma_fence_is_signaled(vb->in_fence))
>>> + break;
>>> + ready_count++;
>>> + spin_unlock_irqrestore(&vb->fence_cb_lock, flags);
>>> + }
>>> +
>>> + return ready_count;
>>> +}
>>> +
>>> int vb2_core_prepare_buf(struct vb2_queue *q, unsigned int index, void *pb)
>>> {
>>> struct vb2_buffer *vb;
>>> @@ -1369,9 +1416,43 @@ static int vb2_start_streaming(struct vb2_queue *q)
>>> return ret;
>>> }
>>>
>>> -int vb2_core_qbuf(struct vb2_queue *q, unsigned int index, void *pb)
>>> +static void vb2_qbuf_fence_cb(struct dma_fence *f, struct dma_fence_cb *cb)
>>> +{
>>> + struct vb2_buffer *vb = container_of(cb, struct vb2_buffer, fence_cb);
>>> + struct vb2_queue *q = vb->vb2_queue;
>>> + unsigned long flags;
>>> +
>>> + spin_lock_irqsave(&vb->fence_cb_lock, flags);
>>> + /*
>>> + * If the fence signals with an error we mark the buffer as such
>>> + * and avoid using it by setting it to VB2_BUF_STATE_ERROR and
>>> + * not queueing it to the driver. However we can't notify the error
>>> + * to userspace right now because, at the time this callback run, QBUF
>>> + * returned already.
>>
>> returned -> has returned
>>
>>> + * So we delay that to DQBUF time. See comments in vb2_buffer_done()
>>> + * as well.
>>> + */
>>> + if (vb->in_fence->error)
>>> + vb->state = VB2_BUF_STATE_ERROR;
>>> +
>>> + dma_fence_put(vb->in_fence);
>>> + vb->in_fence = NULL;
>>> +
>>> + if (vb->state == VB2_BUF_STATE_ERROR) {
>>> + spin_unlock_irqrestore(&vb->fence_cb_lock, flags);
>>> + return;
>>> + }
>>> +
>>> + if (q->start_streaming_called)
>>> + __enqueue_in_driver(vb);
>>> + spin_unlock_irqrestore(&vb->fence_cb_lock, flags);
>>> +}
>>> +
>>> +int vb2_core_qbuf(struct vb2_queue *q, unsigned int index, void *pb,
>>> + struct dma_fence *in_fence)
>>> {
>>> struct vb2_buffer *vb;
>>> + unsigned long flags;
>>> int ret;
>>>
>>> vb = q->bufs[index];
>>> @@ -1380,16 +1461,18 @@ int vb2_core_qbuf(struct vb2_queue *q, unsigned int index, void *pb)
>>> case VB2_BUF_STATE_DEQUEUED:
>>> ret = __buf_prepare(vb, pb);
>>> if (ret)
>>> - return ret;
>>> + goto err;
>>> break;
>>> case VB2_BUF_STATE_PREPARED:
>>> break;
>>> case VB2_BUF_STATE_PREPARING:
>>> dprintk(1, "buffer still being prepared\n");
>>> - return -EINVAL;
>>> + ret = -EINVAL;
>>> + goto err;
>>> default:
>>> dprintk(1, "invalid buffer state %d\n", vb->state);
>>> - return -EINVAL;
>>> + ret = -EINVAL;
>>> + goto err;
>>> }
>>>
>>> /*
>>> @@ -1400,6 +1483,7 @@ int vb2_core_qbuf(struct vb2_queue *q, unsigned int index, void *pb)
>>> q->queued_count++;
>>> q->waiting_for_buffers = false;
>>> vb->state = VB2_BUF_STATE_QUEUED;
>>> + vb->in_fence = in_fence;
>>>
>>> if (pb)
>>> call_void_bufop(q, copy_timestamp, vb, pb);
>>> @@ -1407,15 +1491,40 @@ int vb2_core_qbuf(struct vb2_queue *q, unsigned int index, void *pb)
>>> trace_vb2_qbuf(q, vb);
>>>
>>> /*
>>> - * If already streaming, give the buffer to driver for processing.
>>> - * If not, the buffer will be given to driver on next streamon.
>>> + * For explicit synchronization: If the fence didn't signal
>>> + * yet we setup a callback to queue the buffer once the fence
>>> + * signals, and then, return successfully. But if the fence
>>
>> , and then, -> and then (no commas)
>>
>>> + * already signaled we lose the reference we held and queue the
>>> + * buffer to the driver.
>>> */
>>> - if (q->start_streaming_called)
>>> - __enqueue_in_driver(vb);
>>> + spin_lock_irqsave(&vb->fence_cb_lock, flags);
>>> + if (vb->in_fence) {
>>> + ret = dma_fence_add_callback(vb->in_fence, &vb->fence_cb,
>>> + vb2_qbuf_fence_cb);
>>> + /* is the fence signaled? */
>>> + if (ret == -ENOENT) {
>>> + dma_fence_put(vb->in_fence);
>>> + vb->in_fence = NULL;
>>> + } else if (ret) {
>>> + goto unlock;
>>> + }
>>> + }
>>>
>>> - /* Fill buffer information for the userspace */
>>> - if (pb)
>>> - call_void_bufop(q, fill_user_buffer, vb, pb);
>>> + /*
>>> + * If already streaming and there is no fence to wait on
>>> + * give the buffer to driver for processing.
>>> + */
>>> + if (q->start_streaming_called) {
>>> + struct vb2_buffer *b;
>>> +
>>> + list_for_each_entry(b, &q->queued_list, queued_entry) {
>>> + if (b->state != VB2_BUF_STATE_QUEUED)
>>> + continue;
>>> + if (b->in_fence)
>>> + break;
>>> + __enqueue_in_driver(b);
>>> + }
>>> + }
>>>
>>> /*
>>> * If streamon has been called, and we haven't yet called
>>> @@ -1424,14 +1533,36 @@ int vb2_core_qbuf(struct vb2_queue *q, unsigned int index, void *pb)
>>> * then we can finally call start_streaming().
>>> */
>>> if (q->streaming && !q->start_streaming_called &&
>>> - q->queued_count >= q->min_buffers_needed) {
>>> + __get_num_ready_buffers(q) >= q->min_buffers_needed) {
>>> ret = vb2_start_streaming(q);
>>> if (ret)
>>> - return ret;
>>> + goto unlock;
>>> }
>>>
>>> + spin_unlock_irqrestore(&vb->fence_cb_lock, flags);
>>> +
>>> + /* Fill buffer information for the userspace */
>>> + if (pb)
>>> + call_void_bufop(q, fill_user_buffer, vb, pb);
>>> +
>>> dprintk(2, "qbuf of buffer %d succeeded\n", vb->index);
>>> return 0;
>>> +
>>> +unlock:
>>> + spin_unlock_irqrestore(&vb->fence_cb_lock, flags);
>>> +
>>> +err:
>>> + /* Fill buffer information for the userspace */
>>> + if (pb)
>>> + call_void_bufop(q, fill_user_buffer, vb, pb);
>>> +
>>> + if (vb->in_fence) {
>>> + dma_fence_put(vb->in_fence);
>>> + vb->in_fence = NULL;
>>> + }
>>> +
>>> + return ret;
>>> +
>>> }
>>> EXPORT_SYMBOL_GPL(vb2_core_qbuf);
>>>
>>> @@ -1642,6 +1773,8 @@ EXPORT_SYMBOL_GPL(vb2_core_dqbuf);
>>> static void __vb2_queue_cancel(struct vb2_queue *q)
>>> {
>>> unsigned int i;
>>> + struct vb2_buffer *vb;
>>> + unsigned long flags;
>>>
>>> /*
>>> * Tell driver to stop all transactions and release all queued
>>> @@ -1672,6 +1805,16 @@ static void __vb2_queue_cancel(struct vb2_queue *q)
>>> q->queued_count = 0;
>>> q->error = 0;
>>>
>>> + list_for_each_entry(vb, &q->queued_list, queued_entry) {
>>> + spin_lock_irqsave(&vb->fence_cb_lock, flags);
>>> + if (vb->in_fence) {
>>> + dma_fence_remove_callback(vb->in_fence, &vb->fence_cb);
>>> + dma_fence_put(vb->in_fence);
>>> + vb->in_fence = NULL;
>>> + }
>>> + spin_unlock_irqrestore(&vb->fence_cb_lock, flags);
>>> + }
>>> +
>>> /*
>>> * Remove all buffers from videobuf's list...
>>> */
>>> @@ -1742,7 +1885,7 @@ int vb2_core_streamon(struct vb2_queue *q, unsigned int type)
>>> * Tell driver to start streaming provided sufficient buffers
>>> * are available.
>>> */
>>> - if (q->queued_count >= q->min_buffers_needed) {
>>> + if (__get_num_ready_buffers(q) >= q->min_buffers_needed) {
>>> ret = v4l_vb2q_enable_media_source(q);
>>> if (ret)
>>> return ret;
>>> @@ -2264,7 +2407,7 @@ static int __vb2_init_fileio(struct vb2_queue *q, int read)
>>> * Queue all buffers.
>>> */
>>> for (i = 0; i < q->num_buffers; i++) {
>>> - ret = vb2_core_qbuf(q, i, NULL);
>>> + ret = vb2_core_qbuf(q, i, NULL, NULL);
>>> if (ret)
>>> goto err_reqbufs;
>>> fileio->bufs[i].queued = 1;
>>> @@ -2443,7 +2586,7 @@ static size_t __vb2_perform_fileio(struct vb2_queue *q, char __user *data, size_
>>>
>>> if (copy_timestamp)
>>> b->timestamp = ktime_get_ns();
>>> - ret = vb2_core_qbuf(q, index, NULL);
>>> + ret = vb2_core_qbuf(q, index, NULL, NULL);
>>> dprintk(5, "vb2_dbuf result: %d\n", ret);
>>> if (ret)
>>> return ret;
>>> @@ -2546,7 +2689,7 @@ static int vb2_thread(void *data)
>>> if (copy_timestamp)
>>> vb->timestamp = ktime_get_ns();
>>> if (!threadio->stop)
>>> - ret = vb2_core_qbuf(q, vb->index, NULL);
>>> + ret = vb2_core_qbuf(q, vb->index, NULL, NULL);
>>> call_void_qop(q, wait_prepare, q);
>>> if (ret || threadio->stop)
>>> break;
>>> diff --git a/drivers/media/common/videobuf2/videobuf2-v4l2.c b/drivers/media/common/videobuf2/videobuf2-v4l2.c
>>> index ad1e032c3bf5..1df5dd01c0cd 100644
>>> --- a/drivers/media/common/videobuf2/videobuf2-v4l2.c
>>> +++ b/drivers/media/common/videobuf2/videobuf2-v4l2.c
>>> @@ -23,6 +23,7 @@
>>> #include <linux/sched.h>
>>> #include <linux/freezer.h>
>>> #include <linux/kthread.h>
>>> +#include <linux/sync_file.h>
>>>
>>> #include <media/v4l2-dev.h>
>>> #include <media/v4l2-fh.h>
>>> @@ -178,6 +179,17 @@ static int vb2_queue_or_prepare_buf(struct vb2_queue *q, struct v4l2_buffer *b,
>>> return -EINVAL;
>>> }
>>>
>>> + if ((b->fence_fd != 0 && b->fence_fd != -1) &&
>>> + !(b->flags & V4L2_BUF_FLAG_IN_FENCE)) {
>>> + dprintk(1, "%s: fence_fd set without IN_FENCE flag\n", opname);
>>> + return -EINVAL;
>>> + }
>>> +
>>> + if (b->fence_fd < 0 && (b->flags & V4L2_BUF_FLAG_IN_FENCE)) {
>>> + dprintk(1, "%s: IN_FENCE flag set but no fence_fd\n", opname);
>>> + return -EINVAL;
>>> + }
>>> +
>>> return __verify_planes_array(q->bufs[b->index], b);
>>> }
>>>
>>> @@ -203,9 +215,14 @@ static void __fill_v4l2_buffer(struct vb2_buffer *vb, void *pb)
>>> b->timestamp = ns_to_timeval(vb->timestamp);
>>> b->timecode = vbuf->timecode;
>>> b->sequence = vbuf->sequence;
>>> - b->fence_fd = 0;
>>> b->reserved = 0;
>>>
>>> + b->fence_fd = 0;
>>> + if (vb->in_fence)
>>> + b->flags |= V4L2_BUF_FLAG_IN_FENCE;
>>> + else
>>> + b->flags &= ~V4L2_BUF_FLAG_IN_FENCE;
>>> +
>>
>> I was wondering: if the in-fence was already triggered this flag is not set because
>> vb->in_fence will be NULL. Would it be useful to add a vb2 flag that is set when the
>> in-fence triggered and report that here somehow? It might be useful for debugging.
>>
>> Just wondering.
>>
>
> I'm also wondering :-) ... does it make sense to use a flag for this?
>
> #define V4L2_BUF_FLAG_IN_FENCE 0x00200000
> #define V4L2_BUF_FLAG_OUT_FENCE 0x00400000
> #define V4L2_BUF_FLAG_IN_FENCE_SIGNALED 0x00800000
>
> Aren't we using a very limited resource? Is it worry?

No, not a worry. It is very, very likely that there will be a new set of streaming ioctls
anyway (I have patches pending and will continue with that once fences and request API
are in). At that point there will be a new v4l2_buffer struct with more room. And besides,
there are still 8 bits left after this.

Regards,

Hans

>
> Thanks,
>