Re: [PATCH v2 0/6] media: cedrus: h264: Support multi-slice frames

From: Hans Verkuil
Date: Wed Oct 09 2019 - 06:18:59 EST


On 10/7/19 9:01 PM, Jernej Åkrabec wrote:
> Dne ponedeljek, 07. oktober 2019 ob 12:44:24 CEST je Hans Verkuil napisal(a):
>> Hi Jernej,
>>
>> On 9/29/19 10:00 PM, Jernej Skrabec wrote:
>>> This series adds support for decoding multi-slice H264 frames along with
>>> support for V4L2_DEC_CMD_FLUSH and V4L2_BUF_FLAG_M2M_HOLD_CAPTURE_BUF.
>>>
>>> Code was tested by modified ffmpeg, which can be found here:
>>> https://github.com/jernejsk/FFmpeg, branch mainline-test
>>> It has to be configured with at least following options:
>>> --enable-v4l2-request --enable-libudev --enable-libdrm
>>>
>>> Samples used for testing:
>>> http://jernej.libreelec.tv/videos/h264/BA1_FT_C.mp4
>>> http://jernej.libreelec.tv/videos/h264/h264.mp4
>>>
>>> Command line used for testing:
>>> ffmpeg -hwaccel drm -hwaccel_device /dev/dri/card0 -i h264.mp4 -pix_fmt
>>> bgra -f fbdev /dev/fb0
>>>
>>> Please note that V4L2_DEC_CMD_FLUSH was not tested because I'm
>>> not sure how. ffmpeg follows exactly which slice is last in frame
>>> and sets hold flag accordingly. Improper usage of hold flag would
>>> corrupt ffmpeg assumptions and it would probably crash. Any ideas
>>> how to test this are welcome!
>>>
>>> Thanks to Jonas for adjusting ffmpeg.
>>>
>>> Please let me know what you think.
>>>
>>> Best regards,
>>> Jernej
>>>
>>> Changes from v1:
>>> - added Rb tags
>>> - updated V4L2_DEC_CMD_FLUSH documentation
>>> - updated first slice detection in Cedrus
>>> - hold capture buffer flag is set according to source format
>>> - added v4l m2m stateless_(try_)decoder_cmd ioctl helpers
>>>
>>> Hans Verkuil (2):
>>> vb2: add V4L2_BUF_FLAG_M2M_HOLD_CAPTURE_BUF
>>> videodev2.h: add V4L2_DEC_CMD_FLUSH
>>>
>>> Jernej Skrabec (4):
>>> media: v4l2-mem2mem: add stateless_(try_)decoder_cmd ioctl helpers
>>> media: cedrus: Detect first slice of a frame
>>> media: cedrus: h264: Support multiple slices per frame
>>> media: cedrus: Add support for holding capture buffer
>>>
>>> Documentation/media/uapi/v4l/buffer.rst | 13 ++++++
>>> .../media/uapi/v4l/vidioc-decoder-cmd.rst | 10 +++-
>>> .../media/uapi/v4l/vidioc-reqbufs.rst | 6 +++
>>> .../media/videodev2.h.rst.exceptions | 1 +
>>> .../media/common/videobuf2/videobuf2-v4l2.c | 8 +++-
>>> drivers/media/v4l2-core/v4l2-mem2mem.c | 35 ++++++++++++++
>>> drivers/staging/media/sunxi/cedrus/cedrus.h | 1 +
>>> .../staging/media/sunxi/cedrus/cedrus_dec.c | 11 +++++
>>> .../staging/media/sunxi/cedrus/cedrus_h264.c | 11 ++++-
>>> .../staging/media/sunxi/cedrus/cedrus_hw.c | 8 ++--
>>> .../staging/media/sunxi/cedrus/cedrus_video.c | 14 ++++++
>>> include/media/v4l2-mem2mem.h | 46 +++++++++++++++++++
>>> include/media/videobuf2-core.h | 3 ++
>>> include/media/videobuf2-v4l2.h | 5 ++
>>> include/uapi/linux/videodev2.h | 14 ++++--
>>> 15 files changed, 175 insertions(+), 11 deletions(-)
>>
>> I didn't want to make a v3 of this series, instead I hacked this patch that
>> will hopefully do all the locking right.
>>
>> Basically I moved all the 'held' related code into v4l2-mem2mem under
>> job_spinlock. This simplifies the driver code as well.
>>
>> But this is a hack that sits on top of this series. If your ffmpeg tests are
>> now successful, then I'll turn this into a proper series with correct
>> documentation (a lot of the comments are now wrong with this patch, so just
>> ignore that).
>
> Thanks for looking into this! With small fix mentioned below, it works! Note
> that both scenarios I tested (flushing during decoding and flushing after
> decoding is finished) are focused on capture queue. In order to trigger output
> queue flush, ffmpeg would need to queue multiple jobs and call flush before they
> are all processed. This is not something I can do at this time. Maybe Jonas
> can help with modifying ffmpeg appropriately. However, code for case seems
> correct to me.
>
>>
>> Regards,
>>
>> Hans
>>
>> diff --git a/drivers/media/v4l2-core/v4l2-mem2mem.c
>> b/drivers/media/v4l2-core/v4l2-mem2mem.c index 2677a07e4c9b..f81a8f2465ab
>> 100644
>> --- a/drivers/media/v4l2-core/v4l2-mem2mem.c
>> +++ b/drivers/media/v4l2-core/v4l2-mem2mem.c
>> @@ -412,25 +412,24 @@ static void v4l2_m2m_cancel_job(struct v4l2_m2m_ctx
>> *m2m_ctx) }
>> }
>>
>> -void v4l2_m2m_job_finish(struct v4l2_m2m_dev *m2m_dev,
>> - struct v4l2_m2m_ctx *m2m_ctx)
>> +static bool _v4l2_m2m_job_finish(struct v4l2_m2m_dev *m2m_dev,
>> + struct v4l2_m2m_ctx *m2m_ctx)
>> {
>> - unsigned long flags;
>> -
>> - spin_lock_irqsave(&m2m_dev->job_spinlock, flags);
>> if (!m2m_dev->curr_ctx || m2m_dev->curr_ctx != m2m_ctx) {
>> - spin_unlock_irqrestore(&m2m_dev->job_spinlock, flags);
>> dprintk("Called by an instance not currently
> running\n");
>> - return;
>> + return false;
>> }
>>
>> list_del(&m2m_dev->curr_ctx->queue);
>> m2m_dev->curr_ctx->job_flags &= ~(TRANS_QUEUED | TRANS_RUNNING);
>> wake_up(&m2m_dev->curr_ctx->finished);
>> m2m_dev->curr_ctx = NULL;
>> + return true;
>> +}
>>
>> - spin_unlock_irqrestore(&m2m_dev->job_spinlock, flags);
>> -
>> +static void v4l2_m2m_job_next(struct v4l2_m2m_dev *m2m_dev,
>> + struct v4l2_m2m_ctx *m2m_ctx)
>> +{
>> /* This instance might have more buffers ready, but since we do not
>> * allow more than one job on the job_queue per instance, each has
>> * to be scheduled separately after the previous one finishes. */
>> @@ -441,8 +440,113 @@ void v4l2_m2m_job_finish(struct v4l2_m2m_dev *m2m_dev,
>> */
>> schedule_work(&m2m_dev->job_work);
>> }
>> +
>> +void v4l2_m2m_job_finish(struct v4l2_m2m_dev *m2m_dev,
>> + struct v4l2_m2m_ctx *m2m_ctx)
>> +{
>> + unsigned long flags;
>> +
>> + spin_lock_irqsave(&m2m_dev->job_spinlock, flags);
>> + if (!_v4l2_m2m_job_finish(m2m_dev, m2m_ctx)) {
>> + spin_unlock_irqrestore(&m2m_dev->job_spinlock, flags);
>> + return;
>> + }
>> + spin_unlock_irqrestore(&m2m_dev->job_spinlock, flags);
>> +
>> + v4l2_m2m_job_next(m2m_dev, m2m_ctx);
>> +}
>> EXPORT_SYMBOL(v4l2_m2m_job_finish);
>>
>> +void v4l2_m2m_job_finish_held(struct v4l2_m2m_dev *m2m_dev,
>> + struct v4l2_m2m_ctx *m2m_ctx,
>> + enum vb2_buffer_state state)
>> +{
>> + struct vb2_v4l2_buffer *src_buf, *dst_buf;
>> + unsigned long flags;
>> +
>> + spin_lock_irqsave(&m2m_dev->job_spinlock, flags);
>> + src_buf = v4l2_m2m_src_buf_remove(m2m_ctx);
>> + dst_buf = v4l2_m2m_next_dst_buf(m2m_ctx);
>> +
>> + if (!src_buf || !dst_buf) {
>> + pr_err("Missing source and/or destination buffers\n");
>> + spin_unlock_irqrestore(&m2m_dev->job_spinlock, flags);
>> + return;
>> + }
>> + v4l2_m2m_buf_done(src_buf, state);
>> + if (!dst_buf->is_held) {
>> + v4l2_m2m_dst_buf_remove(m2m_ctx);
>> + v4l2_m2m_buf_done(dst_buf, state);
>> + }
>> + if (!_v4l2_m2m_job_finish(m2m_dev, m2m_ctx)) {
>> + spin_unlock_irqrestore(&m2m_dev->job_spinlock, flags);
>> + return;
>> + }
>> + spin_unlock_irqrestore(&m2m_dev->job_spinlock, flags);
>> +
>> + v4l2_m2m_job_next(m2m_dev, m2m_ctx);
>> +}
>> +EXPORT_SYMBOL(v4l2_m2m_job_finish_held);
>> +
>> +/**
>> + * v4l2_m2m_release_capture_buf() - check if the capture buffer should be
>> + * released
>> + *
>> + * @out_vb: the output buffer
>> + * @cap_vb: the capture buffer
>> + *
>> + * This helper function returns true if the current capture buffer should
>> + * be released to vb2. This is the case if the output buffer specified that
>> + * the capture buffer should be held (i.e. not returned to vb2) AND if the
>> + * timestamp of the capture buffer differs from the output buffer
>> timestamp. + *
>> + * This helper is to be called at the start of the device_run callback:
>> + *
>> + * .. code-block:: c
>> + *
>> + * if (v4l2_m2m_release_capture_buf(out_vb, cap_vb)) {
>> + * v4l2_m2m_dst_buf_remove(m2m_ctx);
>> + * v4l2_m2m_buf_done(cap_vb, VB2_BUF_STATE_DONE);
>> + * cap_vb = v4l2_m2m_next_dst_buf(m2m_ctx);
>> + * }
>> + * cap_vb->is_held = out_vb->flags &
> V4L2_BUF_FLAG_M2M_HOLD_CAPTURE_BUF;
>> + *
>> + * ...
>> + *
>> + * v4l2_m2m_buf_done(out_vb, VB2_BUF_STATE_DONE);
>> + * if (!cap_vb->is_held) {
>> + * v4l2_m2m_dst_buf_remove(m2m_ctx);
>> + * v4l2_m2m_buf_done(cap_vb, VB2_BUF_STATE_DONE);
>> + * }
>> + *
>> + * This allows for multiple output buffers to be used to fill in a single
>> + * capture buffer. This is typically used by stateless decoders where
>> + * multiple e.g. H.264 slices contribute to a single decoded frame.
>> + */
>> +struct vb2_v4l2_buffer *v4l2_m2m_release_capture_buf(struct v4l2_m2m_ctx
>> *m2m_ctx) +{
>> + struct v4l2_m2m_dev *m2m_dev = m2m_ctx->m2m_dev;
>> + struct vb2_v4l2_buffer *src, *dst;
>> + unsigned long flags;
>> +
>> + spin_lock_irqsave(&m2m_dev->job_spinlock, flags);
>> + src = v4l2_m2m_next_src_buf(m2m_ctx);
>> + dst = v4l2_m2m_next_dst_buf(m2m_ctx);
>> +
>> + if (dst->is_held && dst->vb2_buf.copied_timestamp &&
>> + src->vb2_buf.timestamp != dst->vb2_buf.timestamp) {
>> + dst->is_held = false;
>> + v4l2_m2m_dst_buf_remove(m2m_ctx);
>> + v4l2_m2m_buf_done(dst, VB2_BUF_STATE_DONE);
>> + dst = v4l2_m2m_next_dst_buf(m2m_ctx);
>> + }
>> + dst->is_held = src->flags & V4L2_BUF_FLAG_M2M_HOLD_CAPTURE_BUF;
>> + src->flags &= ~V4L2_BUF_FLAG_M2M_HOLD_CAPTURE_BUF;
>> + spin_unlock_irqrestore(&m2m_dev->job_spinlock, flags);
>> + return dst;
>> +}
>> +EXPORT_SYMBOL(v4l2_m2m_release_capture_buf);
>> +
>> int v4l2_m2m_reqbufs(struct file *file, struct v4l2_m2m_ctx *m2m_ctx,
>> struct v4l2_requestbuffers *reqbufs)
>> {
>> @@ -1171,19 +1275,28 @@ int v4l2_m2m_ioctl_stateless_decoder_cmd(struct file
>> *file, void *priv, {
>> struct v4l2_fh *fh = file->private_data;
>> struct vb2_v4l2_buffer *out_vb, *cap_vb;
>> + struct v4l2_m2m_dev *m2m_dev = fh->m2m_ctx->m2m_dev;
>> + unsigned long flags;
>> int ret;
>>
>> ret = v4l2_m2m_ioctl_stateless_try_decoder_cmd(file, priv, dc);
>> if (ret < 0)
>> return ret;
>>
>> + spin_lock_irqsave(&m2m_dev->job_spinlock, flags);
>> out_vb = v4l2_m2m_last_src_buf(fh->m2m_ctx);
>> cap_vb = v4l2_m2m_last_dst_buf(fh->m2m_ctx);
>>
>> - if (out_vb)
>> + if (out_vb && (out_vb->flags & V4L2_BUF_FLAG_M2M_HOLD_CAPTURE_BUF))
> {
>> out_vb->flags &= ~V4L2_BUF_FLAG_M2M_HOLD_CAPTURE_BUF;
>> - else if (cap_vb && cap_vb->is_held)
>> - v4l2_m2m_buf_done(cap_vb, VB2_BUF_STATE_DONE);
>> + } else if (cap_vb && cap_vb->is_held) {
>> + cap_vb->is_held = false;
>> + if (m2m_dev->curr_ctx) {
>
> Above condition should be negated.

Close. It should check that this buffer isn't currently being processed.
So:

if (m2m_dev->curr_ctx != fh->m2m_ctx) {

Can you test with this change? If this works, then I'll post a proper
series for this.

Thanks!

Hans

>
> Best regards,
> Jernej
>
>> + v4l2_m2m_dst_buf_remove(fh->m2m_ctx);
>> + v4l2_m2m_buf_done(cap_vb,
> VB2_BUF_STATE_DONE);
>> + }
>> + }
>> + spin_unlock_irqrestore(&m2m_dev->job_spinlock, flags);
>>
>> return 0;
>> }
>> diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_dec.c
>> b/drivers/staging/media/sunxi/cedrus/cedrus_dec.c index
>> 67f7d4326fc1..4e30f263b427 100644
>> --- a/drivers/staging/media/sunxi/cedrus/cedrus_dec.c
>> +++ b/drivers/staging/media/sunxi/cedrus/cedrus_dec.c
>> @@ -30,14 +30,7 @@ void cedrus_device_run(void *priv)
>> struct media_request *src_req;
>>
>> run.src = v4l2_m2m_next_src_buf(ctx->fh.m2m_ctx);
>> - run.dst = v4l2_m2m_next_dst_buf(ctx->fh.m2m_ctx);
>> -
>> - if (v4l2_m2m_release_capture_buf(run.src, run.dst)) {
>> - v4l2_m2m_dst_buf_remove(ctx->fh.m2m_ctx);
>> - v4l2_m2m_buf_done(run.dst, VB2_BUF_STATE_DONE);
>> - run.dst = v4l2_m2m_next_dst_buf(ctx->fh.m2m_ctx);
>> - }
>> - run.dst->is_held = run.src->flags &
> V4L2_BUF_FLAG_M2M_HOLD_CAPTURE_BUF;
>> + run.dst = v4l2_m2m_release_capture_buf(ctx->fh.m2m_ctx);
>>
>> run.first_slice = !run.dst->vb2_buf.copied_timestamp ||
>> run.src->vb2_buf.timestamp != run.dst-
>> vb2_buf.timestamp;
>> diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_hw.c
>> b/drivers/staging/media/sunxi/cedrus/cedrus_hw.c index
>> 99fedec80224..242cad82cc8c 100644
>> --- a/drivers/staging/media/sunxi/cedrus/cedrus_hw.c
>> +++ b/drivers/staging/media/sunxi/cedrus/cedrus_hw.c
>> @@ -103,7 +103,6 @@ static irqreturn_t cedrus_irq(int irq, void *data)
>> {
>> struct cedrus_dev *dev = data;
>> struct cedrus_ctx *ctx;
>> - struct vb2_v4l2_buffer *src_buf, *dst_buf;
>> enum vb2_buffer_state state;
>> enum cedrus_irq_status status;
>>
>> @@ -121,26 +120,12 @@ static irqreturn_t cedrus_irq(int irq, void *data)
>> dev->dec_ops[ctx->current_codec]->irq_disable(ctx);
>> dev->dec_ops[ctx->current_codec]->irq_clear(ctx);
>>
>> - src_buf = v4l2_m2m_src_buf_remove(ctx->fh.m2m_ctx);
>> - dst_buf = v4l2_m2m_next_dst_buf(ctx->fh.m2m_ctx);
>> -
>> - if (!src_buf || !dst_buf) {
>> - v4l2_err(&dev->v4l2_dev,
>> - "Missing source and/or destination
> buffers\n");
>> - return IRQ_HANDLED;
>> - }
>> -
>> if (status == CEDRUS_IRQ_ERROR)
>> state = VB2_BUF_STATE_ERROR;
>> else
>> state = VB2_BUF_STATE_DONE;
>>
>> - v4l2_m2m_buf_done(src_buf, state);
>> - if (!dst_buf->is_held) {
>> - v4l2_m2m_dst_buf_remove(ctx->fh.m2m_ctx);
>> - v4l2_m2m_buf_done(dst_buf, state);
>> - }
>> - v4l2_m2m_job_finish(ctx->dev->m2m_dev, ctx->fh.m2m_ctx);
>> + v4l2_m2m_job_finish_held(ctx->dev->m2m_dev, ctx->fh.m2m_ctx, state);
>>
>> return IRQ_HANDLED;
>> }
>> diff --git a/include/media/v4l2-mem2mem.h b/include/media/v4l2-mem2mem.h
>> index 8ae2f56c7fa3..48ca7d3eaa3d 100644
>> --- a/include/media/v4l2-mem2mem.h
>> +++ b/include/media/v4l2-mem2mem.h
>> @@ -173,6 +173,10 @@ void v4l2_m2m_try_schedule(struct v4l2_m2m_ctx
>> *m2m_ctx); void v4l2_m2m_job_finish(struct v4l2_m2m_dev *m2m_dev,
>> struct v4l2_m2m_ctx *m2m_ctx);
>>
>> +void v4l2_m2m_job_finish_held(struct v4l2_m2m_dev *m2m_dev,
>> + struct v4l2_m2m_ctx *m2m_ctx,
>> + enum vb2_buffer_state state);
>> +
>> static inline void
>> v4l2_m2m_buf_done(struct vb2_v4l2_buffer *buf, enum vb2_buffer_state state)
>> {
>> @@ -644,47 +648,7 @@ void v4l2_m2m_buf_copy_metadata(const struct
>> vb2_v4l2_buffer *out_vb, struct vb2_v4l2_buffer *cap_vb,
>> bool copy_frame_flags);
>>
>> -/**
>> - * v4l2_m2m_release_capture_buf() - check if the capture buffer should be
>> - * released
>> - *
>> - * @out_vb: the output buffer
>> - * @cap_vb: the capture buffer
>> - *
>> - * This helper function returns true if the current capture buffer should
>> - * be released to vb2. This is the case if the output buffer specified that
>> - * the capture buffer should be held (i.e. not returned to vb2) AND if the
>> - * timestamp of the capture buffer differs from the output buffer
>> timestamp. - *
>> - * This helper is to be called at the start of the device_run callback:
>> - *
>> - * .. code-block:: c
>> - *
>> - * if (v4l2_m2m_release_capture_buf(out_vb, cap_vb)) {
>> - * v4l2_m2m_dst_buf_remove(m2m_ctx);
>> - * v4l2_m2m_buf_done(cap_vb, VB2_BUF_STATE_DONE);
>> - * cap_vb = v4l2_m2m_next_dst_buf(m2m_ctx);
>> - * }
>> - * cap_vb->is_held = out_vb->flags &
> V4L2_BUF_FLAG_M2M_HOLD_CAPTURE_BUF;
>> - *
>> - * ...
>> - *
>> - * v4l2_m2m_buf_done(out_vb, VB2_BUF_STATE_DONE);
>> - * if (!cap_vb->is_held) {
>> - * v4l2_m2m_dst_buf_remove(m2m_ctx);
>> - * v4l2_m2m_buf_done(cap_vb, VB2_BUF_STATE_DONE);
>> - * }
>> - *
>> - * This allows for multiple output buffers to be used to fill in a single
>> - * capture buffer. This is typically used by stateless decoders where
>> - * multiple e.g. H.264 slices contribute to a single decoded frame.
>> - */
>> -static inline bool v4l2_m2m_release_capture_buf(const struct
>> vb2_v4l2_buffer *out_vb, -
> const struct vb2_v4l2_buffer *cap_vb)
>> -{
>> - return cap_vb->is_held && cap_vb->vb2_buf.copied_timestamp &&
>> - out_vb->vb2_buf.timestamp != cap_vb->vb2_buf.timestamp;
>> -}
>> +struct vb2_v4l2_buffer *v4l2_m2m_release_capture_buf(struct v4l2_m2m_ctx
>> *m2m_ctx);
>>
>> /* v4l2 request helper */
>
>
>
>