Re: [PATCH v2 0/6] media: cedrus: h264: Support multi-slice frames

From: Jernej Åkrabec
Date: Wed Oct 09 2019 - 10:44:41 EST


Dne sreda, 09. oktober 2019 ob 12:18:45 CEST je Hans Verkuil napisal(a):
> On 10/7/19 9:01 PM, Jernej Åkrabec wrote:
> > Dne ponedeljek, 07. oktober 2019 ob 12:44:24 CEST je Hans Verkuil
napisal(a):
> >> Hi Jernej,
> >>
> >> On 9/29/19 10:00 PM, Jernej Skrabec wrote:
> >>> This series adds support for decoding multi-slice H264 frames along with
> >>> support for V4L2_DEC_CMD_FLUSH and V4L2_BUF_FLAG_M2M_HOLD_CAPTURE_BUF.
> >>>
> >>> Code was tested by modified ffmpeg, which can be found here:
> >>> https://github.com/jernejsk/FFmpeg, branch mainline-test
> >>> It has to be configured with at least following options:
> >>> --enable-v4l2-request --enable-libudev --enable-libdrm
> >>>
> >>> Samples used for testing:
> >>> http://jernej.libreelec.tv/videos/h264/BA1_FT_C.mp4
> >>> http://jernej.libreelec.tv/videos/h264/h264.mp4
> >>>
> >>> Command line used for testing:
> >>> ffmpeg -hwaccel drm -hwaccel_device /dev/dri/card0 -i h264.mp4 -pix_fmt
> >>> bgra -f fbdev /dev/fb0
> >>>
> >>> Please note that V4L2_DEC_CMD_FLUSH was not tested because I'm
> >>> not sure how. ffmpeg follows exactly which slice is last in frame
> >>> and sets hold flag accordingly. Improper usage of hold flag would
> >>> corrupt ffmpeg assumptions and it would probably crash. Any ideas
> >>> how to test this are welcome!
> >>>
> >>> Thanks to Jonas for adjusting ffmpeg.
> >>>
> >>> Please let me know what you think.
> >>>
> >>> Best regards,
> >>> Jernej
> >>>
> >>> Changes from v1:
> >>> - added Rb tags
> >>> - updated V4L2_DEC_CMD_FLUSH documentation
> >>> - updated first slice detection in Cedrus
> >>> - hold capture buffer flag is set according to source format
> >>> - added v4l m2m stateless_(try_)decoder_cmd ioctl helpers
> >>>
> >>> Hans Verkuil (2):
> >>> vb2: add V4L2_BUF_FLAG_M2M_HOLD_CAPTURE_BUF
> >>> videodev2.h: add V4L2_DEC_CMD_FLUSH
> >>>
> >>> Jernej Skrabec (4):
> >>> media: v4l2-mem2mem: add stateless_(try_)decoder_cmd ioctl helpers
> >>> media: cedrus: Detect first slice of a frame
> >>> media: cedrus: h264: Support multiple slices per frame
> >>> media: cedrus: Add support for holding capture buffer
> >>>
> >>> Documentation/media/uapi/v4l/buffer.rst | 13 ++++++
> >>> .../media/uapi/v4l/vidioc-decoder-cmd.rst | 10 +++-
> >>> .../media/uapi/v4l/vidioc-reqbufs.rst | 6 +++
> >>> .../media/videodev2.h.rst.exceptions | 1 +
> >>> .../media/common/videobuf2/videobuf2-v4l2.c | 8 +++-
> >>> drivers/media/v4l2-core/v4l2-mem2mem.c | 35 ++++++++++++++
> >>> drivers/staging/media/sunxi/cedrus/cedrus.h | 1 +
> >>> .../staging/media/sunxi/cedrus/cedrus_dec.c | 11 +++++
> >>> .../staging/media/sunxi/cedrus/cedrus_h264.c | 11 ++++-
> >>> .../staging/media/sunxi/cedrus/cedrus_hw.c | 8 ++--
> >>> .../staging/media/sunxi/cedrus/cedrus_video.c | 14 ++++++
> >>> include/media/v4l2-mem2mem.h | 46 +++++++++++++++++++
> >>> include/media/videobuf2-core.h | 3 ++
> >>> include/media/videobuf2-v4l2.h | 5 ++
> >>> include/uapi/linux/videodev2.h | 14 ++++--
> >>> 15 files changed, 175 insertions(+), 11 deletions(-)
> >>
> >> I didn't want to make a v3 of this series, instead I hacked this patch
> >> that
> >> will hopefully do all the locking right.
> >>
> >> Basically I moved all the 'held' related code into v4l2-mem2mem under
> >> job_spinlock. This simplifies the driver code as well.
> >>
> >> But this is a hack that sits on top of this series. If your ffmpeg tests
> >> are now successful, then I'll turn this into a proper series with
> >> correct documentation (a lot of the comments are now wrong with this
> >> patch, so just ignore that).
> >
> > Thanks for looking into this! With small fix mentioned below, it works!
> > Note that both scenarios I tested (flushing during decoding and flushing
> > after decoding is finished) are focused on capture queue. In order to
> > trigger output queue flush, ffmpeg would need to queue multiple jobs and
> > call flush before they are all processed. This is not something I can do
> > at this time. Maybe Jonas can help with modifying ffmpeg appropriately.
> > However, code for case seems correct to me.
> >
> >> Regards,
> >>
> >> Hans
> >>
> >> diff --git a/drivers/media/v4l2-core/v4l2-mem2mem.c
> >> b/drivers/media/v4l2-core/v4l2-mem2mem.c index 2677a07e4c9b..f81a8f2465ab
> >> 100644
> >> --- a/drivers/media/v4l2-core/v4l2-mem2mem.c
> >> +++ b/drivers/media/v4l2-core/v4l2-mem2mem.c
> >> @@ -412,25 +412,24 @@ static void v4l2_m2m_cancel_job(struct v4l2_m2m_ctx
> >> *m2m_ctx) }
> >>
> >> }
> >>
> >> -void v4l2_m2m_job_finish(struct v4l2_m2m_dev *m2m_dev,
> >> - struct v4l2_m2m_ctx *m2m_ctx)
> >> +static bool _v4l2_m2m_job_finish(struct v4l2_m2m_dev *m2m_dev,
> >> + struct v4l2_m2m_ctx *m2m_ctx)
> >>
> >> {
> >>
> >> - unsigned long flags;
> >> -
> >> - spin_lock_irqsave(&m2m_dev->job_spinlock, flags);
> >>
> >> if (!m2m_dev->curr_ctx || m2m_dev->curr_ctx != m2m_ctx) {
> >>
> >> - spin_unlock_irqrestore(&m2m_dev->job_spinlock, flags);
> >>
> >> dprintk("Called by an instance not currently
> >
> > running\n");
> >
> >> - return;
> >> + return false;
> >>
> >> }
> >>
> >> list_del(&m2m_dev->curr_ctx->queue);
> >> m2m_dev->curr_ctx->job_flags &= ~(TRANS_QUEUED | TRANS_RUNNING);
> >> wake_up(&m2m_dev->curr_ctx->finished);
> >> m2m_dev->curr_ctx = NULL;
> >>
> >> + return true;
> >> +}
> >>
> >> - spin_unlock_irqrestore(&m2m_dev->job_spinlock, flags);
> >> -
> >> +static void v4l2_m2m_job_next(struct v4l2_m2m_dev *m2m_dev,
> >> + struct v4l2_m2m_ctx *m2m_ctx)
> >> +{
> >>
> >> /* This instance might have more buffers ready, but since we do not
> >>
> >> * allow more than one job on the job_queue per instance, each has
> >> * to be scheduled separately after the previous one finishes. */
> >>
> >> @@ -441,8 +440,113 @@ void v4l2_m2m_job_finish(struct v4l2_m2m_dev
> >> *m2m_dev, */
> >>
> >> schedule_work(&m2m_dev->job_work);
> >>
> >> }
> >>
> >> +
> >> +void v4l2_m2m_job_finish(struct v4l2_m2m_dev *m2m_dev,
> >> + struct v4l2_m2m_ctx *m2m_ctx)
> >> +{
> >> + unsigned long flags;
> >> +
> >> + spin_lock_irqsave(&m2m_dev->job_spinlock, flags);
> >> + if (!_v4l2_m2m_job_finish(m2m_dev, m2m_ctx)) {
> >> + spin_unlock_irqrestore(&m2m_dev->job_spinlock, flags);
> >> + return;
> >> + }
> >> + spin_unlock_irqrestore(&m2m_dev->job_spinlock, flags);
> >> +
> >> + v4l2_m2m_job_next(m2m_dev, m2m_ctx);
> >> +}
> >>
> >> EXPORT_SYMBOL(v4l2_m2m_job_finish);
> >>
> >> +void v4l2_m2m_job_finish_held(struct v4l2_m2m_dev *m2m_dev,
> >> + struct v4l2_m2m_ctx *m2m_ctx,
> >> + enum vb2_buffer_state state)
> >> +{
> >> + struct vb2_v4l2_buffer *src_buf, *dst_buf;
> >> + unsigned long flags;
> >> +
> >> + spin_lock_irqsave(&m2m_dev->job_spinlock, flags);
> >> + src_buf = v4l2_m2m_src_buf_remove(m2m_ctx);
> >> + dst_buf = v4l2_m2m_next_dst_buf(m2m_ctx);
> >> +
> >> + if (!src_buf || !dst_buf) {
> >> + pr_err("Missing source and/or destination buffers\n");
> >> + spin_unlock_irqrestore(&m2m_dev->job_spinlock, flags);
> >> + return;
> >> + }
> >> + v4l2_m2m_buf_done(src_buf, state);
> >> + if (!dst_buf->is_held) {
> >> + v4l2_m2m_dst_buf_remove(m2m_ctx);
> >> + v4l2_m2m_buf_done(dst_buf, state);
> >> + }
> >> + if (!_v4l2_m2m_job_finish(m2m_dev, m2m_ctx)) {
> >> + spin_unlock_irqrestore(&m2m_dev->job_spinlock, flags);
> >> + return;
> >> + }
> >> + spin_unlock_irqrestore(&m2m_dev->job_spinlock, flags);
> >> +
> >> + v4l2_m2m_job_next(m2m_dev, m2m_ctx);
> >> +}
> >> +EXPORT_SYMBOL(v4l2_m2m_job_finish_held);
> >> +
> >> +/**
> >> + * v4l2_m2m_release_capture_buf() - check if the capture buffer should
> >> be
> >> + * released
> >> + *
> >> + * @out_vb: the output buffer
> >> + * @cap_vb: the capture buffer
> >> + *
> >> + * This helper function returns true if the current capture buffer
> >> should
> >> + * be released to vb2. This is the case if the output buffer specified
> >> that + * the capture buffer should be held (i.e. not returned to vb2)
> >> AND if the + * timestamp of the capture buffer differs from the output
> >> buffer timestamp. + *
> >> + * This helper is to be called at the start of the device_run callback:
> >> + *
> >> + * .. code-block:: c
> >> + *
> >> + * if (v4l2_m2m_release_capture_buf(out_vb, cap_vb)) {
> >> + * v4l2_m2m_dst_buf_remove(m2m_ctx);
> >> + * v4l2_m2m_buf_done(cap_vb, VB2_BUF_STATE_DONE);
> >> + * cap_vb = v4l2_m2m_next_dst_buf(m2m_ctx);
> >> + * }
> >> + * cap_vb->is_held = out_vb->flags &
> >
> > V4L2_BUF_FLAG_M2M_HOLD_CAPTURE_BUF;
> >
> >> + *
> >> + * ...
> >> + *
> >> + * v4l2_m2m_buf_done(out_vb, VB2_BUF_STATE_DONE);
> >> + * if (!cap_vb->is_held) {
> >> + * v4l2_m2m_dst_buf_remove(m2m_ctx);
> >> + * v4l2_m2m_buf_done(cap_vb, VB2_BUF_STATE_DONE);
> >> + * }
> >> + *
> >> + * This allows for multiple output buffers to be used to fill in a
> >> single
> >> + * capture buffer. This is typically used by stateless decoders where
> >> + * multiple e.g. H.264 slices contribute to a single decoded frame.
> >> + */
> >> +struct vb2_v4l2_buffer *v4l2_m2m_release_capture_buf(struct v4l2_m2m_ctx
> >> *m2m_ctx) +{
> >> + struct v4l2_m2m_dev *m2m_dev = m2m_ctx->m2m_dev;
> >> + struct vb2_v4l2_buffer *src, *dst;
> >> + unsigned long flags;
> >> +
> >> + spin_lock_irqsave(&m2m_dev->job_spinlock, flags);
> >> + src = v4l2_m2m_next_src_buf(m2m_ctx);
> >> + dst = v4l2_m2m_next_dst_buf(m2m_ctx);
> >> +
> >> + if (dst->is_held && dst->vb2_buf.copied_timestamp &&
> >> + src->vb2_buf.timestamp != dst->vb2_buf.timestamp) {
> >> + dst->is_held = false;
> >> + v4l2_m2m_dst_buf_remove(m2m_ctx);
> >> + v4l2_m2m_buf_done(dst, VB2_BUF_STATE_DONE);
> >> + dst = v4l2_m2m_next_dst_buf(m2m_ctx);
> >> + }
> >> + dst->is_held = src->flags & V4L2_BUF_FLAG_M2M_HOLD_CAPTURE_BUF;
> >> + src->flags &= ~V4L2_BUF_FLAG_M2M_HOLD_CAPTURE_BUF;
> >> + spin_unlock_irqrestore(&m2m_dev->job_spinlock, flags);
> >> + return dst;
> >> +}
> >> +EXPORT_SYMBOL(v4l2_m2m_release_capture_buf);
> >> +
> >>
> >> int v4l2_m2m_reqbufs(struct file *file, struct v4l2_m2m_ctx *m2m_ctx,
> >>
> >> struct v4l2_requestbuffers *reqbufs)
> >>
> >> {
> >>
> >> @@ -1171,19 +1275,28 @@ int v4l2_m2m_ioctl_stateless_decoder_cmd(struct
> >> file *file, void *priv, {
> >>
> >> struct v4l2_fh *fh = file->private_data;
> >> struct vb2_v4l2_buffer *out_vb, *cap_vb;
> >>
> >> + struct v4l2_m2m_dev *m2m_dev = fh->m2m_ctx->m2m_dev;
> >> + unsigned long flags;
> >>
> >> int ret;
> >>
> >> ret = v4l2_m2m_ioctl_stateless_try_decoder_cmd(file, priv, dc);
> >> if (ret < 0)
> >>
> >> return ret;
> >>
> >> + spin_lock_irqsave(&m2m_dev->job_spinlock, flags);
> >>
> >> out_vb = v4l2_m2m_last_src_buf(fh->m2m_ctx);
> >> cap_vb = v4l2_m2m_last_dst_buf(fh->m2m_ctx);
> >>
> >> - if (out_vb)
> >> + if (out_vb && (out_vb->flags & V4L2_BUF_FLAG_M2M_HOLD_CAPTURE_BUF))
> >
> > {
> >
> >> out_vb->flags &= ~V4L2_BUF_FLAG_M2M_HOLD_CAPTURE_BUF;
> >>
> >> - else if (cap_vb && cap_vb->is_held)
> >> - v4l2_m2m_buf_done(cap_vb, VB2_BUF_STATE_DONE);
> >> + } else if (cap_vb && cap_vb->is_held) {
> >> + cap_vb->is_held = false;
> >> + if (m2m_dev->curr_ctx) {
> >
> > Above condition should be negated.
>
> Close. It should check that this buffer isn't currently being processed.
> So:
>
> if (m2m_dev->curr_ctx != fh->m2m_ctx) {
>
> Can you test with this change? If this works, then I'll post a proper
> series for this.

I confirm that it works.

Best regards,
Jernej

>
> Thanks!
>
> Hans
>
> > Best regards,
> > Jernej
> >
> >> + v4l2_m2m_dst_buf_remove(fh->m2m_ctx);
> >> + v4l2_m2m_buf_done(cap_vb,
> >
> > VB2_BUF_STATE_DONE);
> >
> >> + }
> >> + }
> >> + spin_unlock_irqrestore(&m2m_dev->job_spinlock, flags);
> >>
> >> return 0;
> >>
> >> }
> >>
> >> diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_dec.c
> >> b/drivers/staging/media/sunxi/cedrus/cedrus_dec.c index
> >> 67f7d4326fc1..4e30f263b427 100644
> >> --- a/drivers/staging/media/sunxi/cedrus/cedrus_dec.c
> >> +++ b/drivers/staging/media/sunxi/cedrus/cedrus_dec.c
> >> @@ -30,14 +30,7 @@ void cedrus_device_run(void *priv)
> >>
> >> struct media_request *src_req;
> >>
> >> run.src = v4l2_m2m_next_src_buf(ctx->fh.m2m_ctx);
> >>
> >> - run.dst = v4l2_m2m_next_dst_buf(ctx->fh.m2m_ctx);
> >> -
> >> - if (v4l2_m2m_release_capture_buf(run.src, run.dst)) {
> >> - v4l2_m2m_dst_buf_remove(ctx->fh.m2m_ctx);
> >> - v4l2_m2m_buf_done(run.dst, VB2_BUF_STATE_DONE);
> >> - run.dst = v4l2_m2m_next_dst_buf(ctx->fh.m2m_ctx);
> >> - }
> >> - run.dst->is_held = run.src->flags &
> >
> > V4L2_BUF_FLAG_M2M_HOLD_CAPTURE_BUF;
> >
> >> + run.dst = v4l2_m2m_release_capture_buf(ctx->fh.m2m_ctx);
> >>
> >> run.first_slice = !run.dst->vb2_buf.copied_timestamp ||
> >>
> >> run.src->vb2_buf.timestamp != run.dst-
> >>
> >> vb2_buf.timestamp;
> >> diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_hw.c
> >> b/drivers/staging/media/sunxi/cedrus/cedrus_hw.c index
> >> 99fedec80224..242cad82cc8c 100644
> >> --- a/drivers/staging/media/sunxi/cedrus/cedrus_hw.c
> >> +++ b/drivers/staging/media/sunxi/cedrus/cedrus_hw.c
> >> @@ -103,7 +103,6 @@ static irqreturn_t cedrus_irq(int irq, void *data)
> >>
> >> {
> >>
> >> struct cedrus_dev *dev = data;
> >> struct cedrus_ctx *ctx;
> >>
> >> - struct vb2_v4l2_buffer *src_buf, *dst_buf;
> >>
> >> enum vb2_buffer_state state;
> >> enum cedrus_irq_status status;
> >>
> >> @@ -121,26 +120,12 @@ static irqreturn_t cedrus_irq(int irq, void *data)
> >>
> >> dev->dec_ops[ctx->current_codec]->irq_disable(ctx);
> >> dev->dec_ops[ctx->current_codec]->irq_clear(ctx);
> >>
> >> - src_buf = v4l2_m2m_src_buf_remove(ctx->fh.m2m_ctx);
> >> - dst_buf = v4l2_m2m_next_dst_buf(ctx->fh.m2m_ctx);
> >> -
> >> - if (!src_buf || !dst_buf) {
> >> - v4l2_err(&dev->v4l2_dev,
> >> - "Missing source and/or destination
> >
> > buffers\n");
> >
> >> - return IRQ_HANDLED;
> >> - }
> >> -
> >>
> >> if (status == CEDRUS_IRQ_ERROR)
> >>
> >> state = VB2_BUF_STATE_ERROR;
> >>
> >> else
> >>
> >> state = VB2_BUF_STATE_DONE;
> >>
> >> - v4l2_m2m_buf_done(src_buf, state);
> >> - if (!dst_buf->is_held) {
> >> - v4l2_m2m_dst_buf_remove(ctx->fh.m2m_ctx);
> >> - v4l2_m2m_buf_done(dst_buf, state);
> >> - }
> >> - v4l2_m2m_job_finish(ctx->dev->m2m_dev, ctx->fh.m2m_ctx);
> >> + v4l2_m2m_job_finish_held(ctx->dev->m2m_dev, ctx->fh.m2m_ctx,
state);
> >>
> >> return IRQ_HANDLED;
> >>
> >> }
> >>
> >> diff --git a/include/media/v4l2-mem2mem.h b/include/media/v4l2-mem2mem.h
> >> index 8ae2f56c7fa3..48ca7d3eaa3d 100644
> >> --- a/include/media/v4l2-mem2mem.h
> >> +++ b/include/media/v4l2-mem2mem.h
> >> @@ -173,6 +173,10 @@ void v4l2_m2m_try_schedule(struct v4l2_m2m_ctx
> >> *m2m_ctx); void v4l2_m2m_job_finish(struct v4l2_m2m_dev *m2m_dev,
> >>
> >> struct v4l2_m2m_ctx *m2m_ctx);
> >>
> >> +void v4l2_m2m_job_finish_held(struct v4l2_m2m_dev *m2m_dev,
> >> + struct v4l2_m2m_ctx *m2m_ctx,
> >> + enum vb2_buffer_state state);
> >> +
> >>
> >> static inline void
> >> v4l2_m2m_buf_done(struct vb2_v4l2_buffer *buf, enum vb2_buffer_state
> >> state)
> >>
> >> {
> >> @@ -644,47 +648,7 @@ void v4l2_m2m_buf_copy_metadata(const struct
> >> vb2_v4l2_buffer *out_vb, struct vb2_v4l2_buffer *cap_vb,
> >>
> >> bool copy_frame_flags);
> >>
> >> -/**
> >> - * v4l2_m2m_release_capture_buf() - check if the capture buffer should
> >> be
> >> - * released
> >> - *
> >> - * @out_vb: the output buffer
> >> - * @cap_vb: the capture buffer
> >> - *
> >> - * This helper function returns true if the current capture buffer
> >> should
> >> - * be released to vb2. This is the case if the output buffer specified
> >> that - * the capture buffer should be held (i.e. not returned to vb2)
> >> AND if the - * timestamp of the capture buffer differs from the output
> >> buffer timestamp. - *
> >> - * This helper is to be called at the start of the device_run callback:
> >> - *
> >> - * .. code-block:: c
> >> - *
> >> - * if (v4l2_m2m_release_capture_buf(out_vb, cap_vb)) {
> >> - * v4l2_m2m_dst_buf_remove(m2m_ctx);
> >> - * v4l2_m2m_buf_done(cap_vb, VB2_BUF_STATE_DONE);
> >> - * cap_vb = v4l2_m2m_next_dst_buf(m2m_ctx);
> >> - * }
> >> - * cap_vb->is_held = out_vb->flags &
> >
> > V4L2_BUF_FLAG_M2M_HOLD_CAPTURE_BUF;
> >
> >> - *
> >> - * ...
> >> - *
> >> - * v4l2_m2m_buf_done(out_vb, VB2_BUF_STATE_DONE);
> >> - * if (!cap_vb->is_held) {
> >> - * v4l2_m2m_dst_buf_remove(m2m_ctx);
> >> - * v4l2_m2m_buf_done(cap_vb, VB2_BUF_STATE_DONE);
> >> - * }
> >> - *
> >> - * This allows for multiple output buffers to be used to fill in a
> >> single
> >> - * capture buffer. This is typically used by stateless decoders where
> >> - * multiple e.g. H.264 slices contribute to a single decoded frame.
> >> - */
> >> -static inline bool v4l2_m2m_release_capture_buf(const struct
> >> vb2_v4l2_buffer *out_vb, -
> >>
> > const struct vb2_v4l2_buffer *cap_vb)
> >
> >> -{
> >> - return cap_vb->is_held && cap_vb->vb2_buf.copied_timestamp &&
> >> - out_vb->vb2_buf.timestamp != cap_vb->vb2_buf.timestamp;
> >> -}
> >> +struct vb2_v4l2_buffer *v4l2_m2m_release_capture_buf(struct v4l2_m2m_ctx
> >> *m2m_ctx);
> >>
> >> /* v4l2 request helper */