Re: [RFC PATCH 0/1] Add LTR controls

From: Hans Verkuil
Date: Wed Jul 22 2020 - 10:26:45 EST


On 20/07/2020 16:33, dikshita@xxxxxxxxxxxxxx wrote:
> On 2020-07-16 14:25, Hans Verkuil wrote:
>> On 16/06/2020 20:41, dikshita@xxxxxxxxxxxxxx wrote:
>>> Hi Hans, Nicolas,
>>>
>>> Thanks for your comments.
>>>
>>> On 2020-06-12 14:41, Hans Verkuil wrote:
>>>> Hi Dikshita, Nicolas,
>>>>
>>>> On 11/06/2020 16:22, Nicolas Dufresne wrote:
>>>>> Le jeudi 11 juin 2020 Ã 15:55 +0530, Dikshita Agarwal a Ãcrit :
>>>>>> LTR (Long Term Reference) frames are the frames that are encoded
>>>>>> sometime in the past
>>>>>> and stored in the DPB buffer list to be used as reference to encode
>>>>>> future frames.
>>>>>> One usage of LTR encoding is to reduce error propagation for video
>>>>>> transmission
>>>>>> in packet lossy networks. For example, encoder may want to specify
>>>>>> some key frames as
>>>>>> LTR pictures and use them as reference frames for encoding. With
>>>>>> extra protection
>>>>>> selectively on these LTR frames or synchronization with the receiver
>>>>>> of reception of
>>>>>> the LTR frames during transmission, decoder can receive reference
>>>>>> frames more reliably
>>>>>> than other non-reference frames. As a result, transmission error can
>>>>>> be effectively
>>>>>> restricted within certain frames rather than propagated to future
>>>>>> frames.
>>>>>>
>>>>>> We are introducing below V4l2 Controls for this feature
>>>>>> 1. V4L2_CID_MPEG_VIDEO_LTRCOUNT
>>>>>> ÂÂÂ a. This is used to query or configure the number of LTR frames.
>>>>>> ÂÂÂÂÂÂ This is a static control and is controlled by the client.
>>>>>> ÂÂÂ b. The LTR index varies from 0 to the max LTR-1.
>>>>>> ÂÂÂ c. If LTR Count is more than max supported LTR count (max LTR) by
>>>>>> driver, it will be rejected.
>>>>>> ÂÂÂ d. Auto Marking : If LTR count is non zero,
>>>>>> ÂÂÂÂÂÂÂ 1) first LTR count frames would be mark as LTR automatically
>>>>>> after
>>>>>> ÂÂÂÂÂÂÂÂÂÂÂÂ every IDR frame (inclusive).
>>>>>> ÂÂÂÂÂÂÂ 2) For multilayer encoding: first LTR count base layer
>>>>>> reference frames starting after
>>>>>> ÂÂÂÂÂÂÂÂÂÂ every IDR frame (inclusive) in encoding order would be
>>>>>> marked as LTR frames by the encoder.
>>>>>> ÂÂÂÂÂÂÂ 3) Auto marking of LTR due to IDR should consider following
>>>>>> conditions:
>>>>>> ÂÂÂÂÂÂÂÂÂÂÂ 1. The frame is not already set to be marked as LTR.
>>>>>> ÂÂÂÂÂÂÂÂÂÂÂ 2. The frame is part of the base layer in the
>>>>>> hierarchical layer case.
>>>>>> ÂÂÂÂÂÂÂÂÂÂÂ 3. The number of frames currently marked as LTR is less
>>>>>> than the maximum LTR frame index plus 1.
>>>>>> ÂÂÂ e. Encoder needs to handle explicit Mark/Use command when encoder
>>>>>> is still doing "auto" marking
>>>>
>>>> I don't follow this, quite possibly due to lack of experience with
>>>> encoders.
>>>>
>>>> I kind of would expect to see two modes: either automatic where
>>>> encoders can
>>>> mark up to LTR_COUNT frames as long term reference, and userspace just
>>>> sets
>>>> LTR_COUNT and doesn't have to do anything else.
>>>>
>>>> Or it is manual mode where userspace explicitly marks long term
>>>> reference
>>>> frames.
>>>>
>>>> From the proposal above it looks like you can mix auto and manual
>>>> modes.
>>>>
>>>> BTW, how do you 'unmark' long term reference frames?
>>>>
>>>> This feature is for stateful encoders, right?
>>>>
>>>>>
>>>>> Perhaps we are missing a LONG_TERM_REFERENCE_MODE ? I bet some encoder
>>>>> can select by themself long term references and even some encoders may
>>>>> not let the user decide.
>>>>>
>>>>> (not huge han of LTR acronyme, but that could be fine too, assuming
>>>>> you
>>>>> add more _).
>>>>>
>>>
>>> Userspace sets LTR count which signifies the number of LTR frames
>>> encoder needs to generate or keep.
>>> The encoder has to build-up its internal buffer reference list (aka DBP
>>> list or recon buffer list).
>>> In order to achieve that encoder will fill It's LTR (long term
>>> references) list and STR (short term references) list
>>> by auto marking n frames as LTR frames(n is equal to LTR count) based on
>>> auto-marking dictated by the encoder spec.
>>> The client then can replace those automatically marked frames with new
>>> frames using V4L2_CID_MPEG_VIDEO_MARKLTRFRAME and can ask
>>> encoder to refer the newly marked frame for encoding the next frame
>>> using V4L2_CID_MPEG_VIDEO_USELTRFRAME.
>>>
>>>>>>
>>>>>> 2. V4L2_CID_MPEG_VIDEO_MARKLTRFRAME :
>>>>>> ÂÂÂ a. This signals to mark the current frame as LTR frame. It is a
>>>>>> dynamic control and also provide the LTR index to be used.
>>>>>> ÂÂÂ b. the LTR index provided by this control should never exceed the
>>>>>> max LTR-1. Else it will be rejected.
>>>>>
>>>>> The "current" frame seems a bit loose. Perhaps you wanted to use
>>>>> buffer
>>>>> flags ? A bit like what we have to signal TOP/BOTTOM fields in
>>>>> alternate interlacing.
>>>>
>>>> I was thinking the same thing. Using a control for this doesn't seem
>>>> right.
>>>>
>>>
>>> the client sets this to replace automatically marked frames by the
>>> encoder with a particular frame.
>>> this provides an index that ranges from 0 to LTR count-1 and then the
>>> particular frame will be marked with that index.
>>> this can be achieved through request by associating this control with a
>>> specific buffer to make it synchronized.
>>>
>>>>>
>>>>>>
>>>>>> 3. V4L2_CID_MPEG_VIDEO_USELTRFRAME :
>>>>>> ÂÂÂ a. This specifies the LTR frame(s) to be used for encoding the
>>>>>> current frame. This is a dynamic control.
>>>>>> ÂÂÂ b. LTR Use Bitmap : this consists of bits [0, 15]. A total of N
>>>>>> LSB bits of this field are valid,
>>>>>> ÂÂÂÂÂÂ where N is the maximum number of LTRs supported. All the other
>>>>>> bits are invalid and should be rejected.
>>>>>> ÂÂÂÂÂÂ The LSB corresponds to the LTR index 0. Bit N-1 from the LSB
>>>>>> corresponds to the LTR index max LTR-1.
>>>>
>>>> How would userspace know this? Esp. with auto marking since userspace
>>>> would have
>>>> to predict how auto marking works (if I understand this correctly).
>>>>
>>>
>>> Client sets LTR count which tells about the number of LTR frames
>>> automatically marked by the encoder.
>>> so client can use LTR index (0 to LTR count -1) to ask encoder to refer
>>> any particular
>>> frame (marked automatically by driver or marked by client with
>>> V4L2_CID_MPEG_VIDEO_MARKLTRFRAME) as a reference to encode the next
>>> frame.
>>>
>>>> For which HW encoder is this meant?
>>>>
>>> This is primarily meant for H264 and HEVC.
>>
>> The venus encoder?
> yes
>>
>> Some more questions:
>>
>> 1) How many LTR frames do h.264 and hevc allow?
> According to spec, MAX LTR allowed by H.264 is 15 and HEVC allows max 32 LTR frames.
>> 2) Given N LTR frames, is there a ordering of those frames? E.g.
>> ÂÂ the LTR frame with index 0 is processed/used differently from
>> ÂÂ LTR frame with index 1? Or are they all equal in that it is just a pool
>> ÂÂ of LTR frames that the encoder can use as it wishes?
> they are different frames marked to be used as LTR and stored in available indices.
> Userspace notifies encoder which LTR frame index to use via USE LTR control.

One more question: if a frame is marked as a LTR frame, does that mean that userspace
can't reuse the buffer containing that frame? I assume that's the case.

One thing that I don't like about this API is that it introduces the LTR indices.
Would it be possible to instead use buffer indices? (i.e. the v4l2_buffer index field).
Then you can use a buffer flag to indicate that a buffer should be an LTR buffer,
or (for automarking) the driver can mark a buffer as an LTR buffer by setting this flag
before returning the buffer to userspace.

You would still need a V4L2_CID_MPEG_VIDEO_USELTRFRAME, but that can be a bitmask
using the buffer index.

This would avoid introducing a second method of referring to buffers (LTR index),
and I think that V4L2_CID_MPEG_VIDEO_MARKLTRFRAME can be dropped.

Regards,

Hans

>
> Thanks,
> Dikshita
>>
>> Regards,
>>
>> ÂÂÂÂHans
>>
>>>
>>> Thanks,
>>> Dikshita
>>>
>>>>>
>>>>> Note, I haven't captured very well the userspace control flow, perhaps
>>>>> this could be enhanced through writing some documentation.
>>>>>
>>>>> As per all other generic encoder controls, we need to make sure it
>>>>> will
>>>>> be usable and flexible enough for multiple HW blocks, as it can be
>>>>> tedious to extend later otherwise. It is important that along with
>>>>> this
>>>>> RFC you provide some comparisons with with other HW / SW APIs in order
>>>>> to help justify the design decisions. I also think there should be
>>>>> link made V4L2_CID_MPEG_VIDEO_GOP_* , number of B-Frames etc.
>>>>
>>>> I agree with Nicolas.
>>>>
>>>> Regards,
>>>>
>>>> ÂÂÂÂHans
>>>>
>>>>>
>>>>> regards,
>>>>> Nicolas
>>>>>
>>>>>>
>>>>>> Dikshita Agarwal (1):
>>>>>> Â media: v4l2-ctrls:Â add control for ltr
>>>>>>
>>>>>> Âdrivers/media/v4l2-core/v4l2-ctrls.c | 6 ++++++
>>>>>> Âinclude/uapi/linux/v4l2-controls.hÂÂ | 4 ++++
>>>>>> Â2 files changed, 10 insertions(+)
>>>>>>
>>>>>