Hi,

[...]

> What I am thinking is: who would use a basic RC algorithm in the
> kernel?
In cable streaming notably, the RC job is to monitor the amount of bits over a
period of time (the window). This window is defined by the streaming hardware's
buffering capabilities. The best thing at this point is to start reading
through HRD specifications and open source rate control implementations
(notably x264).

I think overall we can live with adding hints where needed, and if the GOP
information is an appropriate hint, then we can just reuse the existing
control.
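
To make the windowing idea concrete, here is a minimal leaky-bucket sketch
(all names made up, far simpler than x264's ratecontrol.c, just to illustrate
the bits-over-a-window accounting):

    #include <stdint.h>

    /*
     * Minimal sketch in the spirit of the HRD model: the encoder adds
     * bits as frames are produced, the channel drains a fixed allowance
     * per frame, and QP is nudged to keep the bucket level inside the
     * window the hardware can buffer.
     */
    struct rc_bucket {
        int64_t size;     /* bucket capacity, in bits (the window) */
        int64_t fullness; /* current level, in bits */
        int64_t drain;    /* per-frame allowance: bitrate / framerate */
    };

    static int rc_update(struct rc_bucket *b, int64_t frame_bits, int qp)
    {
        b->fullness += frame_bits - b->drain;
        if (b->fullness < 0)
            b->fullness = 0;

        if (b->fullness > 3 * b->size / 4)
            qp++; /* close to overflow, compress harder */
        else if (b->fullness < b->size / 4)
            qp--; /* headroom available, spend it on quality */

        return qp; /* clamping to codec QP limits omitted */
    }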
> Why do we still care about GOP here? Hardware has no idea about GOP
> at all. Although in codecs like HEVC the NALU headers of IDR and
> intra pictures differ, there is no difference in the hardware coding
> configuration. The NALU header is usually generated by userspace,
> and whether a future encode regards the current encoded picture as
> an IDR is completely decided by userspace.
The discussion was around having a basic RC algorithm in the kernel driver,
possibly making use of hardware-specific features without actually exposing it
all to userspace. So assuming we do that:

> It sounds like a fixed-bitrate RC. Then would this RC algorithm be
> in charge of selecting the reference frames?
Paul's concern is that for best results, an RC algorithm could use knowledge
of keyframe placement to preserve bucket space (possibly using the last
keyframe size as a hint). Exposing the GOP structure in some form allows
"prediction", so the adaptation can look ahead at the future budget without
introducing latency. There is an alternative, which is to require ahead-of-time
queuing of encode requests. But this does introduce latency since, the way it
works in V4L2 today, we need the picture to be filled by the time we request
an encode.
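
As an illustration of what that keyframe awareness buys (again made-up names,
building on the bucket sketch above): knowing the distance to the next
keyframe lets the RC hold back the expected keyframe cost and spread only the
remainder over the inter frames.

    static int64_t rc_frame_budget(const struct rc_bucket *b,
                                   uint32_t frames_to_keyframe,
                                   int64_t last_keyframe_bits)
    {
        int64_t headroom = b->size - b->fullness;

        /* The keyframe itself may spend the reserve we kept for it. */
        if (frames_to_keyframe == 0)
            return last_keyframe_bits;

        /* Keep the expected keyframe cost aside, share the rest. */
        headroom -= last_keyframe_bits;
        if (headroom < 0)
            headroom = 0;

        return b->drain + headroom / frames_to_keyframe;
    }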
> I don't think it would help. Fences are a thing for DRM/GPU, which
> has no queue.

Though, if we drop the GOP structure and favour this approach, the latency
could be regained later by introducing fence-based streaming. The technique
would be for a video source (like a capture driver) to pass dmabufs that
aren't filled yet, but have a companion fence. This would allow queuing
requests ahead of time, and all we need is enough pre-allocation to
accommodate the desired lookahead. The only issue is that perhaps this
violates the fundamental principle of "short term" delivery of fences. But
fences can also fail, I think, in case the capture was stopped.
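
To be clear about what that flow could look like (purely hypothetical, none
of these fields exist in the V4L2 uAPI today):

    /*
     * Hypothetical shape of an ahead-of-time encode submission. The
     * capture driver would export the picture dmabuf together with a
     * sync_file; the encoder driver starts the job when the fence
     * signals, or fails the request if the fence errors out (e.g. the
     * capture was stopped). Userspace never waits, it only queues.
     */
    struct pending_encode {
        int pic_dmabuf_fd; /* picture buffer, possibly not filled yet */
        int fence_fd;      /* sync_file, signals when the pixels land */
        int request_fd;    /* media request carrying the encode params */
    };

Queuing a handful of these would be enough to give a future-aware RC its
lookahead without adding application-side latency.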
> I think we should not restrict how userspace (the vendor) operates
> the hardware.

We can certainly move forward with this as a future solution, or just not
implement a future-aware RC algorithm, in order to avoid the huge task this
involves (and possibly patents?).
[...]
Of course, the subject is much more relevant when there are encoders with more
than one reference. But you are correct: what the commands do is allow
changing, adding or removing any reference from the list (random
modification), as long as the result fits within the codec constraints (like
the DPB size, notably). This is the only way one can implement temporal SVC
reference patterns, robust reference trees or RTP RPSI. Note that long-term
references also exist, and are less complex than these commands.
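
For readers not familiar with the temporal SVC case, here is the smallest
example of such a pattern (two temporal layers, helper names made up):

    #include <stdbool.h>
    #include <stdint.h>

    /* Even frames form the base layer (L0); odd frames sit in the
     * enhancement layer (L1) and are never referenced themselves. */
    static bool svc2_is_base(uint64_t n)
    {
        return (n & 1) == 0;
    }

    /* Every frame references the closest base-layer frame before it,
     * never the L1 frame in between; this is the "random" list edit a
     * plain previous-frame scheme cannot express. Frame 0 is the
     * keyframe, so callers must not ask for its reference. */
    static uint64_t svc2_ref(uint64_t n)
    {
        return (n - 1) & ~1ULL;
    }

    /* Because L1 frames are unreferenced, a receiver can drop them
     * and halve the frame rate without corrupting the base layer. */
    static bool svc2_droppable(uint64_t n)
    {
        return !svc2_is_base(n);
    }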
> If the userspace could manage the lifetime of the reconstruction
> buffers (assignment, referencing), we wouldn't need a command here.

Sorry if I created confusion, the commands are something specific to H.264
coding. They are a compressed form of the reference lists. This information is
coded in the slice header and enabled through
adaptive_ref_pic_marking_mode_flag.
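
For reference, these are the marking commands in question, as coded in
dec_ref_pic_marking() (see ITU-T H.264, section 7.4.3.3; the C layout below
is made up for illustration):

    #include <stdint.h>

    enum h264_mmco {
        MMCO_END = 0,               /* end of the command list */
        MMCO_UNREF_SHORT_TERM = 1,  /* drop one short-term reference */
        MMCO_UNREF_LONG_TERM = 2,   /* drop one long-term reference */
        MMCO_SHORT_TO_LONG = 3,     /* turn a short-term ref long-term */
        MMCO_SET_MAX_LONG_TERM = 4, /* cap the long-term index range */
        MMCO_UNREF_ALL = 5,         /* drop every reference */
        MMCO_CURRENT_TO_LONG = 6,   /* mark current frame long-term */
    };

    struct h264_mmco_op {
        enum h264_mmco op;
        uint32_t difference_of_pic_nums_minus1; /* ops 1 and 3 */
        uint32_t long_term_pic_num;             /* op 2 */
        uint32_t long_term_frame_idx;           /* ops 3 and 6 */
        uint32_t max_long_term_frame_idx_plus1; /* op 4 */
    };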
> I don't even think we should write the slice header into the CAPTURE
> buffer, which would cause a cache problem. Usually the slice header
> is written only when that slice data is copied out.

It was suggested so far to leave H.264 slice header writing to the driver.
This is motivated by the H.264 slice header not being byte-aligned in size,
which makes it hard to combine with the slice_data(). Also, some hardware
actually produces the slice_header. This needs actual hardware interface
analysis, because an H.264 slice header is worth nothing if it cannot instruct
the decoder how to maintain the desired reference state.
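
To illustrate why the missing byte alignment hurts (toy code, made-up names):
once the header ends mid-byte, every byte of the payload has to be shifted,
so the application cannot simply memcpy() the hardware's slice_data() behind
a header it wrote itself.

    #include <stddef.h>
    #include <stdint.h>

    /* Append nbits of src behind an arbitrary bit position in dst
     * (dst must be zero-initialized); both streams are MSB-first. */
    static void append_bits(uint8_t *dst, size_t *bitpos,
                            const uint8_t *src, size_t nbits)
    {
        for (size_t i = 0; i < nbits; i++, (*bitpos)++) {
            uint8_t bit = (src[i / 8] >> (7 - i % 8)) & 1;

            dst[*bitpos / 8] |= bit << (7 - *bitpos % 8);
        }
    }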
> H.264 and H.265 have byte_alignment() in the NALU. You don't need
> the skip-bits feature that can be found in the H1.

I think this aspect should probably not be generalized to all CODECs, since
the packing semantics can largely differ. When the codec header is indeed
byte-aligned, it can easily be separated out and combined by the application,
improving application flexibility and reducing kernel API complexity.
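
In the byte-aligned case the combination really is trivial; userspace can
stitch the header and the hardware payload together without touching a single
bit, e.g. with plain scatter-gather I/O:

    #include <sys/types.h>
    #include <sys/uio.h>

    static ssize_t write_frame(int fd, const void *hdr, size_t hdr_len,
                               const void *payload, size_t payload_len)
    {
        struct iovec iov[2] = {
            { .iov_base = (void *)hdr,     .iov_len = hdr_len },
            { .iov_base = (void *)payload, .iov_len = payload_len },
        };

        return writev(fd, iov, 2);
    }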
> It is just a matter of how to design another request API control
> structure to select which buffers would be used for list0 and list1.

This raises a big question, and I never checked how this works with, let's
say, VA. Shall we let the driver resolve the changes into commands? (VP8 has
something similar, while VP9 and AV1 use refresh flags, which are trivial to
compute.) I believe I'll have to investigate this further.
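
If we went down that road, I would expect something shaped roughly like the
stateless decoder's reference lists (hypothetical sketch, no such control
exists today):

    #include <linux/types.h>

    #define HYP_REF_LIST_LEN 32 /* made-up bound, DPB-sized in practice */

    /* Hypothetical request control: entries index into a set of
     * userspace-managed reconstruction buffers. */
    struct hyp_h264_encode_reflists {
        __u8 ref_pic_list0[HYP_REF_LIST_LEN];
        __u8 ref_pic_list1[HYP_REF_LIST_LEN];
        __u8 num_ref_idx_l0_active_minus1;
        __u8 num_ref_idx_l1_active_minus1;
    };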
[...]
regards,
Nicolas