Re: [PATCH v2 2/4] media: uvcvideo: Do not set an async control owned by other fh

From: Hans Verkuil
Date: Mon Dec 02 2024 - 03:44:23 EST


On 02/12/2024 09:11, Laurent Pinchart wrote:
> On Mon, Dec 02, 2024 at 09:05:07AM +0100, Hans Verkuil wrote:
>> On 02/12/2024 01:18, Laurent Pinchart wrote:
>>> On Fri, Nov 29, 2024 at 11:18:54PM +0100, Ricardo Ribalda wrote:
>>>> On Fri, 29 Nov 2024 at 23:03, Laurent Pinchart wrote:
>>>>> On Fri, Nov 29, 2024 at 07:47:31PM +0100, Ricardo Ribalda wrote:
>>>>>> Before we all go on a well deserved weekend, let me recap what we
>>>>>> know. If I did not get something correctly, let me know.
>>>>>>
>>>>>> 1) Well behaved devices do not allow to set or get an incomplete async
>>>>>> control. They will stall instead (ref: Figure 2-21 in UVC 1.5 )
>>>>>> 2) Both Laurent and Ricardo consider that there is a big chance that
>>>>>> some camera modules do not implement this properly. (ref: years of
>>>>>> crying over broken module firmware :) )
>>>>>>
>>>>>> 3) ctrl->handle is designed to point to the fh that originated the
>>>>>> control. So the logic can decide if the originator needs to be
>>>>>> notified or not. (ref: uvc_ctrl_send_event() )
>>>>>> 4) Right now we replace the originator in ctrl->handle for unfinished
>>>>>> async controls. (ref:
>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/media/usb/uvc/uvc_ctrl.c#n2050)
>>>>>>
>>>>>> My interpretation is that:
>>>>>> A) We need to change 4). We shall not change the originator of
>>>>>> unfinished ctrl->handle.
>>>>>> B) Well behaved cameras do not need the patch "Do not set an async
>>>>>> control owned by another fh"
>>>>>> C) For badly behaved cameras, it is fine if we slightly break the
>>>>>> v4l2-compliance in corner cases, if we do not break any internal data
>>>>>> structure.
>>>>>
>>>>> The fact that some devices may not implement the documented behaviour
>>>>> correctly may not be a problem. Well-behaved devices will stall, which
>>>>> means we shouldn't query the device while as async update is in
>>>>> progress. Badly-behaved devices, whatever they do when queried, should
>>>>> not cause any issue if we don't query them.
>>>>
>>>> I thought we could detect the stall and return safely. Isn't that the case?
>>>
>>> We could, but if we know the device will stall anyway, is there a reason
>>> not to avoid issuing the request in the first place ?
>>>
>>>> Why we have not seen issues with this?
>>>
>>> I haven't tested a PTZ device for a very long time, and you would need
>>> to hit a small time window to see the issue.
>>>
>>>>> We should not send GET_CUR and SET_CUR requests to the device while an
>>>>> async update is in progress, and use cached values instead. When we
>>>>> receive the async update event, we should clear the cache. This will be
>>>>> the same for both well-behaved and badly-behaved devices, so we can
>>>>> expose the same behaviour towards userspace.
>>>>
>>>> seting ctrl->loaded = 0 when we get an event sounds like a good idea
>>>> and something we can implement right away.
>>>> If I have to resend the set I will add it to the end.
>>>>
>>>>> We possibly also need some kind of timeout mechanism to cope with the
>>>>> async update event not being delivered by the device.
>>>>
>>>> This is the part that worries me the most:
>>>> - timeouts make the code fragile
>>>> - What is a good value for timeout? 1 second, 30, 300? I do not think
>>>> that we can find a value.
>>>
>>> I've been thinking about the implementation of uvc_fh cleanup over the
>>> weekend, and having a timeout would have the nice advantage that we
>>> could reference-count uvc_fh instead of implementing a cleanup that
>>> walks over all controls when closing a file handle. I think it would
>>> make the code simpler, and possibly safer too.
>>>
>>>>> Regarding the userspace behaviour during an auto-update, we have
>>>>> multiple options:
>>>>>
>>>>> For control get,
>>>>>
>>>>> - We can return -EBUSY
>>>>> - We can return the old value from the cache
>>
>> This would match the control behavior best. Only when the operation is
>> done is the control updated and the control event sent.
>>
>> Some questions: is any of this documented for UVC? Because this is non-standard
>
> No this isn't documented.
>
>> behavior. Are there applications that rely on this? Should we perhaps add
>
> I don't know.
>
>> proper support for this to the control framework? E.g. add an ASYNC flag and
>> document this?
>
> We could, but this is such a specific use case that I don't think is
> worth adding complexity to the already complex control framework would
> be worth it. What we could do is perhaps adding a flag for the userspace
> API, but even there, I never like modelling an API with a single user.

Well, it might be a single driver that uses this, but it is also the most
used driver by far. I think the only change is to add a flag for this and
describe how it should behave. And add v4l2-compliance tests for it.

Otherwise no changes to the control framework are needed, I think.

Controls with the ASYNC flag set would:

- return the old value from the cache.
- document that setting a new value while the operation is in progress
results in EBUSY. Document that if the new value is equal to the old value,
then return 0 and do nothing (alternative is to just immediately send
the control changed event, but that might require a control framework change).
- when the operation finishes, update the cache to the new value and
send the control changed event.
- document that userspace should specify V4L2_EVENT_SUB_FL_ALLOW_FEEDBACK
when subscribing to the control if you calling fh wants to know when
the operation finishes.
- document how timeouts should be handled: this is tricky, especially with
bad hardware. I.e. if the hw doesn't send the event, does that mean that
you are never able to set the control since it will stall?
In the end this will just reflect how UVC handles this.

Regards,

Hans

>
>>>>> - We can return the new value fromt he cache
>>>>>
>>>>> Returning -EBUSY would be simpler to implement.
>>>>
>>>> Not only easy, I think it is the most correct,
>>>>
>>>>> I don't think the behaviour should depend on whether the control is read
>>>>> on the file handle that initiated the async operation or on a different
>>>>> file handle.
>>>>>
>>>>> For control set, I don't think we can do much else than returning
>>>>> -EBUSY, regardless of which file handle the control is set on.
>>>>
>>>> ACK.
>>>>
>>>>>> I will send a new version with my interpretation.
>>>>>>
>>>>>> Thanks for a great discussion
>>>>
>>>> Looking with some perspective... I believe that we should look into
>>>> the "userspace behaviour for auto controls" in a different patchset.
>>>> It is slightly unrelated to this discussion.
>