Re: [PATCH v2 2/4] media: uvcvideo: Do not set an async control owned by other fh

From: Hans Verkuil
Date: Mon Dec 02 2024 - 05:55:33 EST


On 02/12/2024 11:26, Hans de Goede wrote:
> Hi,
>
> On 2-Dec-24 9:44 AM, Hans Verkuil wrote:
>> On 02/12/2024 09:11, Laurent Pinchart wrote:
>>> On Mon, Dec 02, 2024 at 09:05:07AM +0100, Hans Verkuil wrote:
>>>> On 02/12/2024 01:18, Laurent Pinchart wrote:
>>>>> On Fri, Nov 29, 2024 at 11:18:54PM +0100, Ricardo Ribalda wrote:
>>>>>> On Fri, 29 Nov 2024 at 23:03, Laurent Pinchart wrote:
>>>>>>> On Fri, Nov 29, 2024 at 07:47:31PM +0100, Ricardo Ribalda wrote:
>>>>>>>> Before we all go on a well deserved weekend, let me recap what we
>>>>>>>> know. If I did not get something correctly, let me know.
>>>>>>>>
>>>>>>>> 1) Well behaved devices do not allow to set or get an incomplete async
>>>>>>>> control. They will stall instead (ref: Figure 2-21 in UVC 1.5 )
>>>>>>>> 2) Both Laurent and Ricardo consider that there is a big chance that
>>>>>>>> some camera modules do not implement this properly. (ref: years of
>>>>>>>> crying over broken module firmware :) )
>>>>>>>>
>>>>>>>> 3) ctrl->handle is designed to point to the fh that originated the
>>>>>>>> control. So the logic can decide if the originator needs to be
>>>>>>>> notified or not. (ref: uvc_ctrl_send_event() )
>>>>>>>> 4) Right now we replace the originator in ctrl->handle for unfinished
>>>>>>>> async controls. (ref:
>>>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/media/usb/uvc/uvc_ctrl.c#n2050)
>>>>>>>>
>>>>>>>> My interpretation is that:
>>>>>>>> A) We need to change 4). We shall not change the originator of
>>>>>>>> unfinished ctrl->handle.
>>>>>>>> B) Well behaved cameras do not need the patch "Do not set an async
>>>>>>>> control owned by another fh"
>>>>>>>> C) For badly behaved cameras, it is fine if we slightly break the
>>>>>>>> v4l2-compliance in corner cases, if we do not break any internal data
>>>>>>>> structure.
>>>>>>>
>>>>>>> The fact that some devices may not implement the documented behaviour
>>>>>>> correctly may not be a problem. Well-behaved devices will stall, which
>>>>>>> means we shouldn't query the device while as async update is in
>>>>>>> progress. Badly-behaved devices, whatever they do when queried, should
>>>>>>> not cause any issue if we don't query them.
>>>>>>
>>>>>> I thought we could detect the stall and return safely. Isn't that the case?
>>>>>
>>>>> We could, but if we know the device will stall anyway, is there a reason
>>>>> not to avoid issuing the request in the first place ?
>>>>>
>>>>>> Why we have not seen issues with this?
>>>>>
>>>>> I haven't tested a PTZ device for a very long time, and you would need
>>>>> to hit a small time window to see the issue.
>>>>>
>>>>>>> We should not send GET_CUR and SET_CUR requests to the device while an
>>>>>>> async update is in progress, and use cached values instead. When we
>>>>>>> receive the async update event, we should clear the cache. This will be
>>>>>>> the same for both well-behaved and badly-behaved devices, so we can
>>>>>>> expose the same behaviour towards userspace.
>>>>>>
>>>>>> seting ctrl->loaded = 0 when we get an event sounds like a good idea
>>>>>> and something we can implement right away.
>>>>>> If I have to resend the set I will add it to the end.
>>>>>>
>>>>>>> We possibly also need some kind of timeout mechanism to cope with the
>>>>>>> async update event not being delivered by the device.
>>>>>>
>>>>>> This is the part that worries me the most:
>>>>>> - timeouts make the code fragile
>>>>>> - What is a good value for timeout? 1 second, 30, 300? I do not think
>>>>>> that we can find a value.
>>>>>
>>>>> I've been thinking about the implementation of uvc_fh cleanup over the
>>>>> weekend, and having a timeout would have the nice advantage that we
>>>>> could reference-count uvc_fh instead of implementing a cleanup that
>>>>> walks over all controls when closing a file handle. I think it would
>>>>> make the code simpler, and possibly safer too.
>>>>>
>>>>>>> Regarding the userspace behaviour during an auto-update, we have
>>>>>>> multiple options:
>>>>>>>
>>>>>>> For control get,
>>>>>>>
>>>>>>> - We can return -EBUSY
>>>>>>> - We can return the old value from the cache
>>>>
>>>> This would match the control behavior best. Only when the operation is
>>>> done is the control updated and the control event sent.
>>>>
>>>> Some questions: is any of this documented for UVC? Because this is non-standard
>>>
>>> No this isn't documented.
>>>
>>>> behavior. Are there applications that rely on this? Should we perhaps add
>>>
>>> I don't know.
>>>
>>>> proper support for this to the control framework? E.g. add an ASYNC flag and
>>>> document this?
>>>
>>> We could, but this is such a specific use case that I don't think is
>>> worth adding complexity to the already complex control framework would
>>> be worth it. What we could do is perhaps adding a flag for the userspace
>>> API, but even there, I never like modelling an API with a single user.
>>
>> Well, it might be a single driver that uses this, but it is also the most
>> used driver by far. I think the only change is to add a flag for this and
>> describe how it should behave. And add v4l2-compliance tests for it.
>>
>> Otherwise no changes to the control framework are needed, I think.
>>
>> Controls with the ASYNC flag set would:
>>
>> - return the old value from the cache.
>> - document that setting a new value while the operation is in progress
>> results in EBUSY. Document that if the new value is equal to the old value,
>> then return 0 and do nothing (alternative is to just immediately send
>> the control changed event, but that might require a control framework change).
>> - when the operation finishes, update the cache to the new value and
>> send the control changed event.
>> - document that userspace should specify V4L2_EVENT_SUB_FL_ALLOW_FEEDBACK
>> when subscribing to the control if you calling fh wants to know when
>> the operation finishes.
>> - document how timeouts should be handled: this is tricky, especially with
>> bad hardware. I.e. if the hw doesn't send the event, does that mean that
>> you are never able to set the control since it will stall?
>> In the end this will just reflect how UVC handles this.
>
> I have been catching up on this thread (I have not read the v3 and v4
> threads yet).
>
> This all started with Ricardo noticing that ctrl->handle may get
> overwritten when another app sets the ctrl, causing the first app
> to set the ctrl to get a V4L2_EVENT for the ctrl (if subscribed)
> even though it set the ctrl itself.
>
> My observations so far:
>
> 1. This is only hit when another app changes the ctrl after the first app,
> in this case, if there is no stall issued by the hw for the second app's
> request, arguably the first app getting the event for the ctrl is correct
> since it was changed by the second app. IOW I think the current behavior
> is not only fine, but even desirable. Assuming we only override ctrl->handle
> after successfully sending the set-ctrl request to the hardware.
>
> 2. This adds a lot of complexity for not sending an event to the app
> which made the change. Hans V. suggested maybe adding some sort of flag
> for async ctrls to the userspace API. I wonder if we should not just
> get rid of this complexity and document that these controls will always
> generate events independent of V4L2_EVENT_SUB_FL_ALLOW_FEEDBACK ?
> That would certainly simplify things, but it raises the questions if
> this will cause issues for existing applications.

I'm not that keen on this. That's why a new flag can come in handy since
if present, then that indicates that it makes sense to specify
V4L2_EVENT_SUB_FL_ALLOW_FEEDBACK when subscribing to the control events.

This ensures that uvc follows the current v4l2 spec. It's also why I
prefer that g_ctrl will just return the old value until the new value
has been reached: that way the control event corresponds with the actual
updating of the control value.

That said, it's just my opinion and I am OK with UVC doing things a bit
differently. Just be aware that I have no objection to adding an ASYNC flag,
given how widely UVC is used.

Regards,

Hans

>
> Note that if we simply return -EBUSY on set until acked by a status
> event we also avoid the issue of ctrl->handle getting overwritten,
> but that relies on reliable status events; or requires timeout handling.
>
> 3. I agree with Ricardo that a timeout based approach for cameras which
> to not properly send status events for async ctrls is going to be
> problematic. Things like pan/tilt homing can take multiple seconds which
> is really long to use as a timeout if we plan to return -EBUSY until
> the timeout triggers. I think it would be better to just rely on
> the hardware sending a stall, or it accepting and correctly handling
> a new CUR_SET command while the previous one is still being processed.
>
> I guess we can track if the hw does send status events when async ctrls
> complete and then do the -EBUSY thing without going out to the hw after
> the first time an async ctrl has been acked by a status event.
>
> And then combine that with the current behavior of overwriting ctrl->handle
> until the ctrl has been marked as having working status events. So:
>
> a) In case we do not know yet if a ctrl gets status-event acks; and
> on devices without reliable status events keep current behavior.
>
> b) As soon as we know a ctrl has reliable status events, switch to
> returning -EBUSY if a set is pending (as indicated by ctrl->handle
> being set).
>
> I don't like the fact that this changes the behavior after the first
> status event acking an async ctrl, but I don't really see another way.
>
> Regards,
>
> Hans
>
>
>
>
>>>
>>>>>>> - We can return the new value fromt he cache
>>>>>>>
>>>>>>> Returning -EBUSY would be simpler to implement.
>>>>>>
>>>>>> Not only easy, I think it is the most correct,
>>>>>>
>>>>>>> I don't think the behaviour should depend on whether the control is read
>>>>>>> on the file handle that initiated the async operation or on a different
>>>>>>> file handle.
>>>>>>>
>>>>>>> For control set, I don't think we can do much else than returning
>>>>>>> -EBUSY, regardless of which file handle the control is set on.
>>>>>>
>>>>>> ACK.
>>>>>>
>>>>>>>> I will send a new version with my interpretation.
>>>>>>>>
>>>>>>>> Thanks for a great discussion
>>>>>>
>>>>>> Looking with some perspective... I believe that we should look into
>>>>>> the "userspace behaviour for auto controls" in a different patchset.
>>>>>> It is slightly unrelated to this discussion.
>>>
>>
>>
>
>