Re: [PATCH v6 3/3] sound/usb: Use Media Controller API to share media resources

From: Shuah Khan
Date: Wed Dec 07 2016 - 15:04:26 EST


Hi Sakari,

On 12/07/2016 03:52 AM, Sakari Ailus wrote:
> Hi Shuah,
>
> On Mon, Dec 05, 2016 at 05:38:23PM -0700, Shuah Khan wrote:
>> On 12/05/2016 04:21 PM, Laurent Pinchart wrote:
>>> Hi Shuah,
>>>
>>> On Monday 05 Dec 2016 15:44:30 Shuah Khan wrote:
>>>> On 11/30/2016 03:01 PM, Shuah Khan wrote:
>>>>> Change ALSA driver to use Media Controller API to share media resources
>>>>> with DVB, and V4L2 drivers on a AU0828 media device.
>>>>>
>>>>> Media Controller specific initialization is done after sound card is
>>>>> registered. ALSA creates Media interface and entity function graph
>>>>> nodes for Control, Mixer, PCM Playback, and PCM Capture devices.
>>>>>
>>>>> snd_usb_hw_params() will call Media Controller enable source handler
>>>>> interface to request the media resource. If resource request is granted,
>>>>> it will release it from snd_usb_hw_free(). If resource is busy, -EBUSY is
>>>>> returned.
>>>>>
>>>>> Media specific cleanup is done in usb_audio_disconnect().
>>>>>
>>>>> Signed-off-by: Shuah Khan <shuahkh@xxxxxxxxxxxxxxx>
>>>>
>>>> Hi Takashi,
>>>>
>>>> If you are good with this patch, could you please Ack it, so Mauro
>>>> can pull it into media tree with the other two patches in this series,
>>>> when he is ready to do so.
>>>
>>> I *really* want to address the concerns raised by Sakari before pulling more
>>> code that makes fixing the race conditions more difficult. Please, let's all
>>> work on fixing the core code to build a stable base on which we can build
>>> additional features. V4L2 and MC need teamwork, it's time to give the
>>> subsystem the love it deserves.
>>>
>>
>> Hi Laurent,
>>
>> The issue Sakari brought up is specific to using devm for video_device in
>> omap3 and vsp1. I tried reproducing the problem on two different drivers
>> and couldn't on Linux 4.9-rc7.
>>
>> After sharing that with Sakari, I suggested to Sakari to pull up his patch
>> that removes the devm usage and see if he still needs all the patches in his
>> patch series. He didn't back to me on that. I also requested him to rebase on
>
> Just to see what remains, I made a small hack to test this with omap3isp by
> just replacing the devm_() functions by their plain counterparts. The memory
> is thus never released, for there is no really a proper moment to release it
> --- something which the patchset resolves. The result is here:
>
> <URL:http://www.retiisi.org.uk/v4l2/tmp/media-ref-dmesg.txt>

Did you test this on 4.9-rc7 without any of your other patches? If you
haven't could you please run this test with just the removing devm usage
from omap3isp?

It would be good to get a baseline on the current with just the not using
devm first and then see what needs fixing.

Also, could you please send me the complete dmesg.

thanks,
-- Shuah

>
> What happens there is that as part of the device unbinding, the graph
> objects are unregistered (by media_device_unregister()) while streaming is
> ongoing. Their parent media device pointers are set to NULL.
>
> Then, when the user process exists, the streaming media pipeline is stopped.
> This requires parsing of the media graph, a job which will obviously fail
> miserably and immediately: the media device is obtained from the entity
> where graph traversal is expected to begin.
>
> Two mutexes are also acquired but they've been already destroyed at that
> point (and the memory of which would also be released now without the hack
> to test this "without devm"). There's just a single warning on this though.
>
>> top of media dev allocator because the allocator routines he has don't address
>> the shared media device need.
>
> I do strongly prefer fixing the existing object lifetime issues in the
> framework before extending it and thus making the problem domain more
> complex than it is already.
>
> What I mentioned as an example of this are the other callbacks to the media
> device: presumably they do suffer from the exactly same problems as the
> enable_source() and disable_source() ones.
>
> As drivers do refer to other entities and controls they do expose also to
> the user space, we can't really even remove entities safely at the moment.
> We should have a solution to this as well, my patchset at the moment does
> not address this.
>
> What we do need a sound basis for the framework rather than hastily written
> improvements to support a particular device. Also, testing device removal
> with a particular device in a particular use case does not guarantee that no
> further object lifetime issues related to device removal exist, even with
> that same driver, let alone other devices.
>
> Instead, we must consider how the frameworks and drivers manage the memory
> allocated for the various objects, what are relations of those objects and
> how they're exposed to the user space either directly or indirectly. As long
> as objects (such as the media graph objects) with different lifetimes are
> referred to from other objects without taking a reference or alternatively
> serialising code paths that may access those objects and those that remove
> them, we do have a problem.
>
> The media graph with all the subsystem specific device nodes that are
> exported to the user space is a rather complex data structure. I don't think
> that acquiring the media graph lock in order to fix all the serialisation
> problems is the right approach here.
>
>>
>> He also didn't respond to my response regarding the reasons for choosing
>> graph_mutex to protect enable_source and disable_source handlers.
>
> I did. The point is that this is a partial fix and that does not properly
> address the problem of device removal. It probably does decrease of the
> probability of hitting the bug but it does not fix it.
>
>>
>> So I am not sure how to move forward at the moment without a concrete plan
>> for Sakari's RFC series. Sakari's patch series is still RFC and doesn't address
>> shared media_device and requires all drivers to change.
>
> I'm all for extending the functionality of the Media controller framework,
> but existing known problems touching the same parts of the framework should
> be fixed first.
>
> Reviewing patches, pointing out problems and proposing improvements would
> certainly help to achieve that goal.
>