Re: [PATCH 2/2] [media] media-device: split media initialization and registration

From: Javier Martinez Canillas
Date: Fri Sep 11 2015 - 03:33:40 EST


Hello Sakari,

On 09/11/2015 07:51 AM, Sakari Ailus wrote:
> Hi Javier,
>
> Javier Martinez Canillas wrote:
>> Hello Sakari,
>>
>> On 09/10/2015 07:14 PM, Sakari Ailus wrote:
>>> Hi Javier,
>>>
>>> Thanks for the set! A few comments below.
>>>
>>
>> Thanks to you for your feedback.
>>
>>> Javier Martinez Canillas wrote:
>>>> The media device node is registered and so made visible to user-space
>>>> before entities are registered and links created which means that the
>>>> media graph obtained by user-space could be only partially enumerated
>>>> if that happens too early before all the graph has been created.
>>>>
>>>> To avoid this race condition, split the media init and registration
>>>> in separate functions and only register the media device node when
>>>> all the pending subdevices have been registered, either explicitly
>>>> by the driver or asynchronously using v4l2_async_register_subdev().
>>>>
>>>> Also, add a media_entity_cleanup() function that will destroy the
>>>> graph_mutex that is initialized in media_entity_init().
>>>>
>>>> Suggested-by: Sakari Ailus <sakari.ailus@xxxxxxxxxxxxxxx>
>>>> Signed-off-by: Javier Martinez Canillas <javier@xxxxxxxxxxxxxxx>
>>>>
>>>> ---
>>>>
>>>> drivers/media/common/siano/smsdvb-main.c | 1 +
>>>> drivers/media/media-device.c | 38 +++++++++++++++++++++++----
>>>> drivers/media/platform/exynos4-is/media-dev.c | 12 ++++++---
>>>> drivers/media/platform/omap3isp/isp.c | 11 +++++---
>>>> drivers/media/platform/s3c-camif/camif-core.c | 13 ++++++---
>>>> drivers/media/platform/vsp1/vsp1_drv.c | 19 ++++++++++----
>>>> drivers/media/platform/xilinx/xilinx-vipp.c | 11 +++++---
>>>> drivers/media/usb/au0828/au0828-core.c | 26 +++++++++++++-----
>>>> drivers/media/usb/cx231xx/cx231xx-cards.c | 22 +++++++++++-----
>>>> drivers/media/usb/dvb-usb-v2/dvb_usb_core.c | 11 +++++---
>>>> drivers/media/usb/dvb-usb/dvb-usb-dvb.c | 13 ++++++---
>>>> drivers/media/usb/siano/smsusb.c | 14 ++++++++--
>>>> drivers/media/usb/uvc/uvc_driver.c | 9 +++++--
>>>> include/media/media-device.h | 2 ++
>>>> 14 files changed, 156 insertions(+), 46 deletions(-)
>>>>
>>>> diff --git a/drivers/media/common/siano/smsdvb-main.c b/drivers/media/common/siano/smsdvb-main.c
>>>> index ab345490a43a..8a1ea2192439 100644
>>>> --- a/drivers/media/common/siano/smsdvb-main.c
>>>> +++ b/drivers/media/common/siano/smsdvb-main.c
>>>> @@ -617,6 +617,7 @@ static void smsdvb_media_device_unregister(struct smsdvb_client_t *client)
>>>> if (!coredev->media_dev)
>>>> return;
>>>> media_device_unregister(coredev->media_dev);
>>>> + media_device_cleanup(coredev->media_dev);
>>>> kfree(coredev->media_dev);
>>>> coredev->media_dev = NULL;
>>>> #endif
>>>> diff --git a/drivers/media/media-device.c b/drivers/media/media-device.c
>>>> index 745defb34b33..a8beb0b445a6 100644
>>>> --- a/drivers/media/media-device.c
>>>> +++ b/drivers/media/media-device.c
>>>> @@ -526,7 +526,7 @@ static void media_device_release(struct media_devnode *mdev)
>>>> }
>>>>
>>>> /**
>>>> - * media_device_register - register a media device
>>>> + * media_device_init() - initialize a media device
>>>> * @mdev: The media device
>>>> *
>>>> * The caller is responsible for initializing the media device before
>>>> @@ -534,12 +534,11 @@ static void media_device_release(struct media_devnode *mdev)
>>>> *
>>>> * - dev must point to the parent device
>>>> * - model must be filled with the device model name
>>>> + *
>>>> + * returns zero on success or a negative error code.
>>>> */
>>>> -int __must_check __media_device_register(struct media_device *mdev,
>>>> - struct module *owner)
>>>> +int __must_check media_device_init(struct media_device *mdev)
>>>
>>> I think I suggested making media_device_init() return void as the only
>>> remaining source of errors would be driver bugs.
>>>
>>
>> Yes you did and I think I explained why I preferred to leave it as
>> is and I thought we agreed :)
>
> I thought we agreed, too. But my understanding was that the agreement was different. ;-)
>

Fair enough :)

>>
>> Currently media_device_register() is failing gracefully if a buggy
>> driver is not setting mdev->dev. So changing media_device_init() to
>> return void instead, would be a semantic change and if drivers are
>> not checking that anymore, can lead to NULL pointer dereference bugs.
>
> Do we have such drivers? Would they ever have worked in the first place, as media device registration would have failed?
>

Most likely we don't but since I'm changing all the drivers anyways, I'll
take a look and change to void and propose a fix if I find something but
it seems is just that the function is checking a condition that would not
happen with the in-tree media drivers.

I'll change to void and remove the return value check in drivers for v2.

>>
>>> I'd simply replace the WARN_ON() below with BUG().
>>>
>>
>> Sorry but I disagree, I think that BUG() should only be used for
>> exceptional cases in which execution can't really continue without
>> causing data corruption or something like that, so bringing down
>> the machine is the safest and least bad option.
>
> I think it's also fine to use that for basic sanity checks on code paths that will be run early and every time.
>

I still think that causing a panic when it could had be avoided is
pretty harsh. And is not only me, even Linus thinks the same:

https://lkml.org/lkml/2015/6/24/662

> To support what I'm saying, just do this:
>
> $ grep BUG_ON drivers/media/v4l2-core/*
>
> Even though most of that is in videobuf, V4L2 core does that, too, and there's a case of especially delicious usage in v4l2_subdev_init(). :-)
>

Yes, but I do wonder how many of those are really necessary and if
most need to be converted to WARN_ON() and return an error instead...

>>
>> But that's not the case here, if there is a buggy driver then the
>> worst thing that would happen is that a driver probe function is
>> going to fail. It is true that drivers may not be cheking this but
>> that's why is annotated with __must_check.
>
> I still maintain it makes no sense to build error handling in drivers if the only source of error would be bad parameters specified by a driver to a function.
>
> The driver could also pass a dangling pointer to the function as well, and you couldn't find out.

You convinced me, then I think we should remove the check altogether
(not keeping it and changing it to BUG_ON) do you agree with that?

>
>>
>>>> {
>>>> - int ret;
>>>> -
>>>> if (WARN_ON(mdev->dev == NULL || mdev->model[0] == 0))
>>>> return -EINVAL;
>>>>
>>
>> We can later audit all drivers and change this function to return
>> void instead and get rid of this check but I would prefer to do it
>> as a followup patch.
>
> As this patch essentially moves media_device_register() elsewhere and puts media_device_init() in its place, the driver error handling is mostly unaffected in that very location.
>
> I'm fine with the approach, although I don't think we'll need much auditing; such drivers would never have functioned in the first place.
>

Yes, but the difference is that by checking and returning an error, such drivers
that were not working (probably there even isn't such a driver) would had
failed to probe but by removing the check, it could now cause a NULL pointer
dereference. That's why we need to audit to be sure that there isn't a buggy one.

>>
>>>> @@ -550,6 +549,35 @@ int __must_check __media_device_register(struct media_device *mdev,
>>>> spin_lock_init(&mdev->lock);
>>>> mutex_init(&mdev->graph_mutex);
>>>>
>>>> + dev_dbg(mdev->dev, "Media device initialized\n");
>>>> +
>>>> + return 0;
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(media_device_init);
>>>> +
>>>> +/**
>>>> + * media_device_cleanup() - Cleanup a media device
>>>> + * @mdev: The media device
>>>> + *
>>>> + */
>>>> +void media_device_cleanup(struct media_device *mdev)
>>>> +{
>>>> + mutex_destroy(&mdev->graph_mutex);
>>>
>>> Very nice!
>>>
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(media_device_cleanup);
>>>> +
>>>> +/**
>>>> + * __media_device_register() - register a media device
>>>> + * @mdev: The media device
>>>> + * @owner: The module owner
>>>> + *
>>>> + * returns zero on success or a negative error code.
>>>> + */
>>>> +int __must_check __media_device_register(struct media_device *mdev,
>>>> + struct module *owner)
>>>> +{
>>>> + int ret;
>>>> +
>>>> /* Register the device node. */
>>>> mdev->devnode.fops = &media_device_fops;
>>>> mdev->devnode.parent = mdev->dev;
>>>> diff --git a/drivers/media/platform/exynos4-is/media-dev.c b/drivers/media/platform/exynos4-is/media-dev.c
>>>> index 4a25df9dd869..158738bd23fc 100644
>>>> --- a/drivers/media/platform/exynos4-is/media-dev.c
>>>> +++ b/drivers/media/platform/exynos4-is/media-dev.c
>>>> @@ -1313,7 +1313,10 @@ static int subdev_notifier_complete(struct v4l2_async_notifier *notifier)
>>>> ret = v4l2_device_register_subdev_nodes(&fmd->v4l2_dev);
>>>> unlock:
>>>> mutex_unlock(&fmd->media_dev.graph_mutex);
>>>> - return ret;
>>>> + if (ret < 0)
>>>> + return ret;
>>>> +
>>>> + return media_device_register(&fmd->media_dev);
>>>
>>> Uh-oh.
>>>
>>> I guess it's fair to simply fail here. probe() is already over so I
>>> wonder if there's anything we could really do except await someone
>>> unloading and loading the module again? Or, replugging the hardware.
>>>
>>
>> Yes, in fact I noticed the same. The problem is that drivers that
>> register a notifier with v4l2_async_notifier_register(), don't know if
>> a complete callback (or bound callback FWIW) failed. Only the subdev
>> driver knows that something failed when it tried to register a subdev
>> asynchronously with v4l2_async_register_subdev().
>>
>>> I don't like the idea but I guess this could be solved later on.
>>>
>>
>> Yes me neither but since v4l2_device_register_subdev_nodes() can also
>> fail and the return value was returned to v4l2_async_test_notify(), I
>> decided to do the same when adding the call to media_device_register().
>>
>> But I agree that this should be fixed somehow, either making all drivers
>> do cleanup if either the bound or complete notifier callbacks fail or
>> the v4l2-async core should call a cleanup function that is registered by
>> the drivers.
>>
>> But as you said I also think this could be solved later on since it's a
>> separate issue IMHO.
>
> Agreed.
>
>>>> }
>>>>
>>>> static int fimc_md_probe(struct platform_device *pdev)
>>>> @@ -1350,9 +1353,9 @@ static int fimc_md_probe(struct platform_device *pdev)
>>>> return ret;
>>>> }
>>>>
>>>> - ret = media_device_register(&fmd->media_dev);
>>>> + ret = media_device_init(&fmd->media_dev);
>>>> if (ret < 0) {
>>>> - v4l2_err(v4l2_dev, "Failed to register media device: %d\n", ret);
>>>> + v4l2_err(v4l2_dev, "Failed to init media device: %d\n", ret);
>>>> goto err_v4l2_dev;
>>>> }
>>>>
>>>> @@ -1424,7 +1427,7 @@ err_clk:
>>>> err_m_ent:
>>>> fimc_md_unregister_entities(fmd);
>>>> err_md:
>>>> - media_device_unregister(&fmd->media_dev);
>>>> + media_device_cleanup(&fmd->media_dev);
>>>> err_v4l2_dev:
>>>> v4l2_device_unregister(&fmd->v4l2_dev);
>>>> return ret;
>>>> @@ -1445,6 +1448,7 @@ static int fimc_md_remove(struct platform_device *pdev)
>>>> fimc_md_unregister_entities(fmd);
>>>> fimc_md_pipelines_free(fmd);
>>>> media_device_unregister(&fmd->media_dev);
>>>> + media_device_cleanup(&fmd->media_dev);
>>>> fimc_md_put_clocks(fmd);
>>>>
>>>> return 0;
>>>> diff --git a/drivers/media/platform/omap3isp/isp.c b/drivers/media/platform/omap3isp/isp.c
>>>> index cb8ac90086c1..b7eda3043c31 100644
>>>> --- a/drivers/media/platform/omap3isp/isp.c
>>>> +++ b/drivers/media/platform/omap3isp/isp.c
>>>> @@ -1793,6 +1793,7 @@ static void isp_unregister_entities(struct isp_device *isp)
>>>>
>>>> v4l2_device_unregister(&isp->v4l2_dev);
>>>> media_device_unregister(&isp->media_dev);
>>>> + media_device_cleanup(&isp->media_dev);
>>>> }
>>>>
>>>> static int isp_link_entity(
>>>> @@ -1875,9 +1876,9 @@ static int isp_register_entities(struct isp_device *isp)
>>>> sizeof(isp->media_dev.model));
>>>> isp->media_dev.hw_revision = isp->revision;
>>>> isp->media_dev.link_notify = isp_pipeline_link_notify;
>>>> - ret = media_device_register(&isp->media_dev);
>>>> + ret = media_device_init(&isp->media_dev);
>>>> if (ret < 0) {
>>>> - dev_err(isp->dev, "%s: Media device registration failed (%d)\n",
>>>> + dev_err(isp->dev, "%s: Media device init failed (%d)\n",
>>>> __func__, ret);
>>>> return ret;
>>>> }
>>>> @@ -2347,7 +2348,11 @@ static int isp_subdev_notifier_complete(struct v4l2_async_notifier *async)
>>>> }
>>>> }
>>>>
>>>> - return v4l2_device_register_subdev_nodes(&isp->v4l2_dev);
>>>> + ret = v4l2_device_register_subdev_nodes(&isp->v4l2_dev);
>>>> + if (ret < 0)
>>>> + return ret;
>>>> +
>>>> + return media_device_register(&isp->media_dev);
>>>> }
>>>>
>>>> /*
>>>> diff --git a/drivers/media/platform/s3c-camif/camif-core.c b/drivers/media/platform/s3c-camif/camif-core.c
>>>> index 3e33c60be004..1428db2f804d 100644
>>>> --- a/drivers/media/platform/s3c-camif/camif-core.c
>>>> +++ b/drivers/media/platform/s3c-camif/camif-core.c
>>>> @@ -305,7 +305,7 @@ static void camif_unregister_media_entities(struct camif_dev *camif)
>>>> /*
>>>> * Media device
>>>> */
>>>> -static int camif_media_dev_register(struct camif_dev *camif)
>>>> +static int camif_media_dev_init(struct camif_dev *camif)
>>>> {
>>>> struct media_device *md = &camif->media_dev;
>>>> struct v4l2_device *v4l2_dev = &camif->v4l2_dev;
>>>> @@ -328,7 +328,7 @@ static int camif_media_dev_register(struct camif_dev *camif)
>>>> if (ret < 0)
>>>> return ret;
>>>>
>>>> - ret = media_device_register(md);
>>>> + ret = media_device_init(md);
>>>> if (ret < 0)
>>>> v4l2_device_unregister(v4l2_dev);
>>>>
>>>> @@ -483,7 +483,7 @@ static int s3c_camif_probe(struct platform_device *pdev)
>>>> goto err_alloc;
>>>> }
>>>>
>>>> - ret = camif_media_dev_register(camif);
>>>> + ret = camif_media_dev_init(camif);
>>>> if (ret < 0)
>>>> goto err_mdev;
>>>>
>>>> @@ -510,6 +510,11 @@ static int s3c_camif_probe(struct platform_device *pdev)
>>>> goto err_unlock;
>>>>
>>>> mutex_unlock(&camif->media_dev.graph_mutex);
>>>> +
>>>> + ret = media_device_register(&camif->media_dev);
>>>> + if (ret < 0)
>>>> + goto err_sens;
>>>> +
>>>> pm_runtime_put(dev);
>>>> return 0;
>>>>
>>>> @@ -518,6 +523,7 @@ err_unlock:
>>>> err_sens:
>>>> v4l2_device_unregister(&camif->v4l2_dev);
>>>> media_device_unregister(&camif->media_dev);
>>>> + media_device_cleanup(&camif->media_dev);
>>>> camif_unregister_media_entities(camif);
>>>> err_mdev:
>>>> vb2_dma_contig_cleanup_ctx(camif->alloc_ctx);
>>>> @@ -539,6 +545,7 @@ static int s3c_camif_remove(struct platform_device *pdev)
>>>> struct s3c_camif_plat_data *pdata = &camif->pdata;
>>>>
>>>> media_device_unregister(&camif->media_dev);
>>>> + media_device_cleanup(&camif->media_dev);
>>>> camif_unregister_media_entities(camif);
>>>> v4l2_device_unregister(&camif->v4l2_dev);
>>>>
>>>> diff --git a/drivers/media/platform/vsp1/vsp1_drv.c b/drivers/media/platform/vsp1/vsp1_drv.c
>>>> index 8f995d267646..bcbc24e55bf5 100644
>>>> --- a/drivers/media/platform/vsp1/vsp1_drv.c
>>>> +++ b/drivers/media/platform/vsp1/vsp1_drv.c
>>>> @@ -127,6 +127,7 @@ static void vsp1_destroy_entities(struct vsp1_device *vsp1)
>>>>
>>>> v4l2_device_unregister(&vsp1->v4l2_dev);
>>>> media_device_unregister(&vsp1->media_dev);
>>>> + media_device_cleanup(&vsp1->media_dev);
>>>> }
>>>>
>>>> static int vsp1_create_entities(struct vsp1_device *vsp1)
>>>> @@ -141,9 +142,9 @@ static int vsp1_create_entities(struct vsp1_device *vsp1)
>>>> strlcpy(mdev->model, "VSP1", sizeof(mdev->model));
>>>> snprintf(mdev->bus_info, sizeof(mdev->bus_info), "platform:%s",
>>>> dev_name(mdev->dev));
>>>> - ret = media_device_register(mdev);
>>>> + ret = media_device_init(mdev);
>>>> if (ret < 0) {
>>>> - dev_err(vsp1->dev, "media device registration failed (%d)\n",
>>>> + dev_err(vsp1->dev, "media device init failed (%d)\n",
>>>> ret);
>>>> return ret;
>>>> }
>>>> @@ -288,11 +289,19 @@ static int vsp1_create_entities(struct vsp1_device *vsp1)
>>>> }
>>>>
>>>> ret = v4l2_device_register_subdev_nodes(&vsp1->v4l2_dev);
>>>> -
>>>> -done:
>>>> if (ret < 0)
>>>> - vsp1_destroy_entities(vsp1);
>>>> + goto done;
>>>>
>>>> + ret = media_device_register(mdev);
>>>
>>> Are there cases where media_device_register() won't complain aloud
>>> already? I wouldn't print an error message in a driver anymore.
>>>
>>
>> Good question, there are already error messages printed if the media dev
>> node fails to be registered or sysfs attribute file fails to be created.
>>
>> What I did is to follow the convention used on each driver and added an
>> error message for both media_device_init() and media_device_register()
>> if the driver already had one for the old media_device_register() and
>> didn't if the driver was not doing it.
>>
>>>> + if (ret < 0) {
>>>> + dev_err(vsp1->dev, "media device init failed (%d)\n", ret);
>>>> + goto done;
>>>> + }
>>>> +
>>>> + return 0;
>>>> +
>>>> +done:
>>>> + vsp1_destroy_entities(vsp1);
>>>> return ret;
>>>> }
>>>>
>>>> diff --git a/drivers/media/platform/xilinx/xilinx-vipp.c b/drivers/media/platform/xilinx/xilinx-vipp.c
>>>> index 79d4be7ce9a5..5bb18aee0707 100644
>>>> --- a/drivers/media/platform/xilinx/xilinx-vipp.c
>>>> +++ b/drivers/media/platform/xilinx/xilinx-vipp.c
>>>> @@ -311,6 +311,10 @@ static int xvip_graph_notify_complete(struct v4l2_async_notifier *notifier)
>>>> if (ret < 0)
>>>> dev_err(xdev->dev, "failed to register subdev nodes\n");
>>>>
>>>> + ret = media_device_register(&xdev->media_dev);
>>>
>>> Same here (and elsewhere).
>>>
>>
>> I don't have a strong opinion, if you think that is better to remove
>> the error messages, I can do it when re-spining the patches.
>
> I'd be in favour of removing them. If the framework already complains about it, there's no reason to do the same in drivers.
>

Ok, I'll remove the error messages from the drivers then on v2.

Thanks a lot for your feedback!

Best regards,
--
Javier Martinez Canillas
Open Source Group
Samsung Research America
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/