Re: [PATCH] media: add GFP flag to media_*() that could get called in atomic context

From: Mauro Carvalho Chehab
Date: Tue Mar 15 2016 - 11:55:53 EST


Em Mon, 14 Mar 2016 14:09:09 +0200
Sakari Ailus <sakari.ailus@xxxxxx> escreveu:

> Hi Mauro,
>
> On Mon, Mar 14, 2016 at 08:46:33AM -0300, Mauro Carvalho Chehab wrote:
> > Em Mon, 14 Mar 2016 12:52:54 +0200
> > Sakari Ailus <sakari.ailus@xxxxxx> escreveu:
> >
> > > Hi Mauro,
> > >
> > > On Mon, Mar 14, 2016 at 07:13:58AM -0300, Mauro Carvalho Chehab wrote:
> > > > Em Mon, 14 Mar 2016 09:22:37 +0200
> > > > Sakari Ailus <sakari.ailus@xxxxxx> escreveu:
> > > >
> > > > > Hi Shuah,
> > > > >
> > > > > On Sat, Mar 12, 2016 at 06:48:09PM -0700, Shuah Khan wrote:
> > > > > > Add GFP flags to media_create_pad_link(), media_create_intf_link(),
> > > > > > media_devnode_create(), and media_add_link() that could get called
> > > > > > in atomic context to allow callers to pass in the right flags for
> > > > > > memory allocation.
> > > > > >
> > > > > > tree-wide driver changes for media_*() GFP flags change:
> > > > > > Change drivers to add gfpflags to interffaces, media_create_pad_link(),
> > > > > > media_create_intf_link() and media_devnode_create().
> > > > > >
> > > > > > Signed-off-by: Shuah Khan <shuahkh@xxxxxxxxxxxxxxx>
> > > > > > Suggested-by: Mauro Carvalho Chehab <mchehab@xxxxxxxxxxxxxxx>
> > > > >
> > > > > What's the use case for calling the above functions in an atomic context?
> > > > >
> > > >
> > > > ALSA code seems to do a lot of stuff at atomic context. That's what
> > > > happens on my test machine when au0828 gets probed before
> > > > snd-usb-audio:
> > > > http://pastebin.com/LEX5LD5K
> > > >
> > > > It seems that ALSA USB probe routine (usb_audio_probe) happens in
> > > > atomic context.
> > >
> > > usb_audio_probe() grabs a mutex (register_mutex) on its own. It certainly
> > > cannot be called in atomic context.
> > >
> > > In the above log, what I did notice, though, was that because *we* grab
> > > mdev->lock spinlock in media_device_register_entity(), we may not sleep
> > > which is what the notify() callback implementation in au0828 driver does
> > > (for memory allocation).
> >
> > True. After looking into the code, the problem is that the notify
> > callbacks are called with the spinlock hold. I don't see any reason
> > to do that.
>
> Notify callbacks, perhaps not, but the list is still protected by the
> spinlock. It perhaps is not likely that another process would change it but
> I don't think we can rely on that.

I can see only 2 risks protected by the lock:

1) mdev gets freed while an entity is being created. This is a problem
with the current memory protection schema we're using. I guess the
only way to fix it is to use kref for mdev/entities/interfaces/links/pads.
This change doesn't make it better or worse.
Also, I don't think we have such risk with the current devices.

2) a notifier may be inserted or removed by another driver, while the
loop is running.

To avoid (2), I see 3 alternatives:

a) keep the loop as proposed on this patch. As the list is navigated using
list_for_each_entry_safe(), I guess[1] it should be safe to remove/add
new notify callbacks there while the loop is running by some other process.

[1] It *is* safe if the change were done inside the loop - but I'm not
100% sure that it is safe if some other CPU touches the notify list.

b) Unlock/relock the spinlock every time:

/* previous code that locks mdev->lock spinlock */

/* invoke entity_notify callbacks */
list_for_each_entry_safe(notify, next, &mdev->entity_notify, list) {
spin_unlock(&mdev->lock);
(notify)->notify(entity, notify->notify_data);
spin_lock(&mdev->lock);
}

spin_unlock(&mdev->lock);

c) use a separate lock for the notify list -this seems to be an overkill.

d) Protect it with the graph traversal mutex. That sounds the worse idea,
IMHO, as we'll be abusing the lock.

>
> >
> > > Could we instead replace mdev->lock by a mutex?
> >
> > We changed the code to use a spinlock for a reason: this fixed some
> > troubles in the past with the code locking (can't remember the problem,
> > but this was documented at the kernel logs and at the ML). Yet, the code
> > under the spinlock never sleeps, so this is fine.
>
> struct media_device.lock was added by this patch:
>
> commit 53e269c102fbaf77e7dc526b1606ad4a48e57200
> Author: Laurent Pinchart <laurent.pinchart@xxxxxxxxxxxxxxxx>
> Date: Wed Dec 9 08:40:00 2009 -0300
>
> [media] media: Entities, pads and links
>
> As video hardware pipelines become increasingly complex and
> configurable, the current hardware description through v4l2 subdevices
> reaches its limits. In addition to enumerating and configuring
> subdevices, video camera drivers need a way to discover and modify at
> runtime how those subdevices are connected. This is done through new
> elements called entities, pads and links.
>
> ...
>
> Signed-off-by: Laurent Pinchart <laurent.pinchart@xxxxxxxxxxxxxxxx>
> Signed-off-by: Sakari Ailus <sakari.ailus@xxxxxx>
> Acked-by: Hans Verkuil <hverkuil@xxxxxxxxx>
> Signed-off-by: Mauro Carvalho Chehab <mchehab@xxxxxxxxxx>
>
> I think it was always a spinlock, for the reason you stated above as well:
> it did not need to sleep. But if there is a need to sleep, I think we should
> consider changing that.

True, but there were some places that were using the graph_mutex
instead of the spinlock.

>
> >
> > Yet, in the future, we'll need to do a review of all the locking schema,
> > in order to better support dynamic graph changes.
>
> Agreed. I think more fine grained locking should be considered. The media
> graph mutex will become a bottleneck at some point, especially if we make
> the media devices system wide at some point.

Yes. I guess we should protect the memory allocated stuff with a kref,
and try to use RCS on most places, but we need more discussions and
more tests to implement a solution that would be reliable even if the
callers don't behave well.

> >
> > > If there is no pressing need to implement atomic memory allocation I simply
> > > wouldn't do it, especially in device initialisation where an allocation
> > > failure will lead to probe failure as well.
> >
> > The fix for this issue should be simple. See the enclosed code. Btw.
> > it probably makes sense to add some code here to avoid starving the
> > stack, as a notify callback could try to create an entity, with,
> > in turn, would call the notify callback again.
> >
> > I'll run some tests here to double check if it fixes the issue.
> >
> > ---
> >
> > [media] media-device: Don't call notify callbacks holding a spinlock
> >
> > The notify routines may sleep. So, they can't be called in spinlock
> > context. Also, they may want to call other media routines that might
> > be spinning the spinlock, like creating a link.
> >
> > This fixes the following bug:
> >
> > [ 1839.510587] BUG: sleeping function called from invalid context at mm/slub.c:1289
> > [ 1839.510881] in_atomic(): 1, irqs_disabled(): 0, pid: 3479, name: modprobe
> > [ 1839.511157] 4 locks held by modprobe/3479:
> > [ 1839.511415] #0: (&dev->mutex){......}, at: [<ffffffff81ce8933>] __driver_attach+0xa3/0x160
> > [ 1839.512381] #1: (&dev->mutex){......}, at: [<ffffffff81ce8941>] __driver_attach+0xb1/0x160
> > [ 1839.512388] #2: (register_mutex#5){+.+.+.}, at: [<ffffffffa10596c7>] usb_audio_probe+0x257/0x1c90 [snd_usb_audio]
> > [ 1839.512401] #3: (&(&mdev->lock)->rlock){+.+.+.}, at: [<ffffffffa0e6051b>] media_device_register_entity+0x1cb/0x700 [media]
> > [ 1839.512412] CPU: 2 PID: 3479 Comm: modprobe Not tainted 4.5.0-rc3+ #49
> > [ 1839.512415] Hardware name: /NUC5i7RYB, BIOS RYBDWi35.86A.0350.2015.0812.1722 08/12/2015
> > [ 1839.512417] 0000000000000000 ffff8803b3f6f288 ffffffff81933901 ffff8803c4bae000
> > [ 1839.512422] ffff8803c4bae5c8 ffff8803b3f6f2b0 ffffffff811c6af5 ffff8803c4bae000
> > [ 1839.512427] ffffffff8285d7f6 0000000000000509 ffff8803b3f6f2f0 ffffffff811c6ce5
> > [ 1839.512432] Call Trace:
> > [ 1839.512436] [<ffffffff81933901>] dump_stack+0x85/0xc4
> > [ 1839.512440] [<ffffffff811c6af5>] ___might_sleep+0x245/0x3a0
> > [ 1839.512443] [<ffffffff811c6ce5>] __might_sleep+0x95/0x1a0
> > [ 1839.512446] [<ffffffff8155aade>] kmem_cache_alloc_trace+0x20e/0x300
> > [ 1839.512451] [<ffffffffa0e66e3d>] ? media_add_link+0x4d/0x140 [media]
> > [ 1839.512455] [<ffffffffa0e66e3d>] media_add_link+0x4d/0x140 [media]
> > [ 1839.512459] [<ffffffffa0e69931>] media_create_pad_link+0xa1/0x600 [media]
> > [ 1839.512463] [<ffffffffa0fe11b3>] au0828_media_graph_notify+0x173/0x360 [au0828]
> > [ 1839.512467] [<ffffffffa0e68a6a>] ? media_gobj_create+0x1ba/0x480 [media]
> > [ 1839.512471] [<ffffffffa0e606fb>] media_device_register_entity+0x3ab/0x700 [media]
> >
> > (untested)
> >
> > Signed-off-by: Mauro Carvalho Chehab <mchehab@xxxxxxxxxxxxxxx>
> >
> > diff --git a/drivers/media/media-device.c b/drivers/media/media-device.c
> > index 6ba6e8f982fc..fc3c199e5500 100644
> > --- a/drivers/media/media-device.c
> > +++ b/drivers/media/media-device.c
> > @@ -587,14 +587,15 @@ int __must_check media_device_register_entity(struct media_device *mdev,
> > media_gobj_create(mdev, MEDIA_GRAPH_PAD,
> > &entity->pads[i].graph_obj);
> >
> > + spin_unlock(&mdev->lock);
> > +
> > /* invoke entity_notify callbacks */
> > list_for_each_entry_safe(notify, next, &mdev->entity_notify, list) {
> > (notify)->notify(entity, notify->notify_data);
> > }
> >
> > - spin_unlock(&mdev->lock);
> > -
> > mutex_lock(&mdev->graph_mutex);
> > +
> > if (mdev->entity_internal_idx_max
> > >= mdev->pm_count_walk.ent_enum.idx_max) {
> > struct media_entity_graph new = { .top = 0 };
> >
>


--
Thanks,
Mauro