RE: [PATCH 1/1] remoteproc: fix recovery procedure

From: Loic PALLARDY
Date: Thu Jan 17 2019 - 16:44:26 EST




> -----Original Message-----
> From: linux-remoteproc-owner@xxxxxxxxxxxxxxx <linux-remoteproc-
> owner@xxxxxxxxxxxxxxx> On Behalf Of Loic PALLARDY
> Sent: jeudi 17 janvier 2019 21:52
> To: Bjorn Andersson <bjorn.andersson@xxxxxxxxxx>
> Cc: xiang xiao <xiaoxiang781216@xxxxxxxxx>; ohad@xxxxxxxxxx; linux-
> remoteproc@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; Arnaud
> POULIQUEN <arnaud.pouliquen@xxxxxx>; benjamin.gaignard@xxxxxxxxxx; s-
> anna@xxxxxx
> Subject: RE: [PATCH 1/1] remoteproc: fix recovery procedure
>
> Hi Bjorn,
>
> > -----Original Message-----
> > From: Bjorn Andersson <bjorn.andersson@xxxxxxxxxx>
> > Sent: jeudi 17 janvier 2019 19:00
> > To: Loic PALLARDY <loic.pallardy@xxxxxx>
> > Cc: xiang xiao <xiaoxiang781216@xxxxxxxxx>; ohad@xxxxxxxxxx; linux-
> > remoteproc@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; Arnaud
> > POULIQUEN <arnaud.pouliquen@xxxxxx>; benjamin.gaignard@xxxxxxxxxx;
> s-
> > anna@xxxxxx
> > Subject: Re: [PATCH 1/1] remoteproc: fix recovery procedure
> >
> > On Mon 14 Jan 12:23 PST 2019, Loic PALLARDY wrote:
> >
> > > Hi Xiang,
> > >
> > > > -----Original Message-----
> > > > From: xiang xiao <xiaoxiang781216@xxxxxxxxx>
> > > > Sent: samedi 12 janvier 2019 19:29
> > > > To: Loic PALLARDY <loic.pallardy@xxxxxx>
> > > > Cc: bjorn.andersson@xxxxxxxxxx; ohad@xxxxxxxxxx; linux-
> > > > remoteproc@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; Arnaud
> > > > POULIQUEN <arnaud.pouliquen@xxxxxx>;
> > benjamin.gaignard@xxxxxxxxxx; s-
> > > > anna@xxxxxx
> > > > Subject: Re: [PATCH 1/1] remoteproc: fix recovery procedure
> > > >
> >
> > Thanks Loic for picking this up again.
> >
> > > > Hi Loic,
> > > > The change just hide the problem, I think. The big issue is:
> > > > 1.virtio devices aren't destroyed by rpproc_stop
> > > Virtio devices are destroyed by rproc_stop() as vdev is registered as rproc
> > sub device.
> > > rproc_stop() is calling rproc_stop_subdevices() which is in charge of
> > removing virtio device and associated children.
> > > rproc_vdev_do_stop() --> rproc_remove_virtio_dev() -->
> > unregister_virtio_device()
> > >
> >
> > Xiang is right, unregister_virtio_device() ends up decrementing the
> > refcount of device and might free it, but it's not guaranteed.
>
> But it that case calling rproc_shutdown() doesn't guarantee devices are free,
> it is the same.
> The only difference will be that rproc_vdev will be released by rproc and
> then reallocated. So virtio device allocation is restarting with a virgin memory
> buffer. But you will have some ghost devices and restart may failed too.
> I post a fix [1] last summer to be sure virtio device won't be released while
> still registered or registering... So there is still potential issue.
>
> >
> > So I think we need to decouple the rproc_vdev and virtio_device, to
> > allow the latter to potentially outlive the prior.
> >
> I checked how to decouple at least the allocation because one issue here.
> The main issue is that all references are done based on container_of().
> I look for a fix having the less impacts on the current code, but still possible to
> create cross pointer references between rproc_vdev and virtio device.
> It will clean up the memory allocation procedure, but the problem is still
> there if sub virtio devices not well release.
> We need to not be able to restart remote processor if at least one sub device
> was not correctly release...
>
> > > Please find below trace of a recovery on my ST SOC. My 2 rpmsg tty are
> > removed and re-inserted correctly
> > > root@stm32mp1:~# ls /dev/ttyRPMSG*
> > > /dev/ttyRPMSG0 /dev/ttyRPMSG1
> > > root@stm32mp1:~# [ 154.832523] remoteproc remoteproc0: crash
> > detected in m4: type watchdog
> > > [ 154.837725] remoteproc remoteproc0: handling crash #2 in m4
> > > [ 154.843319] remoteproc remoteproc0: recovering m4
> > > [ 154.849185] rpmsg_tty virtio0.rpmsg-tty-channel.-1.0: rpmsg tty device
> 0
> > is removed
> > > [ 154.857572] rpmsg_tty virtio0.rpmsg-tty-channel.-1.1: rpmsg tty device
> 1
> > is removed
> > > [ 155.382327] remoteproc remoteproc0: warning: remote FW shutdown
> > without ack
> > > [ 155.387857] remoteproc remoteproc0: stopped remote processor m4
> > > [ 155.398988] m4@0#vdev0buffer: assigned reserved memory node
> > vdev0buffer@10044000
> > > [ 155.405910] virtio_rpmsg_bus virtio0: creating channel rpmsg-tty-
> channel
> > addr 0x0
> > > [ 155.413422] rpmsg_tty virtio0.rpmsg-tty-channel.-1.0: new channel:
> > 0x400 -> 0x0 : ttyRPMSG0
> > > [ 155.421038] virtio_rpmsg_bus virtio0: creating channel rpmsg-tty-
> channel
> > addr 0x1
> > > [ 155.429088] rpmsg_tty virtio0.rpmsg-tty-channel.-1.1: new channel:
> > 0x401 -> 0x1 : ttyRPMSG1
> > > [ 155.437338] virtio_rpmsg_bus virtio0: rpmsg host is online
> > > [ 155.442401] m4@0#vdev0buffer: registered virtio0 (type 7)
> > > [ 155.461154] remoteproc remoteproc0: remote processor m4 is now up
> > > ls /dev/ttyRPMSG*
> > > /dev/ttyRPMSG0 /dev/ttyRPMSG1
> > > root@stm32mp1:~#
> > >
> > > As vdev is including in a larger struct allocated by rproc, it is safe
> > > to set it to 0 before initializing virtio device while rproc subdevice
> > > sequence is respected.
> > >
> >
> > It's likely that this works in most use cases, but if for some reason
> > there's additional references held those will operate on the object past
> > your clearing of it.
>
> In fact, as the memory is free/kzalloc, virtio device fields are not all at 0 as
> during boot sequence.
> As mentioned below issue is coming from kobject state_initialized field which
> is not in a correct state.
> This field is only set by kobject_init().
> I think normal way of working is to release memory when a device is no more
> used.
> But another solution could be to reset it in kobject_cleanup() or
> kobject_del() in order to have a symmetrical procedure.

Reading some literature, it is a bad idea.
Having a look to device_initialize () function description, it is clearly mention device struct must be 0 (except fields provided by user) before. (Same in kobject documentation)

Extract drivers/base/core.c [1]
* All fields in @dev must be initialized by the caller to 0, except
* for those explicitly set to some other value. The simplest
* approach is to use kzalloc() to allocate the structure containing
* @dev.

So memset or kfree/kzalloc of virtio_device manadatory.

Regards,
Loic

[1]: https://elixir.bootlin.com/linux/latest/source/drivers/base/core.c#L1482
>
> Regards,
> Loic
> [1] https://patchwork.kernel.org/patch/10544757/
>
> >
> > Regards,
> > Bjorn
> >
> > > > 2.and then rpmsg child devices aren't destroyed too
> > > > Then, when the remote start and create rpmsg channel again, the
> > > > duplicated channel will appear in kernel.
> > > > To fix this problem, we need go through rpproc_shutdown/rproc_boot
> to
> > > > destroy all devices(virtio and rpmsg) and create them again.
> > > Rproc_shutdown/rproc_boot is solving the issue too, except if
> > rproc_boot() was called several times and so rproc->power atomic not
> equal
> > to 1.
> > > Using only rproc_stop() and rproc_start() allows to preserve rproc-
> >power
> > and so to be silent from rproc user pov.
> > >
> > > Regards,
> > > Loic
> > > >
> > > > Thanks
> > > > Xiang
> > > >
> > > > On Wed, Jan 9, 2019 at 6:56 PM Loic Pallardy <loic.pallardy@xxxxxx>
> > wrote:
> > > > >
> > > > > Commit 7e83cab824a87e83cab824a8 ("remoteproc: Modify recovery
> > path
> > > > > to use rproc_{start,stop}()") replaces rproc_{shutdown,boot}() with
> > > > > rproc_{stop,start}(), which skips destroy the virtio device at stop
> > > > > but re-initializes it again at start.
> > > > >
> > > > > Issue is that struct virtio_dev is not correctly reinitialized like done
> > > > > at initial allocation thanks to kzalloc() and kobject is considered as
> > > > > already initialized by kernel. That is due to the fact struct virtio_dev
> > > > > is allocated and released at vdev resource handling level managed
> and
> > > > > virtio device is registered and unregistered at rproc subdevices level.
> > > > >
> > > > > This patch initializes struct virtio_dev to 0 before using it and
> > > > > registering it.
> > > > >
> > > > > Fixes: 7e83cab824a8 ("remoteproc: Modify recovery path to use
> > > > rproc_{start,stop}()")
> > > > >
> > > > > Reported-by: Xiang Xiao <xiaoxiang781216@xxxxxxxxx>
> > > > > Signed-off-by: Loic Pallardy <loic.pallardy@xxxxxx>
> > > > > ---
> > > > > drivers/remoteproc/remoteproc_virtio.c | 2 ++
> > > > > 1 file changed, 2 insertions(+)
> > > > >
> > > > > diff --git a/drivers/remoteproc/remoteproc_virtio.c
> > > > b/drivers/remoteproc/remoteproc_virtio.c
> > > > > index 183fc42a510a..88eade99395c 100644
> > > > > --- a/drivers/remoteproc/remoteproc_virtio.c
> > > > > +++ b/drivers/remoteproc/remoteproc_virtio.c
> > > > > @@ -332,6 +332,8 @@ int rproc_add_virtio_dev(struct rproc_vdev
> > *rvdev,
> > > > int id)
> > > > > struct virtio_device *vdev = &rvdev->vdev;
> > > > > int ret;
> > > > >
> > > > > + /* Reset vdev struct as you don't know how it has been
> previously
> > > > used */
> > > > > + memset(vdev, 0, sizeof(struct virtio_device));
> > > > > vdev->id.device = id,
> > > > > vdev->config = &rproc_virtio_config_ops,
> > > > > vdev->dev.parent = dev;
> > > > > --
> > > > > 2.7.4
> > > > >