Re: [PATCH V2] remoteproc: core: Clear table_sz when rproc_shutdown

From: Mathieu Poirier
Date: Mon Mar 31 2025 - 11:48:24 EST


On Sat, Mar 29, 2025 at 08:56:29PM +0800, Peng Fan wrote:
> On Fri, Mar 28, 2025 at 08:14:41AM -0600, Mathieu Poirier wrote:
> >On Fri, Mar 28, 2025 at 12:50:12PM +0800, Peng Fan wrote:
> >> On Thu, Mar 27, 2025 at 11:46:33AM -0600, Mathieu Poirier wrote:
> >> >Hi,
> >> >
> >> >On Wed, Mar 26, 2025 at 10:02:14AM +0800, Peng Fan (OSS) wrote:
> >> >> From: Peng Fan <peng.fan@xxxxxxx>
> >> >>
> >> >> There is case as below could trigger kernel dump:
> >> >> Use U-Boot to start remote processor(rproc) with resource table
> >> >> published to a fixed address by rproc. After Kernel boots up,
> >> >> stop the rproc, load a new firmware which doesn't have resource table
> >> >> ,and start rproc.
> >> >>
> >> >
> >> >If a firwmare image doesn't have a resouce table, rproc_elf_load_rsc_table()
> >> >will return an error [1], rproc_fw_boot() will exit prematurely [2] and the
> >> >remote processor won't be started. What am I missing?
> >>
> >> STM32 and i.MX use their own parse_fw implementation which allows no resource
> >> table:
> >> https://elixir.bootlin.com/linux/v6.13.7/source/drivers/remoteproc/stm32_rproc.c#L272
> >> https://elixir.bootlin.com/linux/v6.13.7/source/drivers/remoteproc/imx_rproc.c#L598
> >
> >Ok, that settles rproc_fw_boot() but there is also rproc_find_loaded_rsc_table()
> >that will return NULL if a resource table is not found and preventing the
> >memcpy() in rproc_start() from happening:
> >
> >https://elixir.bootlin.com/linux/v6.14-rc6/source/drivers/remoteproc/remoteproc_core.c#L1288
>
>
> Sorry, I forgot to mention below code:
> loaded_table is a valid pointer for i.MX, see
> https://elixir.bootlin.com/linux/v6.14-rc6/source/drivers/remoteproc/imx_rproc.c#L666,

(SIGH)

The changelong for this patch says "... load a new firmware which doesn't have a
resource table..." and now you are telling me that @load_table is valid. As
such I have to _guess_ that @priv->rsc_table is not null. So which is it -
valid or not valid?

If my assumption above is valid than fix that instead of hacking the remoteproc
core. If my assumption is not valid the changelog and your justification for
this patch are wrong. Either way I have spent way too much time on this patch
already and dropping it. The same goes for your other patch [1] - resent it
when you will have properly address the work herein.

[1]. [PATCH] remoteproc: imx_rproc: Add mutex protection for workqueue

>
> So loaded_table is valid, it is memcpy trigger kernel panic because table_sz is
> not zero while cached_table is NULL.
> loaded_table = rproc_find_loaded_rsc_table(rproc, fw);
> if (loaded_table) {
> memcpy(loaded_table, rproc->cached_table, rproc->table_sz);
> rproc->table_ptr = loaded_table;
> }
>
> Thanks,
> Peng
>
> >
> >>
> >> Thanks,
> >> Peng
> >>
> >> >
> >> >[1]. https://elixir.bootlin.com/linux/v6.14-rc6/source/drivers/remoteproc/remoteproc_elf_loader.c#L338
> >> >[2]. https://elixir.bootlin.com/linux/v6.14-rc6/source/drivers/remoteproc/remoteproc_core.c#L1411
> >> >
> >> >> When starting rproc with a firmware not have resource table,
> >> >> `memcpy(loaded_table, rproc->cached_table, rproc->table_sz)` will
> >> >> trigger dump, because rproc->cache_table is set to NULL during the last
> >> >> stop operation, but rproc->table_sz is still valid.
> >> >>
> >> >> This issue is found on i.MX8MP and i.MX9.
> >> >>
> >> >> Dump as below:
> >> >> Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
> >> >> Mem abort info:
> >> >> ESR = 0x0000000096000004
> >> >> EC = 0x25: DABT (current EL), IL = 32 bits
> >> >> SET = 0, FnV = 0
> >> >> EA = 0, S1PTW = 0
> >> >> FSC = 0x04: level 0 translation fault
> >> >> Data abort info:
> >> >> ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
> >> >> CM = 0, WnR = 0, TnD = 0, TagAccess = 0
> >> >> GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
> >> >> user pgtable: 4k pages, 48-bit VAs, pgdp=000000010af63000
> >> >> [0000000000000000] pgd=0000000000000000, p4d=0000000000000000
> >> >> Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
> >> >> Modules linked in:
> >> >> CPU: 2 UID: 0 PID: 1060 Comm: sh Not tainted 6.14.0-rc7-next-20250317-dirty #38
> >> >> Hardware name: NXP i.MX8MPlus EVK board (DT)
> >> >> pstate: a0000005 (NzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> >> >> pc : __pi_memcpy_generic+0x110/0x22c
> >> >> lr : rproc_start+0x88/0x1e0
> >> >> Call trace:
> >> >> __pi_memcpy_generic+0x110/0x22c (P)
> >> >> rproc_boot+0x198/0x57c
> >> >> state_store+0x40/0x104
> >> >> dev_attr_store+0x18/0x2c
> >> >> sysfs_kf_write+0x7c/0x94
> >> >> kernfs_fop_write_iter+0x120/0x1cc
> >> >> vfs_write+0x240/0x378
> >> >> ksys_write+0x70/0x108
> >> >> __arm64_sys_write+0x1c/0x28
> >> >> invoke_syscall+0x48/0x10c
> >> >> el0_svc_common.constprop.0+0xc0/0xe0
> >> >> do_el0_svc+0x1c/0x28
> >> >> el0_svc+0x30/0xcc
> >> >> el0t_64_sync_handler+0x10c/0x138
> >> >> el0t_64_sync+0x198/0x19c
> >> >>
> >> >> Clear rproc->table_sz to address the issue.
> >> >>
> >> >> While at here, also clear rproc->table_sz when rproc_fw_boot and
> >> >> rproc_detach.
> >> >>
> >> >> Fixes: 9dc9507f1880 ("remoteproc: Properly deal with the resource table when detaching")
> >> >> Signed-off-by: Peng Fan <peng.fan@xxxxxxx>
> >> >> ---
> >> >>
> >> >> V2:
> >> >> Clear table_sz when rproc_fw_boot and rproc_detach per Arnaud
> >> >>
> >> >> drivers/remoteproc/remoteproc_core.c | 3 +++
> >> >> 1 file changed, 3 insertions(+)
> >> >>
> >> >> diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
> >> >> index c2cf0d277729..1efa53d4e0c3 100644
> >> >> --- a/drivers/remoteproc/remoteproc_core.c
> >> >> +++ b/drivers/remoteproc/remoteproc_core.c
> >> >> @@ -1442,6 +1442,7 @@ static int rproc_fw_boot(struct rproc *rproc, const struct firmware *fw)
> >> >> kfree(rproc->cached_table);
> >> >> rproc->cached_table = NULL;
> >> >> rproc->table_ptr = NULL;
> >> >> + rproc->table_sz = 0;
> >> >> unprepare_rproc:
> >> >> /* release HW resources if needed */
> >> >> rproc_unprepare_device(rproc);
> >> >> @@ -2025,6 +2026,7 @@ int rproc_shutdown(struct rproc *rproc)
> >> >> kfree(rproc->cached_table);
> >> >> rproc->cached_table = NULL;
> >> >> rproc->table_ptr = NULL;
> >> >> + rproc->table_sz = 0;
> >> >> out:
> >> >> mutex_unlock(&rproc->lock);
> >> >> return ret;
> >> >> @@ -2091,6 +2093,7 @@ int rproc_detach(struct rproc *rproc)
> >> >> kfree(rproc->cached_table);
> >> >> rproc->cached_table = NULL;
> >> >> rproc->table_ptr = NULL;
> >> >> + rproc->table_sz = 0;
> >> >> out:
> >> >> mutex_unlock(&rproc->lock);
> >> >> return ret;
> >> >> --
> >> >> 2.37.1
> >> >>
> >> >
> >