RE: [PATCH 1/2] remoteproc: drop memset when loading elf segments
From: Peng Fan
Date: Tue Apr 21 2020 - 03:42:48 EST
> Subject: Re: [PATCH 1/2] remoteproc: drop memset when loading elf
> segments
>
> On 4/13/20 4:05 AM, Peng Fan wrote:
> >
> >> Subject: Re: [PATCH 1/2] remoteproc: drop memset when loading elf
> >> segments
> >>
> >> On Thu 09 Apr 18:29 PDT 2020, Peng Fan wrote:
> >>
> >>> Hi Bjorn,
> >>>
> >>>> Subject: Re: [PATCH 1/2] remoteproc: drop memset when loading elf
> >>>> segments
> >>>>
> >>>> On Thu 09 Apr 01:22 PDT 2020, Peng Fan wrote:
> >>>>
> >>>>> To arm64, "dc zva, dst" is used in memset.
> >>>>> Per ARM DDI 0487A.j, chapter C5.3.8 DC ZVA, Data Cache Zero by VA,
> >>>>>
> >>>>> "If the memory region being zeroed is any type of Device memory,
> >>>>> this instruction can give an alignment fault which is prioritized
> >>>>> in the same way as other alignment faults that are determined by
> >>>>> the memory type."
> >>>>>
> >>>>> On i.MX platforms, when elf is loaded to onchip TCM area, the
> >>>>> region is ioremapped, so "dc zva, dst" will trigger abort.
> >>>>>
> >>>>> Since memset is not strictly required, let's drop it.
> >>>>>
> >>>>
> >>>> This would imply that we trust that the firmware doesn't expect
> >>>> remoteproc to zero out the memory, which we've always done. So I
> >>>> don't think we can say that it's not required.
> >>>
> >>> Saying an image runs on a remote core needs Linux to help zero out
> >>> BSS section, this not make sense to me.
> >>
> >> Maybe not, but it has always done it, so if there's firmware out
> >> there that depends on it such change would break them..
> >
> > Got it.
> >
> >>
> >>> My case is as following, I need to load section 7 data.
> >>> I no need to let remoteproc to memset section 8/9/10/11/12, the
> >>> firmware itself could handle that. Just because the memsz is larger
> >>> than filesz, remoreproc must memset?
> >>
> >> By having a PT_LOAD segment covering these I think it's reasonable to
> >> assume that the ELF loader should be able to touch the associated
> memory.
> >>
> >>> Section Headers:
> >>> [Nr] Name Type Addr Off
> Size
> >> ES Flg Lk Inf Al
> >>> [ 0] NULL 00000000 000000
> >> 000000 00 0 0 0
> >>> [ 1] .interrupts PROGBITS 1ffe0000 010000 000240
> 00
> >> A 0 0 4
> >>> [ 2] .resource_table PROGBITS 1ffe0240 010240 000058
> 00
> >> A 0 0 1
> >>> [ 3] .text PROGBITS 1ffe02a0 0102a0
> 009ccc 00
> >> AX 0 0 16
> >>> [ 4] .ARM ARM_EXIDX 1ffe9f6c 019f6c
> 000008
> >> 00 AL 3 0 4
> >>> [ 5] .init_array INIT_ARRAY 1ffe9f74 019f74 000004
> 04
> >> WA 0 0 4
> >>> [ 6] .fini_array FINI_ARRAY 1ffe9f78 019f78 000004
> 04
> >> WA 0 0 4
> >>> [ 7] .data PROGBITS 1fff9240 029240
> 000084
> >> 00 WA 0 0 4
> >>> [ 8] .ncache.init PROGBITS 1fff92c4 0292c4 000000
> 00
> >> W 0 0 1
> >>> [ 9] .ncache NOBITS 1fff92c4 0292c4
> 000a80
> >> 00 WA 0 0 4
> >>> [10] .bss NOBITS 1fff9d44 0292c4
> 01f5c0
> >> 00 WA 0 0 4
> >>> [11] .heap NOBITS 20019304 0292c4
> 000404
> >> 00 WA 0 0 1
> >>> [12] .stack NOBITS 20019708 0292c4
> 000400
> >> 00 WA 0 0 1
> >>>
> >>>>
> >>>>> Signed-off-by: Peng Fan <peng.fan@xxxxxxx>
> >>>>> ---
> >>>>> drivers/remoteproc/remoteproc_elf_loader.c | 7 ++-----
> >>>>> 1 file changed, 2 insertions(+), 5 deletions(-)
> >>>>>
> >>>>> diff --git a/drivers/remoteproc/remoteproc_elf_loader.c
> >>>>> b/drivers/remoteproc/remoteproc_elf_loader.c
> >>>>> index 16e2c496fd45..cc50fe70d50c 100644
> >>>>> --- a/drivers/remoteproc/remoteproc_elf_loader.c
> >>>>> +++ b/drivers/remoteproc/remoteproc_elf_loader.c
> >>>>> @@ -238,14 +238,11 @@ int rproc_elf_load_segments(struct rproc
> >>>>> *rproc,
> >>>> const struct firmware *fw)
> >>>>> memcpy(ptr, elf_data + offset, filesz);
> >>>>>
> >>>>> /*
> >>>>> - * Zero out remaining memory for this segment.
> >>>>> + * No need zero out remaining memory for this segment.
> >>>>> *
> >>>>> * This isn't strictly required since dma_alloc_coherent
> already
> >>>>> - * did this for us. albeit harmless, we may consider removing
> >>>>> - * this.
> >>>>> + * did this for us.
> >>>>
> >>>> In the case of recovery this comment is wrong, we do not
> >>>> dma_alloc_coherent() the carveout during a recovery.
> >>>
> >>> Isn't the it the firmware's job to memset the region?
> >>>
> >>
> >> I'm not aware of this being a documented requirement, we've always
> >> done it here for them, so removing this call would be a change in behavior.
> >>
> >>>>
> >>>> And in your case you ioremapped existing TCM, so it's never true.
> >>>>
> >>>>> */
> >>>>> - if (memsz > filesz)
> >>>>> - memset(ptr + filesz, 0, memsz - filesz);
> >>>>
> >>>> So I think you do want to zero out this region. Question is how we do it...
> >>>
> >>> I have contacted our M4 owners, we no need clear it from Linux side.
> >>
> >> And I think _most_ firmware out there, like yours, does clear BSS etc
> >> during initialization.
> >>
> >>> We also support booting m4 before booting Linux, at that case, Linux
> >>> has noting to do with memset. It is just I try loading m4 image with
> >>> Linux, and met the issue that memset trigger abort.
> >>>
> >>
> >> Please see the proposal for attaching to already running remoteproc's
> >> from Mathieu. I don't expect that you want to load your PROGBITS
> >> either in this case?
> >
> > No need to load for early boot case. It is just userspace load trigger
> > kernel panic, because memset arm64 could not work for ioremapped
> memory.
> >
> > How about adding ops hooks for memset and memcpy to let driver have
> > their own implementation?
>
> Hi Peng,
>
> The trick is to use the ioremap_wc() variant instead of ioremap() in your
> platform driver while mapping the TCMs. I know multiple folks have run into
> this issue. This is what most of the remoteproc drivers use, and mmio-sram
> driver also uses the same.
TCM is different from OCRAM in i.MX chips.
ioremap_wc not work for TCM memory, I just tried that.
I am thinking we change memset/memcpy to use
memset_io/memcpy_fromio/memcpy_toio for remoteproc common code,
Thanks,
Peng.
>
> regards
> Suman