Re: [PATCH -fixes] riscv: Fix relocatable kernels with early alternatives using -fno-pie

From: Alexandre Ghiti
Date: Sun May 28 2023 - 09:02:59 EST


On Sat, May 27, 2023 at 12:02 PM Conor Dooley <conor@xxxxxxxxxx> wrote:
>
> On Sat, May 27, 2023 at 11:13:18AM +0200, Alexandre Ghiti wrote:
> >
> > On 26/05/2023 18:35, Conor Dooley wrote:
> > > On Fri, May 26, 2023 at 05:24:41PM +0100, Conor Dooley wrote:
> > > > On Fri, May 26, 2023 at 05:46:30PM +0200, Alexandre Ghiti wrote:
> > > > > Early alternatives are called with the MMU disabled, so they must not
> > > > > access any global symbol through the GOT: the GOT is filled in by
> > > > > relocations that we have already applied, but with *virtual* addresses.
> > > > > So only use the medany code model for this early code.
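As I read the commit message, the constraint above comes down to address arithmetic. The Python model below is only an illustration, not kernel code, and all helper names in it are hypothetical. The entry address 0x8020_0000 comes from this thread. The virtual base, image size and symbol offsets are assumed values. The model contrasts two kinds of access while the MMU is off. A PC-relative (medany, -fno-pie) access yields a usable physical address. A GOT-indirect access instead dereferences a slot that was already relocated to a virtual address.

```python
# Illustrative model only: none of these helpers exist in the kernel.
PHYS_LOAD = 0x8020_0000            # physical load/entry address mentioned in this thread
VIRT_BASE = 0xFFFF_FFFF_8000_0000  # assumed link-time virtual base of the kernel
IMAGE_SIZE = 0x0200_0000           # assumed size of the loaded image (32 MiB)


def pcrel_address(pc: int, pc_off: int, sym_off: int) -> int:
    """medany / -fno-pie access (auipc + addi): the symbol address is derived
    from the PC actually executing, so it is physical while the MMU is off."""
    return pc + (sym_off - pc_off)


def got_address(virt_base: int, sym_off: int) -> int:
    """-fPIE access (auipc + ld from the GOT): the result is whatever the
    relocation code stored in the GOT slot, here a virtual address."""
    return virt_base + sym_off


def usable_with_mmu_off(addr: int) -> bool:
    """In this model, with satp in Bare mode, only physical addresses inside
    the loaded image count as valid."""
    return PHYS_LOAD <= addr < PHYS_LOAD + IMAGE_SIZE


# Early code at image offset 0x1000 touching a global at image offset 0x8000.
pc = PHYS_LOAD + 0x1000
direct = pcrel_address(pc, 0x1000, 0x8000)
via_got = got_address(VIRT_BASE, 0x8000)
print(hex(direct), usable_with_mmu_off(direct))    # 0x80208000 True
print(hex(via_got), usable_with_mmu_off(via_got))  # 0xffffffff80008000 False
```

Building the early code with -fno-pie/medany corresponds to taking the first path for every global it touches.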
> > > > >
> > > > > Signed-off-by: Alexandre Ghiti <alexghiti@xxxxxxxxxxxx>
> > > > > ---
> > > > >
> > > > > Note that I'm not very happy with this fix; I think we need to put more
> > > > > effort into "harmonizing" this very early code (i.e. before the MMU is
> > > > > enabled), as it is spread across different locations and compiled
> > > > > differently.
> > > > Totally & I'll happily spend the time trying to review that work.
> > > >
> > > > > I'll work on that later, but for now, this fix does what is
> > > > > needed to make it work (from my testing at least). Any Tested-by on the
> > > > > Unmatched and T-Head boards is welcome!
> > > > On 6.4-rc1 & v6.4-rc1 + this patch, with CONFIG_RELOCATABLE added to my
> > > > config, my Nezha fails to boot. There is no output whatsoever from the
> > > > kernel. Turning off CONFIG_RELOCATABLE boots again.
> > > I don't know if this is better or worse news, but the same thing happens on
> > > an Icicle Kit. What systems, other than QEMU, have the relocatable
> > > series been tested with, btw?
> >
> >
> > I tested it on the Unmatched (Andreas did too).
>
> Cool. I cracked out my unmatched and it has the same issue as the
> icicle. Ditto my Visionfive v2. Here's my config.
> https://raw.githubusercontent.com/ConchuOD/riscv-env/dev/conf/defconfig
>
> A ~default QEMU virt machine (-m 2G -smp 5) doesn't work either.

I can boot with this config using:

$ sudo ~/qemu/build/qemu-system-riscv64 -machine virt -cpu rv64,sv48=off \
    -nographic -m 2G -smp 5 -kernel build_conor/arch/riscv/boot/Image -s

While adding this to our internal CI, I noticed local failures that did
not show up in the CI, because the CI was not using the same toolchain.
Can you send me the full .config, so that I can see whether the compiler
added stack guards or something else I did not think of?

Thanks!

>
> > It's very weird that it does not work on the Icicle Kit: there is no erratum
> > for this SoC, so what gets executed this early on it? Do you know where it
> > fails to boot? If you can debug, break on the address of the entry point
> > (usually 0x8020_0000): since this is the stvec address, any trap will
> > branch there. When you get there, could you dump $sepc, $ra and $stval?
>
> > Regarding the T-Head issue, I think the following should fix it:
>
> It did not :/
>
> Cheers,
> Conor.
>