Re: [PATCH v2 00/12] PolarFire SoC reset controller & clock cleanups

From: Conor.Dooley
Date: Sun Aug 14 2022 - 07:41:36 EST


On 11/08/2022 14:13, Conor Dooley wrote:
> Hey Nathan,
>
> On 10/08/2022 20:43, Conor Dooley - M52691 wrote:
>> On 10/08/2022 20:32, Nathan Chancellor wrote:
>>> On Wed, Aug 10, 2022 at 07:20:24PM +0000, Conor.Dooley@xxxxxxxxxxxxx wrote:
>>>> On 10/08/2022 19:56, Nathan Chancellor wrote:
>>>>> Hi Conor,
>>>>>
>>>>> On Tue, Aug 09, 2022 at 11:05:32PM +0000, Conor.Dooley@xxxxxxxxxxxxx wrote:
>>>>>> +CC clang people :)
>>>>>>
>>>>>> Got an odd one here and would appreciate some pointers for where to
>>>>>> look. This code when built with gcc boots fine, for example with:
>>>>>> riscv64-unknown-linux-gnu-gcc (g5964b5cd727) 11.1.0
>>>>>> The same code built with clang fails to boot, but prior to applying
>>>>>> this patchset it boots fine. Specifically, it is the patch
>>>>>> "clk: microchip: mpfs: move id & offset out of clock structs"
>>>>>>
>>>>>> I applied this patchset on top of tonight's master (15205c2829ca) but
>>>>>> I've been seeing the same problem for a few weeks on -next too. I tried
>>>>>> the following 2 versions of clang/llvm:
>>>>>> ClangBuiltLinux clang version 15.0.0 (5b0788fef86ed7008a11f6ee19b9d86d42b6fcfa), LLD 15.0.0
>>>>>> ClangBuiltLinux clang version 15.0.0 (bab8af8ea062f6332b5c5d13ae688bb8900f244a), LLD 15.0.0
>>>>>
>>>>> Good to know that it reproduces with fairly recent versions of LLVM :)
>>>>>
>>>>>> It's probably something silly that I've overlooked, as I am not au
>>>>>> fait with this sort of thing unfortunately, but hey - at least I'll
>>>>>> learn something then.
>>>>>
>>>>> I took a quick glance at the patch you mentioned above and I don't
>>>>> immediately see anything as problematic...
>>>>
>>>> Yeah, I couldn't see any low hanging fruit either.
>>>>
>>>>> I was going to see if I could
>>>>> reproduce this locally in QEMU since I do see there is a machine
>>>>> 'microchip-icicle-kit' but I am not having much success getting the
>>>>> machine past SBI. Does this reproduce in QEMU or are you working with
>>>>> the real hardware? If QEMU, do you happen to have a working invocation
>>>>> handy?
>>>>
>>>> Yeah... So there was a QEMU incantation that worked at some point in
>>>> the past (i.e. when someone wrote the QEMU port) but most peripherals
>>>> are not implemented and current versions of our OpenSBI implementation
>>>> require more than one of the unimplemented peripherals. I was trying to
>>>> get it working lately in the evenings based on some patches that were a
>>>> year old but no joy :/
>>>
>>> Heh, I guess that would explain why it wasn't working for me :)
>>>
>>>> I'm running on the real hardware, I'll give the older combo of qemu
>>>> "bios" etc a go again over the weekend & try to get it working. In the
>>>> meantime, any suggestions?
>>>
>>> Are you building with 'LLVM=1' or just 'CC=clang'? If 'LLVM=1', I would
>>> try breaking it apart into the individual options (LD=ld.lld,
>>> OBJCOPY=llvm-objcopy) and see if dropping one of those makes a
>>> difference. We have had subtle differences between the GNU and LLVM
>>> tools before and it is much easier to look into that difference if we
>>> know it happens in only one tool.
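
For anyone following along, a rough sketch of how the LLVM=1 build could be
split into individual tool options. It only prints the candidate make command
lines rather than running them. ARCH=riscv, the mix names and the kbuild_cmd
helper are assumptions made for illustration; the cross prefix is taken from
the gcc named earlier in the thread.

```shell
#!/bin/sh
# Print (do not run) the kernel make command line for one toolchain mix.
# ARCH=riscv and the mix names are illustrative assumptions.
CROSS=riscv64-unknown-linux-gnu-

kbuild_cmd() {
    case "$1" in
        # Full LLVM toolchain: clang, ld.lld, llvm-objcopy and friends.
        llvm)      echo "make ARCH=riscv LLVM=1" ;;
        # clang and LLD, everything else from GNU binutils.
        clang-lld) echo "make ARCH=riscv CROSS_COMPILE=$CROSS CC=clang LD=ld.lld" ;;
        # clang as the compiler only, GNU binutils for the rest.
        clang-gnu) echo "make ARCH=riscv CROSS_COMPILE=$CROSS CC=clang" ;;
        # Pure GNU toolchain as the known-good baseline.
        gcc)       echo "make ARCH=riscv CROSS_COMPILE=$CROSS" ;;
        *)         echo "unknown mix: $1" >&2; return 1 ;;
    esac
}

# List every mix, from all-LLVM down to all-GNU.
for mix in llvm clang-lld clang-gnu gcc; do
    kbuild_cmd "$mix"
done
```

If the failure shows up with the clang-gnu mix as well, the linker and
objcopy are unlikely to be the culprits and the compiler output is the place
to look.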
>>
>> LLVM=1.
>>
>>>
>>> Otherwise, I am not sure I have any immediate ideas other than looking
>>> at the disassembly and trying to see if something is going wrong. Is
>>> the object file being modified in any other way (I don't think there is
>>> something like objtool for RISC-V but I could be wrong)?
>>
>> I'll give the options a go so, I'll LYK how I get on.
>
> So I managed to wrangle QEMU into repro-ing. Booting with bootloaders
> etc isn't going to work (nor will the config with gcc actually boot
> properly) but it gets far enough to reproduce the problem.
> You've got to jump right to the kernel, for which the magic incantation
> is:
>
> $(QEMU)/qemu-system-riscv64 -M microchip-icicle-kit \
>     -m 2G -smp 5 \
>     -kernel $(wrkdir)/vmlinux.bin \
>     -dtb $(wrkdir)/riscvpc.dtb \
>     -display none -serial null \
>     -serial stdio
>
> (serial0 is disabled in the dt)
>
> With gcc there'll be a bunch of warnings like:
> clk_ahb: Zero divisor and CLK_DIVIDER_ALLOW_ZERO not set
> That's "fine", not sure if it's the lack of bootloaders or the
> emulation but 0 isn't a value the hardware will see. With the defconfig
> I provided it'll fail to boot fairly late on because of missing musb
> emulation.

FWIW, I posted a QEMU patch to fix the missing peripherals, so a direct
kernel boot works now for GCC:
https://lore.kernel.org/qemu-devel/20220813135127.2971754-1-mail@xxxxxxxxxxx

(btw, I am on libera as conchuod in #riscv if you ever wanna ping me about
something, usually still about for "sane" NA working hours too)

>
> Doesn't really matter since that's long enough to get past the switch
> out of earlycon, which is where the clang built kernel dies.
>
> Didn't get a chance to look at disassembly etc today, but as I said
> last night it reproduces with GNU binutils.
>
> Thanks,
> Conor.
>
> On another note, brought up our QEMU port's state today so fixing
> it is now on the good ole, ever expanding todo list :)