Re: [PATCH v2 00/12] PolarFire SoC reset controller & clock cleanups

From: Conor.Dooley
Date: Thu Aug 11 2022 - 09:13:37 EST


Hey Nathan,

On 10/08/2022 20:43, Conor Dooley - M52691 wrote:
> On 10/08/2022 20:32, Nathan Chancellor wrote:
>> On Wed, Aug 10, 2022 at 07:20:24PM +0000, Conor.Dooley@xxxxxxxxxxxxx wrote:
>>> On 10/08/2022 19:56, Nathan Chancellor wrote:
>>>> Hi Conor,
>>>>
>>>> On Tue, Aug 09, 2022 at 11:05:32PM +0000, Conor.Dooley@xxxxxxxxxxxxx wrote:
>>>>> +CC clang people :)
>>>>>
>>>>> Got an odd one here and would appreciate some pointers for where to
>>>>> look. This code when built with gcc boots fine, for example with:
>>>>> riscv64-unknown-linux-gnu-gcc (g5964b5cd727) 11.1.0
>>>>> The same code built with clang fails to boot, but prior to applying
>>>>> this patchset it boots fine. Specifically it is the patch
>>>>> "clk: microchip: mpfs: move id & offset out of clock structs"
>>>>>
>>>>> I applied this patchset on top of tonight's master (15205c2829ca) but
>>>>> I've been seeing the same problem for a few weeks on -next too. I tried
>>>>> the following 2 versions of clang/llvm:
>>>>> ClangBuiltLinux clang version 15.0.0 (5b0788fef86ed7008a11f6ee19b9d86d42b6fcfa), LLD 15.0.0
>>>>> ClangBuiltLinux clang version 15.0.0 (bab8af8ea062f6332b5c5d13ae688bb8900f244a), LLD 15.0.0
>>>>
>>>> Good to know that it reproduces with fairly recent versions of LLVM :)
>>>>
>>>>> It's probably something silly that I've overlooked but I am not au
>>>>> fait with these sort of things unfortunately, but hey - at least I'll
>>>>> learn something then.
>>>>
>>>> I took a quick glance at the patch you mentioned above and I don't
>>>> immediately see anything as problematic...
>>>
>>> Yeah, I couldn't see any low hanging fruit either.
>>>
>>>> I was going to see if I could
>>>> reproduce this locally in QEMU since I do see there is a machine
>>>> 'microchip-icicle-kit' but I am not having much success getting the
>>>> machine past SBI. Does this reproduce in QEMU or are you working with
>>>> the real hardware? If QEMU, do you happen to have a working invocation
>>>> handy?
>>>
>>> Yeah... So there was a QEMU incantation that worked at some point in
>>> the past (i.e. when someone wrote the QEMU port) but most peripherals
>>> are not implemented and current versions of our OpenSBI implementation
>>> require more than one of the unimplemented peripherals. I was trying to
>>> get it working lately in the evenings based on some patches that were a
>>> year old but no joy :/
>>
>> Heh, I guess that would explain why it wasn't working for me :)
>>
>>> I'm running on the real hardware, I'll give the older combo of qemu
>>> "bios" etc a go again over the weekend & try to get it working. In the
>>> meantime, any suggestions?
>>
>> Are you building with 'LLVM=1' or just 'CC=clang'? If 'LLVM=1', I would
>> try breaking it apart into the individual options (LD=ld.lld,
>> OBJCOPY=llvm-objcopy) and see if dropping one of those makes a
>> difference. We have had subtle differences between the GNU and LLVM
>> tools before and it is much easier to look into that difference if we
>> know it happens in only one tool.
>
> LLVM=1.
>
>>
>> Otherwise, I am not sure I have any immediate ideas other than looking
>> at the disassembly and trying to see if something is going wrong. Is
>> the object file being modified in any other way (I don't think there is
>> something like objtool for RISC-V but I could be wrong)?
>
> I'll give the options a go so, I'll LYK how I get on.

So I managed to wrangle QEMU into reproducing it. Booting with bootloaders
etc isn't going to work (nor will the gcc build of this config actually
boot properly) but it gets far enough to reproduce the problem.
You've got to jump right to the kernel, for which the magic incantation
is:

$(QEMU)/qemu-system-riscv64 -M microchip-icicle-kit \
-m 2G -smp 5 \
-kernel $(wrkdir)/vmlinux.bin \
-dtb $(wrkdir)/riscvpc.dtb \
-display none -serial null \
-serial stdio

(serial0 is disabled in the dt)
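
If you'd rather not go through make, the same invocation can be sketched
as a plain shell script. QEMU_DIR and WRKDIR are placeholder names of mine
standing in for $(QEMU) and $(wrkdir), and their defaults are made up. The
first -serial lands on serial0, which is disabled in the dt, hence the
null; the second puts the next UART on stdio. The function only prints the
command so it can be checked before running it.

```shell
#!/bin/sh
# Placeholder locations (assumptions): adjust to wherever QEMU and the
# build outputs actually live.
QEMU_DIR="${QEMU_DIR:-/usr/bin}"
WRKDIR="${WRKDIR:-.}"

icicle_qemu_cmd() {
    # Print the command line instead of executing it.
    # -serial null  -> first UART (serial0, disabled in the dt)
    # -serial stdio -> second UART, used as the console
    printf '%s ' "$QEMU_DIR/qemu-system-riscv64" -M microchip-icicle-kit \
        -m 2G -smp 5 \
        -kernel "$WRKDIR/vmlinux.bin" \
        -dtb "$WRKDIR/riscvpc.dtb" \
        -display none -serial null \
        -serial stdio
    printf '\n'
}

icicle_qemu_cmd
```

To actually launch it, something like `eval "$(icicle_qemu_cmd)"` would do,
provided the paths contain no spaces.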

With gcc there'll be a bunch of warnings like:
clk_ahb: Zero divisor and CLK_DIVIDER_ALLOW_ZERO not set
That's "fine": I'm not sure whether it's the lack of bootloaders or the
emulation, but 0 isn't a value the real hardware will see. With the
defconfig I provided it'll fail to boot fairly late on because of missing
musb emulation.

Doesn't really matter since that's long enough to get past the switch
out of earlycon, which is where the clang-built kernel dies.

Didn't get a chance to look at disassembly etc today, but as I said
last night it reproduces with GNU binutils.
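
For anyone wanting to repeat the toolchain split Nathan suggested, a rough
sketch of the variants follows. It only prints the make command lines. The
CROSS_COMPILE prefix is an assumption based on the gcc named earlier in the
thread, and the set of variants is illustrative rather than a record of
what I actually ran.

```shell
#!/bin/sh
# Assumed GNU toolchain prefix; override CROSS if yours differs.
CROSS="${CROSS:-riscv64-unknown-linux-gnu-}"

build_variants() {
    # Full LLVM toolchain (clang, ld.lld, llvm-objcopy, ...).
    echo "make ARCH=riscv LLVM=1"
    # clang as the compiler only, GNU binutils for everything else.
    echo "make ARCH=riscv CROSS_COMPILE=$CROSS CC=clang"
    # Swap in one LLVM tool at a time to see which one matters.
    echo "make ARCH=riscv CROSS_COMPILE=$CROSS CC=clang LD=ld.lld"
    echo "make ARCH=riscv CROSS_COMPILE=$CROSS CC=clang OBJCOPY=llvm-objcopy"
}

build_variants
```

If the failure shows up in the second variant already, the linker and
objcopy are off the hook and the compiler output is the place to look.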

Thanks,
Conor.

On another note, I brought up the state of our QEMU port today, so fixing
it is now on the good ole, ever-expanding todo list :)