Re: [PATCH 0/9] arm64: dts: rockchip: Initial Toybrick TB-RK1808M0 support
From: Marc Zyngier
Date: Mon May 17 2021 - 09:42:30 EST
Andreas,
On Mon, 17 May 2021 13:22:27 +0100,
Andreas Färber <afaerber@xxxxxxx> wrote:
>
> Hi Marc,
>
> On 17.05.21 11:02, Marc Zyngier wrote:
> > On Mon, 17 May 2021 00:05:42 +0100,
> > Andreas Färber <afaerber@xxxxxxx> wrote:
> >> Patches are based on the shipping toybrick.dtb file.
>
> >> http://t.rock-chips.com/en/wiki.php?mod=view&id=110 gives instructions for
>
> >> compiling sources, but no source download or link is actually provided.
>
> >>
>
> >> I encountered a hang: earlycon revealed it being related to KVM and
> >> vGIC. Disabling KVM in Kconfig works around it, as does removing
> >> the vGIC irq in DT. I've already tried low and high for the vGIC
> >> interrupt, so no clue what might cause it. On an mPCIe card with 1
> >> GiB of RAM I figured KVM is not going to be a major use case, so if
> >> we find no other solution, we could just delete the interrupts
> >> property in its .dts, as demonstrated here.
> >
> > I think you figured it out wrong,
>
> Did I? I identified that an issue resulting in no serial console was
> dependent on CONFIG_KVM being enabled and specifically to the vGIC
> interrupt being specified in my DT. That's all I said.
I guess we have a different way to approach these issues. Rather than
disabling a feature, I would have reached out to narrow the problem
down *before* posting a series.
> I never claimed KVM code was to blame, you should know me better by
> now!
Maybe it *is* to blame, and I'd really like to know.
> > for a number of reasons:
> >
> > - KVM hanging is usually a sign that you have described the platform
> > the wrong way. Either you are stepping over reserved memory regions,
> > or you have badly described the GIC itself.
>
> This whole series is about a new DT hardware description, so yes, that
> is the most likely source of the problem I'm observing. Without further
> hints how to verify what may cause it, you're just stating the obvious.
>
> The only /reserved-memory entries in the shipping DTB are drm-logo of
> size 0 and ramoops - the latter I could try to test, but I'd assume that
> to just be a software convention that for lack of oops should not affect
> KVM here?
>
> And why would reserved memory affect the vGIC but no other driver doing
> allocations? Any way to narrow it down, does vGIC allocate specially?
Not an existing reserved memory, but instead the lack of a reserved
memory description in the DT, on which KVM would happily step as part
of its own allocations. Having a working vGIC adds a substantial
amount of code paths and (surprise!) interrupt handling.
> Only other issue I'm seeing is Debian failing to mount partitions that I
> checked I do have drivers built in for and ends up failing to provide an
> emergency shell. In order to boot a clean openSUSE rootfs for comparison
> I'd first need to figure out adding any USB host nodes and clocks.
>
> >
> > - It could also be a bug in KVM, which will need to be fixed. If
> > that's because the HW is broken, we need to be able to detect it.
> >
> > - You cannot be prescriptive of what a user is going to run. People
> > have been running KVM on systems with less memory than that.
> >
> > So no, we don't paper over these issues.
>
> As you can see in patch 3, it does include the vGIC interrupt, so that
> anyone with access to the TB-96AIoT or any EVB can test KVM and report
> success or failure. Thus I don't see me as papering over something here.
>
> However, patch 5 is needed to test this patchset on at least M0 - to
> have serial and eMMC rootfs working - until a better fix is found.
And that's not papering over the problem? OK, nevermind. Not to
mention that the GIC node has some obvious mistakes which result from
copy-paste.
> > We work out what is going
> > wrong and we fix it.
>
> Thanks. You were specifically copied to advise on
> how to figure out what might cause it, so that we/I can fix it properly. :)
>
> As I mentioned, I already tried changing the interrupt between high and
> low (which was a likely bug source on Realtek RK1319 (where I'm still
> waiting on them to confirm a ~year later...)).
Which has no influence since the GIC-500 PPIs are not configurable in
SW, and the presence of this attribute in the DT is just for
documentation.
> I don't have a data source other than the downstream .dtb to check the
> interrupt number - mainline PX30/RK3308/RK3328/RK3368/RK3399 do all use
> 9 and high consistently though, so I figured it's likely correct.
>
> What I was wondering is whether the vGIC, similar to arch timer, might
> need some initialization in the bootloader? (Note: No U-Boot sources
> either at the link.)
As long as the PPIs are set as group-1NS, this is enough. You can find
out by dumping the redistributors' GICR_IGROUPR0 registers. Nothing
else is required for the GIC to behave.
> Unfortunately I'm seeing a recurring pattern (cf. Realtek) that vendors
> in their BSPs don't enable KVM and thus don't validate their hardware
> description against KVM; their shipping 4.4 based kernel here does not
> seem to have KVM enabled.
>
> Or is it possible for vendors to actually have a Cortex-A35 without the
> Armv8 Virtualization Extensions in silicon? If so, how could one verify?
There is no "Armv8 Virtualization Extensions". There is only EL2, and
you are already booting at that exception level, or KVM wouldn't even
try to initialise.
It would probably help if you posted a full dmesg as well as added
some basic tracing in the vgic init code so that we can figure out
*what* is going wrong, so that we can all stop making idle guesses.
M.
--
Without deviation from the norm, progress is not possible.