Re: [RFC] ITS fails to allocate on rk3568/rk3566

From: Marc Zyngier
Date: Wed Apr 21 2021 - 06:23:38 EST


On Wed, 21 Apr 2021 02:40:26 +0100,
Kever Yang <kever.yang@xxxxxxxxxxxxxx> wrote:
>
> Hi Marc,
>
> On 2021/4/16 下午11:23, Marc Zyngier wrote:
> > On Fri, 16 Apr 2021 02:13:38 +0100,
> > Kever Yang <kever.yang@xxxxxxxxxxxxxx> wrote:
> >> Hi Marc,
> >>
> >> On 2021/4/15 下午4:11, Marc Zyngier wrote:
> >>> Hi Kever,
> >>>
> >>> On Thu, 15 Apr 2021 08:24:33 +0100,
> >>> Kever Yang <kever.yang@xxxxxxxxxxxxxx> wrote:
> >>>> Hi Marc, Peter,
> >>>>
> >>>>     RK356x GIC has two issues:
> >>>>
> >>>> 1. GIC only support 32bit address while rk356x supports 8GB DDR SDRAM,
> >>>> so we use ZONE_DMA32 to fix this issue;
> >>> What transactions does this affect exactly?
> >> The GIC on rk356x is a 32bit master, which means all the space its
> >> logic need to access should be in the 4GB range.
> > Well, at least this is consistently broken.
>
> There are many legacy IPs which only support 32bit bus, we have to
> use them as is in the new 64bit SoCs, I think the 32bit GIC can be
> considered the same as those case, can we add CONFIG_ZONE_DMA32
> support in GIC driver?

Only as an erratum workaround. The GIC isn't some random device that
can live behind an IOMMU that will sort the mapping problem for
you. By design, it must be untranslated, so limiting it to only a
portion of the physical address space is a bug, full stop.

> eg. Other peripheral driver use dma_alloc_coherent() instead of
> alloc_pages(), so then can support ZONE_DMA32 without any code
> change.

You do realise that dma_alloc_coherent() requires the use of a struct
device, and that the GIC is required to be up and running long before
the device model is available, right?

> >>> Only some ITS tables? Or
> >>> all of them, including the command queue? What about the configuration
> >>> and pending tables associated with the redistributors?
> >>>
> >>>> 2. GIC version is r1p6-00rel0, RK356x interconnect does not support
> >>>> GIC and CPU snoop to each other, hence the GIC does not support the
> >>>> shareability feature.  The read of register value for shareability
> >>>> feature does not return as expect in GICR and GITS, so we have to
> >>>> workaround for it.
> >>> How about the cacheability attribute? Can you please provide the exact
> >>> set of attributes that this system actually supports for each of the
> >>> ITS and redistributor base registers?
> >> The shareability attributes in GICR_PENDBASEER, GICR_PROPBASER,
> >> GITS_BASERn, GITS_CBASER default value is 0b00, when we set 0b01 then
> >> read returns 0b01.
> > And I claim that this is a perfectly broken behaviour. How do you
> > expect software to find about the gory details of the integration?
> > That's the only way for SW to find out what the HW is capable of...
>
> As a software engineer, I totally agree with you on this point, but
> when we back to the truth, we should know that these registers has
> never been defined as "module capability for shareability"

Says who? If SW cannot discover the capability of the HW, then nothing
works.

> in hardware. The software use it as capability detect, and it works
> before, but not means match the original hardware design, in this
> case the new hardware may not follow the design because it's not a
> standard or a clear enough guide for it.

The spec has been clear enough for *everyone* so far, and other GIC600
implementations got it perfectly right. There has been *one*
exception, which is documented and handled as a quirk.

> The capability register should always read-only instead of
> read-write, the understanding software engineer and from hardware
> engineer always have a gap.
>
> I would proposal to add a optional dts property for driver to
> identify if the board's GIC shareability or not, which is a method
> used by many other module drivers.

Are you also going to propose and support an ACPI specification update
to cope with similarly broken devices? Either this is something that
is available across the various firmware implementations, or it
doesn't exist.

>
> >> Since there is no ACE coherency interface for this GIC controller, all
> >> the cacheability in the GIC is not support in hardware.
> >>
> >>> Also, please provide errata numbers for these two issues so that we
> >>> can properly document them and track the workarounds.
> >> What kind of errata do you need, could you please share any kind of
> >> example close to this case?
> > I would like something that says:
> >
> > "ROCKCHIP_ERRATUM_123456: The GIC600 integration in RK356x doesn't
> > support any of the shareability or cacheability attributes, and
> > requires both values to be set to 0b00 for all the ITS and
> > Redistributor tables."
> >
> > This is pretty similar to the bug affecting ThunderX with its "erratum
> > 24313" (covered by CONFIG_CAVIUM_ERRATUM_22375), where the tables have
> > to be flagged as non-cacheable. The Rockchip one is just worse.
> >
> > We need an official erratum number so that we can refer to it in the
> > source, commit log and documentation, as well as cross-reference it
> > with the TRM. This number will be part of a configuration symbol that
> > will make the compilation conditional so that people don't have to
> > carry the extra burden generated by this bug if they don't need to.
> >
> > Same thing goes for the 32bit bug.
> >
> >> We consider this as a SoC implement design instead of a bug, so we
> >> will add document in RK356X  TRM to describe the GIC design, but no
> >> idea how to provide the errata.
> >>
> >> Here is the shareabily attribute from ARM GIC architecture specification:
> >> Shareability, bits [11:10] (from GITS_CBASER)
> >> Indicates the Shareability attributes of accesses to the command
> >> queue. The possible values of this field are:
> >> 0b00 Non-shareable.
> >> 0b01 Inner Shareable.
> >> 0b10 Outer Shareable.
> >> 0b11 Reserved. Treated as 0b00.
> >> It is IMPLEMENTATION DEFINED whether this field has a fixed value or
> >> can be programmed by software. Implementing this field with a fixed
> >> value is deprecated.
> >> On a Warm reset, this field resets to an architecturally UNKNOWN value
> >>
> >> As you can see, "Implementing this field with a fixed value is
> >> deprecated", so software should program this field to '0b00
> >> Non-shareable' if the SoC design does not support the cache
> >> shareability.
> > [I really feel special when people quote the GIC spec at me]
> >
> > That isn't what it says. Hardcoding the field with a fixed value is
> > indeed deprecated, but that doesn't mean this field should accept
> > values that the HW cannot support. If anything, what this says is "try
> > and implement the options that SW is going to use".
> >
> > But you need to give SW an indication of what is usable, because there
> > is no other way to *discover* what the SoC is capable of at runtime.
> > Otherwise, we would need to carry a per-SoC list of what the HW
> > supports. I don't think that's the right thing to do (and you're about
> > 8 years too late anyway).
> >
> > Other GIC600 integrations got it perfectly right, by the way. Same for
> > other GIC implementations, with the notable exception of Cavium and
> > their first GIC in ThunderX, as described above.
>
> I'm not sure if you still work for ARM or not, but we do have double

I don't think whoever pays my salary is relevant in this context. I am
writing as the guy who maintains the GIC architecture in Linux, and
the fact that I have worked, work, or will work for ARM or not has no
bearing on what I am saying here.

> check again and again with IP vendor about this point, the GIC500
> works because it's hardware bind to a fix value inside the IP, but
> GIC600 does not do the same binding, and there is no any config
> option to hardcode the reg field to a fixed value in the IP when
> we don't need the ACE-lite.
>
> This is why I insist that this is a different SoC implementation
> instead of a SoC bug, we should add the support in the GIC driver
> for different hardware case.
>
> When you say we're about 8 years too late, I think there still not
> much SoCs using GIC600, maybe the GIC600 IP is available for user at
> mid of 2017?

I'm not sure what the year 2017 does here. SW written as early as 2013
and compliant with the architecture still works on GIC600 when it is
properly integrated. Linux also has specific requirements, and you
have decided to ignore them. That's your choice. But you also get to
keep the pieces when things break.

Now, this discussion has lasted long enough, and isn't going anywhere.
If you want upstream support for your HW, I have explained what needs
to be done: document the two problems, implement the workarounds using
the GIC quirk infrastructure just like any other silicon vendor has
done over the past 8 years.

If you don't want to do it, it suits me just as well. I won't have to
maintain yet another set of SoC-specific workarounds.

Thanks,

M.

--
Without deviation from the norm, progress is not possible.