Re: [PATCH] ARM: do not assemble iwmmxt.S with LLVM toolchain

From: Russell King - ARM Linux admin
Date: Mon Apr 13 2020 - 17:54:19 EST


On Mon, Apr 13, 2020 at 12:26:16PM -0700, Nick Desaulniers wrote:
> On Fri, Apr 10, 2020 at 11:34 AM Russell King - ARM Linux admin
> <linux@xxxxxxxxxxxxxxx> wrote:
> >
> > On Fri, Apr 10, 2020 at 06:59:48PM +0200, Andrew Lunn wrote:
> > > On Thu, Apr 09, 2020 at 04:27:26PM -0700, Jian Cai wrote:
> > > > iwmmxt.S contains XScale instructions
> > >
> > > Dumb question....
> > >
> > > Are these Xscale instructions? My understanding is that they are an
> > > instruction set of their own, implementing something similar to IA-32
> > > MMX.
> > >
> > > Would it be more accurate to say CLANG does not support the iwmmxt
> > > instruction set?
> >
> > Yes, because the XScale core on its own (otherwise known as 80200)
> > doesn't support iWMMXT.
> >
> > It's worth pointing out that the iWMMXT instruction set uses the
> > co-processor #1 instruction space as defined by the ARMv5 ARM ARM,
> > which is also the FPA (floating point accelerator) instruction
> > space - which is the FP instruction set prior to VFP.
>
> Reusing instruction encoding space? :-X I'll have to look and see how
> we define cases like this in LLVM.

Unfortunately yes.

The ARM CPU was originally an integer-only CPU, and the design was
to allow up to 16 "co-processors" to be added for bolt-on facilities.
The ARM architecture reserves various instructions in its instruction
space for the co-processors, with instructions to load/store from the
co-processor to memory (the ARM works in unison with the co-processor
to provide the address for the memory access), transfer data between
the ARM integer register set and co-processor, and instruct the co-
processor to perform various data operations.

Back in the days of the ARM2 CPU, the ARM2 on its own had no co-
processors, but had external pins to external co-processors to be added
to the system, such as the FPA (floating point accelerator) hardware.
Early Acorn ARM-based computers had a separate socket to allow the FPA
chip to be plugged in. The FPA used the co-processor 0/1 instruction
space.

The ARM3 CPU added a cache, and with it the need to control that
cache, which is where the origins of the CP15 control co-processor
comes from (although ARM3 has a totally different CP15 layout.)

When a co-processor is addressed by an instruction, if it doesn't
respond, the ARM takes an undefined instruction exception, which allows
software to emulate the co-processor - and this is how software FP
emulation had been done - userspace continues to use the FPA ISA,
with the instructions trapping to an emulator.

Eventually, VFP came along, which uses the co-processor 10/11 space,
superseding FPA. However, the principle is still there - it is
entirely _possible_ if one had enough motivation to implement a VFP
software emulator on ARM2 and "execute" VFP instructions.

By the time that Intel decided to add iWMMXT, the ARM CPUs no longer
had support for external co-processors, so FPA had likely been long
forgotten (despite lots of Linux systems using it for their FP), and
they picked the same co-processor instruction space, which means FPA
and iWMMXT are mutually exclusive: you can't have a kernel supporting
both.

In all cases, the co-processor instructions have always had two
definitions: there's the ARM integer CPU naming of the instructions
which uses things like "CDP", "LDC", "STC", "MRC", "MCR" and so forth,
and the co-processor specific naming. In fact, the early VFP
documentation referred to the ARM integer CPU naming of the
instructions, talking about the FMRX being a MRC instruction etc.

So, for example, with VFP:

fmrx r0, fpsid

and:

mrc p10, 7, r0, c0, c0, 0

are exactly the same opcode. The former is defined in what used to be
the separate VFP architecture document (such as DDI0238A), the latter
in the ARM reference manual.

It's just the same with stuff like CP15 - ARM tried in ARMv7 to rename
various CP15 instructions such as "MCR p15, 0, c7, c5, 0" (which I can
tell you off the top of my head invalidates the instruction cache to
the point of unification) to be "ICIALLU" - a neumonic I still have to
look up. These are just different names for the same opcode.

I'm sure there's a document somewhere that defines the iWMMXT
instruction set (I don't seem to have it), but ultimately it can be
described in terms of the ARM co-processor instruction set, just like
any of the other ARM co-processors.

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 10.2Mbps down 587kbps up