Re: Fixing MIPS delay slot emulation weakness?
From: Rich Felker
Date: Sun Dec 16 2018 - 19:59:36 EST
On Sun, Dec 16, 2018 at 10:59:19AM -0800, Andy Lutomirski wrote:
> On Sun, Dec 16, 2018 at 10:13 AM Rich Felker <dalias@xxxxxxxx> wrote:
> >
> > On Sun, Dec 16, 2018 at 01:50:13PM +0000, Maciej W. Rozycki wrote:
> > > On Sat, 15 Dec 2018, Rich Felker wrote:
> > >
> > >
> > > It doesn't help that information about that is scattered across many
> > > documents. You can check for the NODS flag in the opcodes library from
> > > binutils though, which is almost 100% accurate, except for the SYNC
> > > instructions, for semantic reasons (i.e. they are allowed, but we don't
> > > want GAS to reorder them). Most of the disallowed stuff is in the
> > > microMIPS instruction set, due to encodings that execute as hardware
> > > macros.
> >
> > I think it suffices to emulate what compilers generate in delay slots,
> > which should be fairly minimal and stable. At the very least we could
> > enumerate everything GCC and LLVM already emit there, and get them to
> > upstream a policy of not adding new insns as fpu-delay-slot-allowed.
> > If someone is writing asm by hand to do ridiculous things in the delay
> > slot with random ISA extensions, they shouldn't expect it to work.
>
> I feel like I have to ask: the real thing preventing emulation is that
> new nonstandard instructions might get used in FPU delay slots on
> non-FPU-supporting hardware? This seems utterly nuts. If you're
> using custom ISA extensions, why on Earth are you also using emulated
> floating point instructions? You're targeting a specific known CPU
> if you do this, so you should use only instructions that actually work
> on that CPU.

Floating point is part of the standard ABI, even though some models
have no FPU. This is what mandates floating point emulation. The
reason you have to be able to emulate or execute out-of-line other
instructions is that there are floating point branch instructions,
bc1f and bc1t (maybe others too?), with a delay slot, and if the
branch is taken, you need some mechanism to make the instruction in
the delay slot still get executed before control reaches the branch
target. (If the branch is not taken you can just advance PC past the
branch and let the delay slot insn execute as an ordinary,
non-delay-slot instruction.)

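To make the taken/not-taken distinction concrete, here is a minimal sketch of the decision an emulator has to make for bc1f/bc1t. It is not the kernel's code: the struct and function names are made up for illustration, it assumes a 32-bit PC, and it deliberately rejects the branch-likely forms (nd bit set) to stay short.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result type for this sketch; not a kernel structure. */
struct fp_branch_result {
    bool is_fp_branch;   /* insn decoded as plain bc1f/bc1t */
    bool taken;          /* condition matched, branch is taken */
    uint32_t next_pc;    /* address of the delay slot insn (pc + 4) */
    uint32_t target_pc;  /* branch target, meaningful only if taken */
};

/* FP condition code 0 lives in FCSR bit 23, codes 1..7 in bits 25..31. */
static bool fcc_bit(uint32_t fcsr, unsigned cc)
{
    unsigned bit = (cc == 0) ? 23 : 24 + cc;
    return (fcsr >> bit) & 1;
}

/*
 * Decode a bc1f/bc1t at address pc and decide where execution goes.
 * Encoding: opcode COP1 (0x11), rs field BC (0x08), cc in bits 20:18,
 * nd in bit 17, tf in bit 16, signed word offset in bits 15:0.
 */
static struct fp_branch_result emulate_bc1(uint32_t insn, uint32_t pc,
                                           uint32_t fcsr)
{
    struct fp_branch_result r = {0};
    unsigned opcode = insn >> 26;
    unsigned fmt = (insn >> 21) & 0x1f;
    unsigned nd = (insn >> 17) & 1;

    /* Assumption: branch-likely variants (nd = 1) are out of scope here. */
    if (opcode != 0x11 || fmt != 0x08 || nd)
        return r;

    unsigned cc = (insn >> 18) & 7;
    bool tf = (insn >> 16) & 1;
    int32_t off = (int16_t)(insn & 0xffff);

    r.is_fp_branch = true;
    r.taken = (fcc_bit(fcsr, cc) == tf);
    r.next_pc = pc + 4;
    /* The offset is relative to the delay slot, in units of words. */
    r.target_pc = pc + 4 + (uint32_t)(off * 4);
    return r;
}
```

If taken is false, the emulator only has to set PC to next_pc and return to userspace; the delay slot insn then runs natively. If taken is true, the insn at next_pc must somehow be executed (emulated or run out-of-line) before PC is set to target_pc, and that is the hard part.
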
So in theory it's possible that there's a CPU model with fancy new
core instructions but no FPU. In this case, you would need the
capability to emulate or execute out-of-line these instructions. But I
have no idea if such CPU models actually exist. If not, the concern
can probably be ignored and it suffices to emulate just the parts of
the base ISA that are valid in delay slots.

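As a rough illustration of what "emulate just the parts of the base ISA" could look like, here is a sketch that handles a tiny subset (sll/nop, addu, or, addiu, ori, lui) and refuses everything else. The subset is chosen only to show the shape of the thing; it is not a list of what compilers actually put in delay slots, and the names are hypothetical. Loads and stores, which a real emulator would certainly need, are left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register file for this sketch: 32 GPRs, r[0] stays zero. */
struct gprs {
    uint32_t r[32];
};

/*
 * Emulate one delay slot insn from a small base-ISA subset.
 * Returns true if the insn was handled, false if it is outside the
 * subset (the caller would then need some other fallback).
 */
static bool emulate_delay_slot_insn(uint32_t insn, struct gprs *g)
{
    unsigned op = insn >> 26;
    unsigned rs = (insn >> 21) & 0x1f;
    unsigned rt = (insn >> 16) & 0x1f;
    unsigned rd = (insn >> 11) & 0x1f;
    unsigned sa = (insn >> 6) & 0x1f;
    uint32_t imm = insn & 0xffff;
    uint32_t simm = (uint32_t)(int32_t)(int16_t)imm;  /* sign-extended */
    unsigned dst;
    uint32_t val;

    switch (op) {
    case 0x00:                      /* SPECIAL: register-register ops */
        dst = rd;
        switch (insn & 0x3f) {
        case 0x00:                  /* sll (all-zero word is nop) */
            if (rs != 0)
                return false;
            val = g->r[rt] << sa;
            break;
        case 0x21:                  /* addu */
            if (sa != 0)
                return false;
            val = g->r[rs] + g->r[rt];
            break;
        case 0x25:                  /* or */
            if (sa != 0)
                return false;
            val = g->r[rs] | g->r[rt];
            break;
        default:
            return false;
        }
        break;
    case 0x09:                      /* addiu */
        dst = rt;
        val = g->r[rs] + simm;
        break;
    case 0x0d:                      /* ori (immediate is zero-extended) */
        dst = rt;
        val = g->r[rs] | imm;
        break;
    case 0x0f:                      /* lui; nonzero rs is not plain lui */
        if (rs != 0)
            return false;
        dst = rt;
        val = imm << 16;
        break;
    default:
        return false;               /* not in the emulated subset */
    }

    if (dst != 0)                   /* writes to $zero are discarded */
        g->r[dst] = val;
    return true;
}
```

The false return is the interesting case: it is exactly where an insn from some ISA extension would land, and the question is whether that ever happens on FPU-less hardware.
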
Rich