Re: NOP instruction

Gabriel Paubert (paubert@iram.es)
Tue, 15 Dec 1998 11:50:19 +0100 (MET)


On Tue, 15 Dec 1998, Helge Hafting wrote:

>
> [...]
> > > LOCK
> > > SEG ES
> > > MOV DI,DI
> > >
> > > The first time I read through that, I wondered what on earth it was
> > > supposed to do, and only later did I realise it was effectively a
> > > rather verbose NOP instruction...
> Odd choice for a NOP - the LOCK can be slow on
> SMP systems, or maybe not if we aren't actually accessing memory?

Very odd choice indeed. A lock prefix can only be used with some
instructions (see the list in any good Inetl manual) which performs
a read-modify-write on an operand n memory. Otherwise you get an exception
(number 6= undefined opcode AFAIR). So the preceding code is *guaranteed*
to generate an exception. MOV DI,DI with eventually a segment override,
address size, data size, and/or repeat prefix is ok, since these prefixes
won't affect the instruction. But prefix decoding is extremely slow on
recent intel processors, and gratuitous prefixes should never be used.

> >
> > I've always wondered... A much used NOP "instruction" is
> > OR AX,AX (or similar);
> > But doesn't this, as well as MOV DI,DI, set the zero flag???
> MOV doesn't set any flags, if I remember correctly. The compiler
> may feel free to use "OR AX,AX" and similiar as a NOP if it knows
> that the zero flag won't be needed or that it will be
> set to the correct value anyway (because the instruction
> before the OR AX,AX calculated a result in AX)

OR actually affects all flags (including the carry which may be important
in some cases).

>
> > As a NOP instruction is supposed to do nothing, would this then be wrong?
> > (Nothing meaning not setting any flags?)
> >
> > Hmmm... Where's the time from the good ol' Z80 when you HAD a real NOP
> > instruction? *Sigh* :-)
> The x86 has a real nop instruction. The point of using other
> NOP forms is that the x86 NOP is one byte, and takes a fixed
> time to execute. Now, if we happen to need 6 byte of filling
> then 6 NOP's take 6 times the time to execute. Putting a
> single 6-byte do-nothing instruction there instead will be faster.
> A good compiler will use the fastest NOP form available
> for the gap to be filled. x86 instructions have sizes
> varying from 1 to 15 bytes.

Actually the Intel nop is coded as xchg [e]ax,[e]ax since the 8088. On the
386, it took exactly the same time as xchg [e]ax with any other register
and was slow. From the 486 or Pentium, I can't remember, the NOP is
optimized and much faster than the actual xchg instructions.

And it the case of GCC at least, the compiler generates assembly source
and it is the assembler that generates the optimized nops whnever the
compiler inserts an align directive. Basically, one byte nop is the
standard 0x90 Intel opcode. Larger ones are often a variation on the lea
instructions as you can see from the following excerpt from binutils
(2.9.1.0.13 but it shoud not matter infile gas/config/tc-i386.c), which
shows how the code actually plays with different possible encodings of the
same instruction:

/* Various efficient no-op patterns for aligning code labels. */
/* Note: Don't try to assemble the instructions in the comments. */
/* 0L and 0w are not legal */
static const char f32_1[] =
{0x90}; /* nop */
static const char f32_2[] =
{0x89,0xf6}; /* movl %esi,%esi */
static const char f32_3[] =
{0x8d,0x76,0x00}; /* leal 0(%esi),%esi */
static const char f32_4[] =
{0x8d,0x74,0x26,0x00}; /* leal 0(%esi,1),%esi */
static const char f32_5[] =
{0x90, /* nop */
0x8d,0x74,0x26,0x00}; /* leal 0(%esi,1),%esi */

[snipped]

static const char f32_14[] =
{0x8d,0xb4,0x26,0x00,0x00,0x00,0x00, /* leal 0L(%esi,1),%esi */
0x8d,0xbc,0x27,0x00,0x00,0x00,0x00}; /* leal 0L(%edi,1),%edi */
static const char f32_15[] =
{0xeb,0x0d,0x90,0x90,0x90,0x90,0x90, /* jmp .+15; lotsa nops */
0x90,0x90,0x90,0x90,0x90,0x90,0x90,0x90};


Up to 14 bytes, at most 2 instructions are needed. 15 bytes and more
are better handled by a jump.

Gabriel.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/