that's right. The optimal solution doesn't require the NOP5 at all,
and I've been playing around with an implementation that doesn't have
it. The problem I've been running into is that the compiler sometimes
emits a short jump ('0xeb', with a 1-byte offset), but the jump target
is further away than a 1-byte offset can reach. So I need to either
ensure the target is close, or somehow force a longer jump ('0xe9',
with a 4-byte offset) into the code so I always
have the space. The other advantage of dropping the NOP is easier
support across all x86 implementations: I'm not sure a 5-byte atomic
NOP is always available, whereas a jump is always atomic. I'm pretty
sure we can come up with a patch that avoids the NOP. I'll keep
working on it.
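To make the short vs. long jump trade-off concrete, here is a minimal C
sketch of how a patching helper might choose an encoding. The names
encode_jump and force_long are hypothetical (not an existing kernel
API). It assumes 32/64-bit mode, where 0xe9 takes a 4-byte
displacement, and that the displacement is measured from the end of the
jump instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define JMP8_OPCODE  0xeb  /* short jump, rel8 displacement */
#define JMP32_OPCODE 0xe9  /* near jump, rel32 displacement */
#define JMP8_LEN     2     /* opcode + 1-byte offset */
#define JMP32_LEN    5     /* opcode + 4-byte offset */

/*
 * Hypothetical helper: encode a jump located at 'site' that lands on
 * 'target' into buf (at least 5 bytes). Returns the number of bytes
 * written. If force_long is nonzero, always emit the 5-byte form so the
 * patch site has a fixed size regardless of how far the target is.
 */
static size_t encode_jump(uint8_t *buf, uint64_t site, uint64_t target,
                          int force_long)
{
    assert(buf != NULL);

    /* Displacement is relative to the address after the instruction. */
    int64_t rel8 = (int64_t)target - (int64_t)(site + JMP8_LEN);
    if (!force_long && rel8 >= INT8_MIN && rel8 <= INT8_MAX) {
        buf[0] = JMP8_OPCODE;
        buf[1] = (uint8_t)(int8_t)rel8;
        return JMP8_LEN;
    }

    int64_t rel32 = (int64_t)target - (int64_t)(site + JMP32_LEN);
    assert(rel32 >= INT32_MIN && rel32 <= INT32_MAX);
    uint32_t u = (uint32_t)(int32_t)rel32;

    /* x86 stores the displacement little-endian. */
    buf[0] = JMP32_OPCODE;
    buf[1] = (uint8_t)(u & 0xff);
    buf[2] = (uint8_t)((u >> 8) & 0xff);
    buf[3] = (uint8_t)((u >> 16) & 0xff);
    buf[4] = (uint8_t)((u >> 24) & 0xff);
    return JMP32_LEN;
}
```

The short form only works when the target is within -128..+127 bytes of
the end of the 2-byte jump; passing force_long = 1 corresponds to the
"always have the space" option, at the cost of 3 extra bytes per site.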