> Louis Mandelstam <lma@sacc.org.za> writes:
> > >
> > > I saw a message somewhere today [bugtraq perhaps?] where someone
> > > claimed to have found a workaround for the 0xf0 pentium bug.
> > >
> > > If I understood it correctly, they locked the illegal instruction trap
> > > into cpu cache....
> >
> > ..and lost the performance advantage of having an internal cache, no?
>
> Actually, Miguel Angel Rodriguez Jodar <rodriguj@DRAGO.FIE.US.ES>
> forwarded a message (originally from Jim Brooks <jim@jimbrooks.org>)
> to BUGTRAQ about this possible solution.
>
> What Jim Brooks had discovered was that, if the IDT gate descriptor
> for the invalid opcode exception had been cached (by first executing a
> "legitimate" invalid opcode), then the F0 0F C7 C8 would not cause a
> hang.
>
> Miguel pointed out that, because the internal cache could be locked,
> this cached gate descriptor could be made permanent, at a horrible
> performance penalty.
True.
> HOWEVER, if Jim Brooks is right in his explanation, then we're talking
> about a cached gate descriptor, which has nothing to do with the
> "internal cache" we normally talk about; the descriptor is cached
> elsewhere.
Maybe. The Pentium has some caches for segment descriptors, which were
removed on the PPro; that explains in part its horrible 16-bit performance.
It might also have a cache for an IDT entry; the IDT is basically one of the
3 descriptor tables, along with the GDT and the LDT.
> If, as I assume (and I am clueless), the Pentium has a dedicated cache
> for a single IDT gate descriptor, then it might be possible to work
> around the bug by executing an invalid opcode (to recache the invalid
> opcode descriptor) whenever the "last used IDT descriptor" might have
> changed.
But it is quite possible that this cache is thrashed by segment
descriptor loads. It depends on whether there are IDT-specific cache entries
or whether the entries are shared between the GDT, LDT and IDT.
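To make the two cases concrete, here is a toy model in C. It is only a
sketch of the hypothesis: the one-entry cache, the struct desc_cache type and
the desc_load()/ud_gate_cached() helpers are all invented for illustration,
and nothing in it is documented Pentium behaviour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical one-entry descriptor cache (an assumption, not a
 * documented Pentium structure). */
enum table { TBL_GDT, TBL_LDT, TBL_IDT };

struct desc_cache {
    bool shared;       /* true: one slot serves GDT, LDT and IDT loads */
    bool valid;        /* false until the first descriptor is cached */
    enum table table;  /* table the cached descriptor came from */
    unsigned index;    /* index of the descriptor within that table */
};

/* Record a descriptor load. A dedicated IDT slot ignores GDT/LDT loads;
 * a shared slot is overwritten by any load. */
static void desc_load(struct desc_cache *c, enum table t, unsigned idx)
{
    assert(c != NULL);
    if (!c->shared && t != TBL_IDT)
        return;
    c->valid = true;
    c->table = t;
    c->index = idx;
}

/* Is the gate for exception 6 (invalid opcode) still the cached entry? */
static bool ud_gate_cached(const struct desc_cache *c)
{
    assert(c != NULL);
    return c->valid && c->table == TBL_IDT && c->index == 6;
}
```

In this model a GDT load after the invalid opcode evicts the gate only when
the slot is shared. In both variants any later interrupt or exception
replaces it, which is why the gate would have to be recached right before
going back to user mode.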
> That is, it might be possible to finish off every interrupt handler by
> executing (and handling) an illegal opcode. Presumably this would have
> to be done back out in user space, to prevent a double fault. We
> would suffer a performance penalty, but it might be worth it.
Certainly not in user mode: that would mean switching back to kernel mode,
switching stacks, and other big delays. The only way I see to implement it
and to test this hypothesis is to do the following before any iret (which is
hidden in the RESTORE_ALL macro) to user mode or VM86 mode (ret_from_intr
shows how to check, but this is not necessary for a proof of concept):
- jump to a specific location where you find the UD2 opcode
(replace all RESTORE_ALL by jmp ud2)
then insert the following code somewhere in arch/i386/kernel/entry.S
ud2: .byte 0x0f,0x0b
and then the code for the interrupt 6 should start with something like:
(just after ENTRY(invalid_op))
cmpl $ud2,(%esp) /* did the fault come from our own ud2? */
jnz 1f
addl $12,%esp /* pop interrupt information off stack */
RESTORE_ALL /* return with exception 6 cache descriptor loaded */
1: /* Here normal interrupt entry code */
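For those wondering where the 12 comes from: exception 6 has no error code,
and when it is taken from kernel code there is no stack switch, so the CPU
pushes only EFLAGS, CS and EIP. The same check can be written as a C model;
struct ud_frame and the two helpers below are names I made up for
illustration, not kernel identifiers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* What a 32-bit x86 CPU pushes for exception 6 taken without a privilege
 * change. Lowest address (top of stack) first. */
struct ud_frame {
    uint32_t eip;     /* (%esp): address of the faulting instruction */
    uint32_t cs;      /* 4(%esp): code segment selector, padded to 32 bits */
    uint32_t eflags;  /* 8(%esp): saved flags */
};

/* Mirrors "cmpl $ud2,(%esp); jnz 1f": was the fault our deliberate ud2?
 * Invalid opcode is a fault, so the saved EIP points at the ud2 itself. */
static int is_deliberate_ud2(const struct ud_frame *f, uint32_t ud2_addr)
{
    assert(f != NULL);
    return f->eip == ud2_addr;
}

/* Mirrors "addl $12,%esp": bytes to discard before RESTORE_ALL. */
static size_t ud_frame_bytes(void)
{
    return sizeof(struct ud_frame);
}
```

If the saved EIP is anything other than the ud2 label, the exception is a
real one and falls through to the normal entry code.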
Note that this will significantly slow down all returns to user mode. It
is designed only to test the hypothesis; it may be possible to create
a faster version, but I don't see how right now :-)
Hint: put the ud2 opcode just before the interrupt 6 entry code and
align it to a cache line boundary. The code from ud2 to RESTORE_ALL is 27
bytes long if I compute it right, slightly less than a 32-byte cache line :-)
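The count can be redone as follows. This is a sketch: the instruction
lengths are the usual 32-bit encodings, and the expansion of RESTORE_ALL
(seven register pops, two segment pops, addl $4,%esp, iret) is my assumption
about the current entry.S, so check it against your tree.

```c
#include <assert.h>
#include <stddef.h>

struct insn {
    const char *text;  /* instruction as written in entry.S */
    unsigned len;      /* encoded length in bytes */
};

/* Path from the ud2 label to the iret at the end of RESTORE_ALL. */
static const struct insn path[] = {
    { "ud2",              2 },  /* 0f 0b */
    { "cmpl $ud2,(%esp)", 7 },  /* 81 3c 24 + imm32 */
    { "jnz 1f",           2 },  /* 75 + rel8 */
    { "addl $12,%esp",    3 },  /* 83 c4 0c */
    /* RESTORE_ALL (assumed expansion) */
    { "popl %ebx",        1 },
    { "popl %ecx",        1 },
    { "popl %edx",        1 },
    { "popl %esi",        1 },
    { "popl %edi",        1 },
    { "popl %ebp",        1 },
    { "popl %eax",        1 },
    { "pop %ds",          1 },
    { "pop %es",          1 },
    { "addl $4,%esp",     3 },  /* skip orig_eax */
    { "iret",             1 },
};

/* Sum of the encoded lengths along the path. */
static unsigned path_bytes(void)
{
    unsigned total = 0;
    for (size_t i = 0; i < sizeof(path) / sizeof(path[0]); i++)
        total += path[i].len;
    return total;
}

/* Does the whole path fit in one cache line of the given size? */
static int fits_in_line(unsigned line_size)
{
    assert(line_size > 0);
    return path_bytes() <= line_size;
}
```

With these lengths path_bytes() gives 27, so fits_in_line(32) holds for the
Pentium's 32-byte lines, provided the ud2 label really is line-aligned.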
I can't test it right now since I'm 50 kilometers away from my machine in the
middle of a snowstorm in Sierra Nevada.
> Any comments on workability from folks with a clue?
>
Gabriel.