Possible Pentium MMX CPU bug?

Ingo Molnar (mingo@pc7537.hil.siemens.at)
Fri, 28 Nov 1997 15:01:41 +0100 (MET)


I think I have found the reason why certain applications (Quake, GCC)
crash on certain Pentium MMX processors. I suspect it is a CPU bug. I have
a test program that detects this bug/condition, and have made a Linux
kernel patch that is a first attempt to work around the bug.

Many thanks to Krisztian Mizser, who has been a big help in testing out
experimental patches, bugfixes and testcode!

NOTE: this (suspected!) CPU bug is not security-relevant; my current
understanding is that it cannot be used to endanger multiuser systems.
Its effect is that certain self-modifying code will run 'not as
specified', but I can see no danger to the system whatsoever, other
than a crashing application.

Temporary workaround (until a faster workaround is found):

apply this patch to 2.0.32 (Thomas Sailer originally found this
solution by experimentation):
----------------------------------------------->
--- linux/mm/.memory.c Fri Nov 28 12:52:48 1997
+++ linux/mm/memory.c Fri Nov 28 14:06:58 1997
@@ -638,7 +638,7 @@
 		flush_cache_page(vma, address);
 		set_pte(page_table, pte_mkwrite(pte_mkdirty(mk_pte(new_page, vma->vm_page_prot))));
 		free_page(old_page);
-		flush_tlb_page(vma, address);
+		flush_tlb_mm(tsk->mm);
 		return;
 	}
 	flush_cache_page(vma, address);
<----------------------------------------------

Here is the test program I wrote to detect the (suspected!) CPU bug. It
outputs this on a Stepping 6 'Classic' Pentium 100 MHz:

$ ./mw_bug
testing bug ... no Pentium MMX 'missed write' bug in this CPU.

and this is displayed on a Stepping 3 Pentium/MMX 200 MHz processor:

$ ./mw_bug
testing bug ... possible Pentium MMX 'missed write' CPU bug detected!

Whether this really is a bug or not is not yet known.

My current theory is that the Pentium MMX's write queue is possibly not
flushed properly when an existing TLB entry changes its write permission
bit. Possibly the BTB-driven code prefetch cycle does not properly notice
the pending write to the (meanwhile changed) TLB entry, and prefetches the
old, not yet modified version of the cacheline ... resulting in unspecified
behavior in the application.

Affected applications so far: GCC trampoline users (including GCC itself),
and Quake.

Affected CPUs: only two CPUs were tested so far, a Stepping 7 P54C, and a
Stepping 3 MMX Pentium.

-- mingo

----------- gcc -o mw_bug mw_bug.c ----------------->
/*
 * This code demonstrates a _suspected_ bug in certain MMX Pentium
 * processors, called the 'Missed Write' bug.
 *
 * Compile it under Linux with: 'gcc -o mw_bug mw_bug.c'
 *
 * Written by Ingo Molnar 1997.11.28, this code is under the GPL
 */

#include <stdio.h>
#include <stdlib.h>	/* exit() */
#include <string.h>
#include <sys/time.h>
#include <time.h>
#include <unistd.h>
#include <math.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/wait.h>	/* wait() */
#include <fcntl.h>
#include <sys/mman.h>

int readable_word;

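/*
 * 32-bit x86 code. The store at start_bug patches the 32-bit displacement
 * of the load at label 1 (2 bytes of opcode/modrm come first, hence 1f+2),
 * then the patched instruction is executed right away.
 */
__asm__ (
"	.globl start_bug			\n"
"	.align 4096				\n"
"start_bug:					\n"
"	movl $readable_word, 1f+2		\n"
"	jmp modified_code			\n"
"						\n"
"	.align 32				\n"
"modified_code:					\n"
"	xorl %eax, %eax				\n"
"1:						\n"
"	movl 0xdeadbeef(%eax), %eax		\n"	/* 0xdeadbeef is overwritten at run time */
"	ret					\n"
"						\n"
"	.align 4096				\n"
);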

extern char start_bug;

int main (void)
{
	int child_status;
	void (*testfunc)(void) = NULL;

	printf("testing bug ... "); fflush(stdout);

	/* run the self-modifying code in a child, so a crash only kills the child */
	if (!fork()) {
		mprotect(&start_bug, 4096, PROT_READ|PROT_WRITE|PROT_EXEC);
		testfunc = (void (*)(void))&start_bug;
		testfunc();
		exit(0);
	}

	wait(&child_status);

	/* a nonzero status means the child did not reach exit(0) */
	if (child_status)
		printf(" possible Pentium MMX 'missed write' CPU bug detected!\n");
	else
		printf(" no Pentium MMX 'missed write' bug in this CPU.\n");

	return 0;
}