Re: New Solaris 7 features

David S. Miller (davem@dm.cobaltmicro.com)
Tue, 10 Nov 1998 22:47:30 -0800


   From: Nick.Holloway@alfie.demon.co.uk (Nick Holloway)
   Date: 10 Nov 1998 18:39:05 -0000

   alan@lxorguk.ukuu.org.uk (Alan Cox) writes:
   > Except on the half of the ultrasparcs, where 64bit stuff is
   > mostly disabled because of CPU bugs ;) (See Bugtraq)

   What does Linux/sparc64 do with these chips?

It only matters when a 64-bit userland is available. The bugs are
triggered only when the cpu runs userspace in full 64-bit mode; when
userland runs in the 32-bit app compatibility mode, the bugs are not
possible.

They all have to do with funny shit like:

call . - some_huge_constant
ldx [some_address], %reg

Here the call causes the program counter to wrap around as a 64-bit
value (only possible if the call instruction lies within reach of the
call's 30-bit displacement from the bottom or top of the 64-bit address
space), and the memory access in the delay slot touches the VA hole,
the part of the address space the MMU disallows mappings in. It's
something like this; I don't have the exact details (see below).
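To make the wrap-around concrete, here is a minimal C sketch. It computes a SPARC call target the way the instruction defines it: PC plus the sign-extended 30-bit word displacement shifted left by 2, modulo 2^64. It then flags calls whose target wrapped. It also includes a VA-hole test, assuming the 44-bit implemented virtual address of UltraSPARC-I/II, where legal addresses are sign-extended from bit 43. The function names are made up for illustration. This is not how Solaris or Linux actually detect the bug.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 44-bit implemented VA, as on UltraSPARC-I/II. Legal
 * addresses are sign-extended from bit 43; the rest is the VA hole. */
#define VA_BITS 44

static bool va_in_hole(uint64_t va)
{
    uint64_t top = va >> (VA_BITS - 1);          /* bits 63..43 */
    /* Outside the hole these bits are all zeros or all ones. */
    return top != 0 && top != (UINT64_MAX >> (VA_BITS - 1));
}

static bool is_call(uint32_t insn)
{
    return (insn >> 30) == 1;                    /* op field == 01 */
}

/* Target of "call": PC + (sign-extended disp30 << 2), modulo 2^64. */
static uint64_t call_target(uint64_t pc, uint32_t insn)
{
    assert(is_call(insn));
    int64_t disp30 = (int64_t)(insn & 0x3FFFFFFFu);
    if (disp30 & 0x20000000)                     /* sign bit of disp30 */
        disp30 -= 0x40000000;
    return pc + ((uint64_t)disp30 << 2);
}

/* True if the call's target wrapped around the 64-bit address space. */
static bool call_wraps(uint64_t pc, uint32_t insn)
{
    uint64_t target = call_target(pc, insn);
    bool backward = (insn & 0x20000000u) != 0;
    return backward ? target > pc : target < pc;
}
```

For example, a `call . - 0x2000` placed at address 0x1000 lands at 0xFFFFFFFFFFFFF000, at the very top of the address space.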

This bug is present on all Ultras in 64-bit mode. If you look, Solaris 7
will not allow you to map executable instructions into the lower or
upper 30 bits of the address space, precisely so that you have no
chance to trigger this bug.
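The mapping policy described above can be sketched as a range check on executable mappings. The guard size is an assumption: 2^31 bytes at each end, the farthest a call can reach. The post does not state the actual limit Solaris 7 enforces, and `exec_mapping_allowed` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical guard: 2^31 bytes, the reach of a call's disp30.
 * The limit Solaris 7 really uses may differ. */
#define EXEC_GUARD (1ull << 31)

/* Refuse an executable mapping [start, start+len) that overlaps the
 * lowest or highest EXEC_GUARD bytes of the 64-bit address space. */
static bool exec_mapping_allowed(uint64_t start, uint64_t len)
{
    assert(len > 0);
    uint64_t last = start + len - 1;
    if (last < start)                  /* the range itself wraps: reject */
        return false;
    return start >= EXEC_GUARD && last <= UINT64_MAX - EXEC_GUARD;
}
```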

I know less about the other bugs, and these are the ones which make it
(nearly) impossible to allow 64-bit userland programs to run safely at
all. But I do know that these bugs also involve accessing the VA space
hole in 64-bit mode in strange ways. They really fucked up the
exception checks for VA hole accesses in the UltraSparc; many of these
bugs, if hit just right, will deadlock the instruction prefetcher.

All of these bugs were found using crashme, so we can do the same.

Currently with UltraLinux:

1) We have no 64-bit userland, even though the kernel is 64-bit.

2) For the truly security-conscious, you can disable CONFIG_BINFMT_ELF
and enable only CONFIG_BINFMT_ELF32, so that only 32-bit apps can be
executed. This ensures the cpu will never run userland in 64-bit
mode.
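Option 2 can be pictured as an exec-time check on the ELF class byte: only ELFCLASS32 images are accepted. This is a minimal illustration using the constants from the Linux/glibc `<elf.h>` header. It is not the kernel's binfmt loader code, and `exec_allowed_32bit_only` is a hypothetical name.

```c
#include <assert.h>
#include <elf.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Sketch of a "32-bit only" exec policy: with only a 32-bit ELF loader
 * configured, 64-bit images are simply not recognised, so no process
 * ever enters userland in 64-bit mode. */
static bool exec_allowed_32bit_only(const unsigned char *hdr, size_t len)
{
    assert(hdr != NULL);
    if (len < EI_NIDENT || memcmp(hdr, ELFMAG, SELFMAG) != 0)
        return false;                   /* not an ELF image at all */
    return hdr[EI_CLASS] == ELFCLASS32;
}
```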

So once we have a 64-bit userland and it is ready, we can run crashme
too, find out what the exact instruction sequences are, and code the
workarounds Sun was too lazy to do. Essentially (and this is similar
to what the MIPS people did in IRIX for the original MIPS R4000 chips,
which had the "branch at end of page" hardware bugs) you:

1) scan each page of code that will be executed the first time it is
brought into the page cache, checking for the instruction sequences
which would trip the bug. You set a flag on the page so you need not
check it again after the first verification. If a sequence is found,
you zap the process on the spot.

2) Any page which can be both executed and written to is only allowed
to do one or the other at any one time (this is feasible because the
Ultra has a separate data TLB and instruction TLB). When the user
writes to the page we remove it from the I-TLB; when the user next
tries to execute code there, the I-TLB handler verifies the contents
of the page and also removes it from the D-TLB, so it can catch the
next write to the page.
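The two steps can be modelled as a toy per-page state machine. This is a minimal sketch that rests on several assumptions. The "bad" sequence is a hypothetical stand-in built from the call/ldx example earlier in the post, since the real sequences are unknown. Pages are assumed to be 8 KB. The two booleans only stand in for real I-TLB and D-TLB entries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_WORDS (8192 / 4)   /* assumed 8 KB page of 4-byte instructions */

/* Toy per-page state; the TLB flags model "mapped in the I-TLB" and
 * "mapped writable in the D-TLB". No real MMU code here. */
struct code_page {
    uint32_t insn[PAGE_WORDS];
    bool verified;              /* scanned clean since the last write */
    bool in_itlb;
    bool in_dtlb_writable;
};

/* Hypothetical trigger: a "call" with an "ldx" in its delay slot. */
static bool bad_pair(uint32_t a, uint32_t b)
{
    bool a_is_call = (a >> 30) == 1;                        /* op = 01 */
    bool b_is_ldx = (b >> 30) == 3 && ((b >> 19) & 0x3F) == 0x0B;
    return a_is_call && b_is_ldx;
}

/* Step 1: scan once, then trust the flag. A pair straddling two
 * pages is not caught by this per-page scan. */
static bool verify_page(struct code_page *pg)
{
    if (pg->verified)
        return true;
    for (size_t i = 0; i + 1 < PAGE_WORDS; i++)
        if (bad_pair(pg->insn[i], pg->insn[i + 1]))
            return false;
    pg->verified = true;
    return true;
}

/* Step 2, write fault: drop the I-TLB entry and allow data writes. */
static void on_write_fault(struct code_page *pg)
{
    pg->in_itlb = false;
    pg->in_dtlb_writable = true;
    pg->verified = false;       /* contents may change */
}

/* Step 2, instruction fault: re-verify, allow execution, and drop the
 * writable D-TLB entry so the next write faults again. Returns false
 * when the caller should kill the process. */
static bool on_exec_fault(struct code_page *pg)
{
    if (!verify_page(pg))
        return false;
    pg->in_dtlb_writable = false;
    pg->in_itlb = true;
    assert(!(pg->in_itlb && pg->in_dtlb_writable));
    return true;
}
```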

#1 is not so bad for performance, because frequently executed pages
will tend to stay in memory and only need to be checked the first
time, but #2 can suck raw eggs for dynamic linking. I have a fix for
this: a special system call for writing PLT entries which ld.so can
use. Essentially the kernel is told exactly which instructions the
user wishes to write and where; the kernel need only verify that these
changes would not trigger the hardware bug, and then it performs the
writes itself.
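The proposed PLT-writing system call can be sketched as a function that works in two stages. It first checks the patched words, together with their neighbours, as they would look after the write. Only then does it copy them in. The name `write_plt_words`, the 8 KB page, and the call/ldx trigger are all hypothetical. The post proposes the idea, not an interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_WORDS (8192 / 4)   /* assumed 8 KB page of 4-byte words */

/* Hypothetical trigger: a "call" whose delay slot holds an "ldx". */
static bool bad_pair(uint32_t a, uint32_t b)
{
    bool a_is_call = (a >> 30) == 1;
    bool b_is_ldx = (b >> 30) == 3 && ((b >> 19) & 0x3F) == 0x0B;
    return a_is_call && b_is_ldx;
}

/* Word i of the page as it would look after the proposed write. */
static uint32_t word_after(const uint32_t *page, size_t idx,
                           const uint32_t *words, size_t n, size_t i)
{
    return (i >= idx && i - idx < n) ? words[i - idx] : page[i];
}

/* Verify first, then write. Returns false and leaves the page untouched
 * if the request is out of range or would create a trigger pair. */
static bool write_plt_words(uint32_t *page, size_t idx,
                            const uint32_t *words, size_t n)
{
    assert(page != NULL && words != NULL);
    if (n == 0 || idx >= PAGE_WORDS || n > PAGE_WORDS - idx)
        return false;
    /* Check pairs from the word before the patch to the word after it. */
    size_t lo = idx > 0 ? idx - 1 : 0;
    size_t hi = idx + n < PAGE_WORDS ? idx + n : PAGE_WORDS - 1;
    for (size_t i = lo; i < hi; i++)
        if (bad_pair(word_after(page, idx, words, n, i),
                     word_after(page, idx, words, n, i + 1)))
            return false;
    memcpy(&page[idx], words, n * sizeof *words);
    return true;
}
```

Checking the neighbouring words matters because a single patched word can complete a bad pair with an instruction already on the page.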

Another thing which makes this less expensive to work around than it
could be is that the first hardware bug only happens for instructions
located in certain address ranges, so if the page is being mapped
somewhere else you can skip those checks. The second hardware bug is
only present in earlier Ultra-I cpus, so it only needs to be checked
for on those processors.
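Skipping checks reduces to a small per-page decision. This sketch again assumes a hypothetical 2^31-byte window at each end of the address space for the wrap bug. The early-Ultra-I flag is simply an input; how a kernel would identify those cpus is not covered here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical window: 2^31 bytes at each end of the address space. */
#define EXEC_GUARD (1ull << 31)

enum {
    CHECK_WRAP   = 1u << 0,   /* call wrap-around bug */
    CHECK_ULTRA1 = 1u << 1,   /* bug(s) only present on early Ultra-I */
};

/* Returns a bitmask of the scans a page mapped at page_va needs. */
static unsigned checks_needed(uint64_t page_va, uint64_t page_size,
                              bool early_ultra1)
{
    assert(page_size > 0);
    uint64_t last = page_va + (page_size - 1);
    if (last < page_va)                 /* saturate on overflow */
        last = UINT64_MAX;
    unsigned checks = 0;
    if (page_va < EXEC_GUARD || last > UINT64_MAX - EXEC_GUARD)
        checks |= CHECK_WRAP;
    if (early_ultra1)
        checks |= CHECK_ULTRA1;
    return checks;
}
```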

Sun has not been very forthcoming with information about these
hardware bugs. They describe workarounds (don't map executable code to
these addresses in 64-bit mode, don't enable 64-bit mode on these cpu
models, etc.) but nothing more. I thought those days were gone, but
apparently there is some "value" to this information. They've just
made it a little annoyingly difficult for anyone to implement a real
fix for the bug.

I was offered the chance to sign some NDAs so that I could be told the
details of the bug, but I said no. I said no because if they told me, I
could not implement the workaround described above without revealing
the details of the instruction sequences which cause the problem (and
thus breaking the NDA). So my plan is to run a logging crashme in a
64-bit userland to figure out what they are, and post the precise
results here and to Bugtraq, because there is no reason this
information should not be available to anyone. Sun is only pissing
people off by withholding the details, so I'll fix that up neatly some
time soon.
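A "logging crashme" needs every random instruction buffer to be reproducible, so that a run which wedges the cpu can be replayed and narrowed down. Below is a sketch of just that generate-and-log half, using a seeded xorshift64 generator. It is not crashme's own code. Actually executing the buffer (mapping it executable and running it in a disposable child process) is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Marsaglia xorshift64: deterministic, so a seed reproduces a buffer. */
static uint64_t xorshift64(uint64_t *state)
{
    uint64_t x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    return *state = x;
}

/* Fill buf with n pseudo-random 32-bit instruction words derived from
 * seed, and log the seed first so a crash can be replayed later. */
static void gen_and_log(uint64_t seed, uint32_t *buf, size_t n, FILE *log)
{
    assert(seed != 0);              /* xorshift state must be nonzero */
    fprintf(log, "seed=%llu words=%zu\n", (unsigned long long)seed, n);
    fflush(log);                    /* get the record out before running */
    uint64_t s = seed;
    for (size_t i = 0; i < n; i++)
        buf[i] = (uint32_t)(xorshift64(&s) >> 32);
}
```

Logging the seed before the buffer is used, and flushing it, is the important design choice. If the machine deadlocks, the last line of the log still identifies the buffer that did it.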

Later,
David S. Miller
davem@dm.cobaltmicro.com
