Re: [ABOMINATION] x86: Fast interrupt return to userspace

From: Linus Torvalds
Date: Tue May 06 2014 - 17:00:26 EST


On Tue, May 6, 2014 at 1:35 PM, Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> Heh. That is pretty disgusting. But I guess it could be interesting
> for timing. BRB.

Ooh. That's friggin impressive.

Guys, see if you can recreate these numbers. This is my totally
disgusting test-case, which really is just stress-testing page faults
and nothing else.

Silly C file attached, see the comment at the top of it. Then just do
"time ./a.out". It's designed to map the zero-page and access it. The
"start" thing was to make sure it's not hugepage-aligned, but that's
not actually enough with a big 1GB area, so you do need that whole
"echo never" thing since there will be tons of aligned areas that the
kernel will make noops for this case otherwise.

Anyway, on my Haswell with normal "iret", that program takes 8.4+-0.1 seconds.

With the disgusting sysret hackery, it takes 6.5+-0.1 seconds. That's
a rather impressive 23% performance improvement for page faulting.

I'll do profiles and test the kernel compile too, but the raw timings
are certainly promising. The "sysret" hack is pretty disgusting, and
it's broken too. sysret doesn't do some things iret does (like TF flag
etc), so it's not complete, but it's clearly good enough to run tests
on. It will definitely break ptrace() and friends.

Linus
//
// Make sure to do
//
// echo never >/sys/kernel/mm/transparent_hugepage/enabled
//
// to disable THP for this stupid test-case.

#include <stdio.h>
#include <sys/types.h>
#include <sys/mman.h>
#include <unistd.h>

#define SIZE (1024*1024*1024)

int main(int argc, char **argv)
{
	char *addr, *start;
	int i;

	// Pick a hint address that is not hugepage-aligned (bit 13 set).
	start = 8192 + (char *)mmap(NULL, 4096, PROT_READ, MAP_PRIVATE | MAP_ANON, -1, 0);
	start = (char *)(8192 | (unsigned long) start);

	for (i = 0; i < 100; i++) {
		unsigned int j;

		addr = mmap(start, SIZE, PROT_READ, MAP_PRIVATE | MAP_ANON, -1, 0);
		if (addr == MAP_FAILED) {
			perror("mmap");
			return 1;
		}
		// Read one word per page, so every access takes a page fault.
		for (j = 0; j < SIZE; j += 4096) {
			*(volatile int *)(addr + j);
		}
		munmap(addr, SIZE);
	}
	return 0;
}