Re: [RFC] de-asmify the x86-64 system call slowpath
From: Linus Torvalds
Date: Wed Feb 05 2014 - 19:33:03 EST
On Sun, Jan 26, 2014 at 2:28 PM, Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> Comments? This was obviously brought on by my frustration with the
> currently nasty do_notify_resume() always returning to iret for the
> task_work case, and PeterZ's patch that fixed that, but made the asm
> mess even *worse*.
Actually, I should have taken a closer look.
Yes, do_notify_resume() is a real issue, and my stupid open/close
test-case showed that part of the profile.
But the "iretq" that dominates on the kernel build is actually the
page fault one.
I noticed this when I compared "-e cycles:pp" with "-e cycles:p". The
single-p version shows largely the same profile for the kernel, except
that instead of showing "iretq" as the big cost, it shows the first
instruction in "page_fault".
In fact, even when *not* zoomed into the kernel DSO, "page_fault"
actually takes 5% of CPU time according to perf report. That's really
quite impressive.
I suspect the Haswell architecture has made everything else cheaper,
and the exception overhead hasn't kept up. I'm wondering if there is
anything we could do to speed this up - like doing gang lookup in the
page cache and pre-populating the page tables opportunistically.
We're using an interrupt gate for the page fault handling, and I don't
think we can avoid that. For all I know, a trap gate might be slightly
faster (but likely not really noticeable - the microcode is surely
expensive, but the pipeline unwinding is probably the biggest cost of
the page fault), but we have the issue of interrupts causing page
faults for vmalloc pages.. And obviously we can't avoid the iretq for
the return path.
So as far as I can see, there's no sane way to make the page fault
itself cheaper. Looking at opportunistically prepopulating page tables
when it's cheap and easy might be the best we can do..
Linus