Re: [PATCH 2/2] x86: Only do a single page fault for copy_from_user_nmi

From: Peter Zijlstra
Date: Mon Sep 29 2014 - 07:56:27 EST


On Fri, Sep 26, 2014 at 04:31:17PM -0700, Andi Kleen wrote:
> From: Andi Kleen <ak@xxxxxxxxxxxxxxx>
>
> When copy_from_user_nmi faults the copy_user_tail code ends
> up "replaying" the page faults to compute the exact tail bytes,
> (added with 112958).

That is a wrong way to quote commits in two ways;

1) Linus 'requires' you use 12 character abreviated hashes (because
we've already seen collisions with the default 8), yet you use 6.

2) the recommended quoting style is:

1129585a08ba ("x86: introduce copy_user_handle_tail() routine")

You _should_ know this.

> So we do an expensive page fault. And then we do it *again*.
>
> This ends up being very expensive in the PMI handler for any
> page fault on a stack access, and is one the more common
> causes for the NMI handler exceeding its runtime limit.
>
> 1) 0.109 us | copy_from_user_nmi();
> 1) | copy_from_user_nmi() {
> 1) | __do_page_fault() {
> 1) | bad_area_nosemaphore() {
> 1) | __bad_area_nosemaphore() {
> 1) | no_context() {
> 1) | fixup_exception() {
> 1) | search_exception_tables() {
> 1) 0.079 us | search_extable();
> 1) 0.409 us | }
> 1) 0.757 us | }
> 1) 1.106 us | }
> 1) 1.466 us | }
> 1) 1.793 us | }
> 1) 2.233 us | }
> 1) | copy_user_handle_tail() {
> 1) | __do_page_fault() {
> 1) | bad_area_nosemaphore() {
> 1) | __bad_area_nosemaphore() {
> 1) | no_context() {
> 1) | fixup_exception() {
> 1) | search_exception_tables() {
> 1) 0.060 us | search_extable();
> 1) 0.412 us | }
> 1) 0.764 us | }
> 1) 1.074 us | }
> 1) 1.389 us | }
> 1) 1.665 us | }
> 1) 2.002 us | }
> 1) 2.784 us | }
> 1) 6.230 us | }
>
> The NMI code actually doesn't care about the exact tail value. It only
> needs to know if a fault happened (!= 0)

For now, changing the semantics of the function seems like a sure way to
fail in the future though.

> So check for in_nmi() in copy_user_tail and don't bother with the exact
> tail check. This way we save the extra ~2.7us.
>
> In theory we could also duplicate the whole copy_*_ path for cases
> where the caller doesn't care about the exact bytes. But that
> seems overkill for just this issue, and I'm not sure anyone
> else cares about how fast this is. The simpler check works
> as well for now.

So I don't get that code, but why not fix it in general? Taking two
faults seems silly.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/