Re: [RFC PATCH] x86/asm/irq: Don't use POPF but STI

From: Ingo Molnar
Date: Tue Apr 21 2015 - 11:22:43 EST

In reply to: Borislav Petkov: "Re: [RFC PATCH] x86/asm/irq: Don't use POPF but STI"

* Borislav Petkov <bp@xxxxxxxxx> wrote:

> On Tue, Apr 21, 2015 at 02:45:58PM +0200, Ingo Molnar wrote:
> > From 6f01f6381e8293c360b7a89f516b8605e357d563 Mon Sep 17 00:00:00 2001
> > From: Ingo Molnar <mingo@xxxxxxxxxx>
> > Date: Tue, 21 Apr 2015 13:32:13 +0200
> > Subject: [PATCH] x86/asm/irq: Don't use POPF but STI
> >
> > So because the POPF instruction is slow and STI is faster on
> > essentially all x86 CPUs that matter, instead of:
> >
> >   ffffffff81891848:   9d                      popfq
> >
> > we can do:
> >
> >   ffffffff81661a2e:   41 f7 c4 00 02 00 00    test   $0x200,%r12d
> >   ffffffff81661a35:   74 01                   je     ffffffff81661a38 <snd_pcm_stream_unlock_irqrestore+0x28>
> >   ffffffff81661a37:   fb                      sti
> >   ffffffff81661a38:
> >
> > This bloats the kernel a bit, by about 1K on the 64-bit defconfig:
> >
> >       text    data     bss      dec    hex filename
> >   12258634 1812120 1085440 15156194 e743e2 vmlinux.before
> >   12259582 1812120 1085440 15157142 e74796 vmlinux.after
> >
> > the other cost is the extra branching, adding extra pressure to the
> > branch prediction hardware and also potential branch misses.
>
> Do we care? [...]

Only if it makes stuff faster.

> [...] After we enable interrupts, we'll most likely go somewhere
> cache "cold" anyway, so the branch misses will happen anyway.
>
> The question is, would the cost drop from POPF -> STI cover the
> increase in branch misses overhead?
>
> Hmm, interesting.

So there are a few places where the POPF acts as an STI in 100% of the
cases. It's probably a win there.

But my main worry would be sites that are 'multi use', such as locking
APIs - for example spin_unlock_irqrestore(): those tend to be called
from different code paths, and each one can have a different IRQ flags
state.

For example, scheduler wakeups are done both from irqs-off codepaths
(very common) and from irqs-on codepaths (very common as well). In
the former case we won't execute an STI, in the latter case we will -
and both go through the same POPF site at the end of the critical
section. With the patch that site becomes a conditional branch, and
the probability of a branch prediction miss is high in this case.
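
To make the two restore variants concrete, here is a rough userspace
model. It is only a sketch: STI/CLI/POPF cannot be meaningfully run
from user mode, so the interrupt flag is modeled by a plain variable,
and all model_*() names are made up for illustration (they are not
kernel APIs). The 0x200 mask is the IF bit tested in the disassembly
quoted above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Interrupt-enable bit in EFLAGS, same mask as the quoted TEST $0x200. */
#define X86_EFLAGS_IF 0x200u

/* Hypothetical stand-in for the CPU's interrupt flag (illustration only). */
static bool model_if;

static void model_cli(void) { model_if = false; }
static void model_sti(void) { model_if = true; }

/* Save the modeled flags and disable interrupts, in the spirit of
 * local_irq_save(). */
static uint32_t model_irq_save(void)
{
	uint32_t flags = model_if ? X86_EFLAGS_IF : 0;

	model_cli();
	return flags;
}

/* POPF-style restore: write the saved state back unconditionally,
 * no branch involved. */
static void model_restore_popf(uint32_t flags)
{
	model_if = (flags & X86_EFLAGS_IF) != 0;
}

/* Patch-style restore: test the saved IF bit and only then "STI".
 * This assumes interrupts are off at this point, which holds after
 * model_irq_save(); the branch direction depends on the caller. */
static void model_restore_sti(uint32_t flags)
{
	if (flags & X86_EFLAGS_IF)
		model_sti();
}
```

Both variants leave the flag in the same state after a save/restore
pair; the difference is only that the second one has a data-dependent
branch, whose direction flips whenever an irqs-on caller and an
irqs-off caller alternate at the same site.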

So the question is: is the POPF/STI performance difference higher than
the average cost of branch misses? If yes, then the change is probably
a win. If not, then it's probably a loss.
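
The trade-off can be written down as a back-of-the-envelope cost
model. This is a sketch under simplifying assumptions (fixed cycle
costs, no cache or pipeline interactions); the function and parameter
names are hypothetical, and every cycle count has to come from
measurement on the CPU in question, none are supplied here.

```c
/*
 * Expected cycles of the TEST + Jcc (+ STI) sequence at one call site.
 *
 *   c_test_jcc: cost of the test and a correctly predicted branch
 *   c_sti:      cost of STI when it is executed
 *   c_miss:     penalty of one branch misprediction
 *   p_irqs_on:  fraction of calls whose saved flags have IF set
 *   p_miss:     misprediction rate of the branch at this site
 */
static double branchy_cost(double c_test_jcc, double c_sti, double c_miss,
			   double p_irqs_on, double p_miss)
{
	return c_test_jcc + p_irqs_on * c_sti + p_miss * c_miss;
}

/*
 * Misprediction rate at which the branchy sequence costs exactly as
 * much as a POPF of c_popf cycles. Below this rate the change is a
 * win in this model, above it a loss. A result >= 1.0 means the
 * branchy sequence wins even if every branch is mispredicted.
 */
static double break_even_miss_rate(double c_popf, double c_test_jcc,
				   double c_sti, double c_miss,
				   double p_irqs_on)
{
	return (c_popf - c_test_jcc - p_irqs_on * c_sti) / c_miss;
}
```

For a site that always restores irqs-on, p_miss should be close to
zero and only the first two terms matter; for a 'multi use' site the
p_miss * c_miss term is the one that decides.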

My gut feeling is that we should let the hardware do it, i.e. we
should continue to use POPF - but I can be convinced ...

Thanks,

Ingo