Re: RFC: Petition Intel/AMD to add POPF_IF insn

From: Denys Vlasenko
Date: Wed Aug 31 2016 - 07:12:25 EST


On 08/19/2016 12:54 PM, Paolo Bonzini wrote:
> On 18/08/2016 19:24, Linus Torvalds wrote:
>>> I didn't do CPL0 tests yet. Realized that cli/sti can be tested in userspace
>>> if we set iopl(3) first.
>>
>> Yes, but it might not be the same. So the timings could be very
>> different from a cpl0 case.
>
> FWIW I recently measured around 20 cycles for a popf as well on
> Haswell-EP and CPL=0 (that was for commit f2485b3e0c6c, "KVM: x86: use
> guest_exit_irqoff", 2016-07-01).

Thanks for the confirmation.

I revisited benchmarking of the

    if (flags & X86_EFLAGS_IF)
            native_irq_enable();

patch. In "make -j20" kernel compiles on an 8-way (HT) CPU, it shows a ~5 second
improvement over a ~16 minute compile. That's about a 0.5% speedup. It's OK, but
not something to be too excited about.

The disassembly of vmlinux shows that gcc generates these asm patterns:

    80 e6 02                  and    $0x2,%dh
    74 01                     je     ffffffff810101ae <intel_pt_handle_vmx+0x3e>
    fb                        sti

    41 f6 86 91 00 00 00 02   testb  $0x2,0x91(%r14)
    74 01                     je     ffffffff81013ce7 <math_error+0x77>
    fb                        sti

    f6 83 91 00 00 00 02      testb  $0x2,0x91(%rbx)
    74 01                     je     ffffffff81013efa <do_int3+0xba>
    fb                        sti

    41 f7 c4 00 02 00 00      test   $0x200,%r12d
    74 01                     je     ffffffff8101615d <oops_end+0x5d>
    fb                        sti

Here we trade a 20-cycle POPF for a branch (~1 cycle if predicted, ~20 cycles
if mispredicted) plus, when the saved IF bit is set, a 4-cycle STI.

I still think a dedicated instruction for a conditional STI is worth asking for.

Along the lines of "If bit 9 in the r/m argument is set, then STI, else nothing".

What do people from CPU companies say?