Re: RFC: Petition Intel/AMD to add POPF_IF insn

From: Denys Vlasenko
Date: Thu Aug 18 2016 - 21:07:17 EST


On 08/18/2016 07:24 PM, Linus Torvalds wrote:
> That said, your numbers really aren't very convincing. If popf really
> is just 10 cycles on modern Intel hardware, it's already fast enough
> that I really don't think it matters.

It's 20 cycles. I was wrong in my email: I forgot that the insn count
also includes the "push %ebx" insns.

Since I already made a mistake, let me double-check.

200 million iterations of this loop execute in under 17 seconds:

400100: b8 00 c2 eb 0b mov $0xbebc200,%eax # 200*1000*1000
400105: 9c pushfq
400106: 5b pop %rbx
400107: 90 nop
....
0000000000400140 <loop>:
400140: 53 push %rbx
400141: 9d popfq
400142: 53 push %rbx
400143: 9d popfq
400144: 53 push %rbx
400145: 9d popfq
400146: 53 push %rbx
400147: 9d popfq
400148: 53 push %rbx
400149: 9d popfq
40014a: 53 push %rbx
40014b: 9d popfq
40014c: 53 push %rbx
40014d: 9d popfq
40014e: 53 push %rbx
40014f: 9d popfq
400150: 53 push %rbx
400151: 9d popfq
400152: 53 push %rbx
400153: 9d popfq
400154: 53 push %rbx
400155: 9d popfq
400156: 53 push %rbx
400157: 9d popfq
400158: 53 push %rbx
400159: 9d popfq
40015a: 53 push %rbx
40015b: 9d popfq
40015c: ff c8 dec %eax
40015e: 75 e0 jne 400140 <loop>

The loop is exactly 32 bytes, aligned.
There are 14 POPFs. Other insns are very fast.
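
For anyone who wants to double-check the listing, here is a small sketch that recomputes the loop size and the iteration count from the encodings shown in the disassembly. The names (loop_body_bytes, loop_count, check_listing) are made up for illustration; only the byte values come from the dump.

```c
#include <assert.h>
#include <stdint.h>

/* Instruction lengths as they appear in the objdump listing. */
enum {
    PUSH_RBX_BYTES = 1,   /* 53    */
    POPFQ_BYTES    = 1,   /* 9d    */
    DEC_EAX_BYTES  = 2,   /* ff c8 */
    JNE_REL8_BYTES = 2,   /* 75 e0 */
    POPF_PAIRS     = 14   /* push/popfq pairs per iteration */
};

/* Size of the loop body from <loop> through the jne. */
static int loop_body_bytes(void)
{
    return POPF_PAIRS * (PUSH_RBX_BYTES + POPFQ_BYTES)
           + DEC_EAX_BYTES + JNE_REL8_BYTES;
}

/* Decode the little-endian imm32 of "b8 00 c2 eb 0b" (mov $imm32,%eax). */
static uint32_t loop_count(void)
{
    const uint8_t imm[4] = { 0x00, 0xc2, 0xeb, 0x0b };
    return (uint32_t)imm[0]
         | (uint32_t)imm[1] << 8
         | (uint32_t)imm[2] << 16
         | (uint32_t)imm[3] << 24;
}

/* Hypothetical self-check tying the two numbers to the text. */
static void check_listing(void)
{
    assert(loop_body_bytes() == 32);      /* "exactly 32 bytes"  */
    assert(loop_count() == 200000000u);   /* 200 million iterations */
}
```

Calling check_listing() passes silently when the byte count is 32 and the immediate decodes to 200 million.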

No perf, just "time taskset 1 ./test".
My CPU frequency hovers around 3500 MHz when loaded.
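
The mail only shows the disassembly, not the source of ./test. As a rough sketch of how a similar loop could be produced from C with GCC/Clang inline asm (an assumption, not the original test program), something like the following should work on x86-64. The helper names popf_loop and time_popf_loop are hypothetical, and clock() is used here only as a stand-in for "time".

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper, not the original source: runs `iters` iterations
 * of 14 push/popfq pairs, mirroring the disassembly. */
static void popf_loop(uint32_t iters)
{
    if (iters == 0)
        return;                       /* dec/jne would wrap to 2^32 iterations */
#if defined(__x86_64__)
    __asm__ volatile(
        "lea -128(%%rsp), %%rsp\n\t"  /* step over the SysV red zone before pushing */
        "pushfq\n\t"                  /* copy the current RFLAGS ...              */
        "pop %%rbx\n\t"               /* ... into rbx                             */
        ".p2align 5\n"                /* align the loop head to 32 bytes          */
        "1:\n\t"
        ".rept 14\n\t"                /* 14 POPFs per iteration                   */
        "push %%rbx\n\t"
        "popfq\n\t"
        ".endr\n\t"
        "dec %0\n\t"                  /* %0 is eax because of the "+a" constraint */
        "jne 1b\n\t"
        "lea 128(%%rsp), %%rsp"       /* restore the stack pointer                */
        : "+a"(iters)
        :
        : "rbx", "cc", "memory");
#else
    (void)iters;                      /* other architectures: nothing to run */
#endif
}

/* CPU seconds spent in popf_loop(iters), measured with clock(). */
static double time_popf_loop(uint32_t iters)
{
    clock_t start = clock();
    assert(start != (clock_t)-1);
    popf_loop(iters);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

time_popf_loop(200u * 1000 * 1000) would repeat the run described here; pinning the process with taskset, as in the mail, keeps it on one CPU.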

17 seconds is 17*3500 million cycles.
(17*3500 million cycles) / (200*14 million POPFs) = 21.25 cycles per POPF
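
The same arithmetic as a tiny C sketch, in case someone wants to plug in their own numbers. The function name cycles_per_popf and its parameters are illustrative only. Note that it charges all cycles to POPF, so the push/dec/jne cost is folded into the result.

```c
#include <assert.h>

/* Average cycles per POPF, attributing every cycle of the run to POPF.
 *   seconds        - wall time of the run
 *   mhz            - CPU frequency while loaded, in MHz
 *   iterations     - loop iterations executed
 *   popfs_per_iter - POPF instructions in one iteration
 */
static double cycles_per_popf(double seconds, double mhz,
                              double iterations, double popfs_per_iter)
{
    double total_cycles = seconds * mhz * 1e6;
    double total_popfs  = iterations * popfs_per_iter;

    assert(total_popfs > 0.0);
    return total_cycles / total_popfs;
}
```

With the figures above, cycles_per_popf(17.0, 3500.0, 200e6, 14.0) gives 21.25.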

Thus, one POPF in CPL3 is ~20 cycles on Skylake.