Re: x86 copy performance regression

From: Eric Dumazet
Date: Fri May 26 2023 - 14:55:40 EST


On Fri, May 26, 2023 at 8:33 PM Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Fri, May 26, 2023 at 10:51 AM Eric Dumazet <edumazet@xxxxxxxxxx> wrote:
> >
> > Hmmm
> >
> > [ 25.532236] RIP: 0010:0xffffffffa5a85134
> > [ 25.536173] Code: Unable to access opcode bytes at 0xffffffffa5a8510a.
>
> This was the other reason I really didn't want to use alternatives on
> the conditional branch instructions. The relocations are really not
> very natural, and we have odd rules for those things. So I suspect our
> instruction rewriting simply gets this wrong, because that's such a
> nasty pattern.
>
> I really wanted my "just hardcode the instruction bytes" to work. Not
> only did it get me the small 2-byte conditional jump, it meant that
> there was no relocation on it. But objtool really hates not
> understanding what the alternatives code does.
>
> Which is fair enough, but it's frustrating here when it only results
> in more problems.
>
> Anyway, I guess *this* avoids all issues. It creates an extra jump to
> a jump for the case where the CPU doesn't have ERMS, but I guess we
> don't really care about those CPUs anyway.
>
> And it avoids all the "alternative instructions have relocations"
> issues. And it creates all small two-byte jumps, and the "rep movsb"
> fits exactly on that same 2 bytes too. Which I guess all argues for
> this being what I should have started with.
>
> This time it *really* works.
>

Indeed, this one works and fixes the issue for me, thanks a lot!

New numbers look similar to 6.3 ones.

Tested-by: Eric Dumazet <edumazet@xxxxxxxxxx>

Performance counter stats for 'taskset 02 ./tcp_mmap -H 2002:a05:6608:297::':

          2,833.29 msec task-clock                #    0.970 CPUs utilized
             1,065      context-switches          #  375.888 /sec
                 1      cpu-migrations            #    0.353 /sec
               128      page-faults               #   45.177 /sec
    10,297,389,329      cycles                    #    3.634 GHz
     7,213,189,594      instructions              #    0.70  insn per cycle
     1,220,821,121      branches                  #  430.884 M/sec
        10,430,907      branch-misses             #    0.85% of all branches

2.921180547 seconds time elapsed

0.005304000 seconds user
2.478561000 seconds sys
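
For anyone comparing these numbers against another run, the annotations after each '#' in the perf stat output above can be recomputed from the raw counters. The following is only a rough sketch: the constants are copied from the output above, and the function name and dictionary keys are made up for illustration. perf itself works from nanosecond-resolution task-clock, so the last digit may differ slightly.

```python
# Raw counters copied from the perf stat output above.
TASK_CLOCK_MSEC = 2833.29
ELAPSED_SEC = 2.921180547
CYCLES = 10_297_389_329
INSTRUCTIONS = 7_213_189_594
BRANCHES = 1_220_821_121
BRANCH_MISSES = 10_430_907


def derived_metrics():
    """Recompute the '#' annotations from the raw counters (hypothetical helper)."""
    task_clock_sec = TASK_CLOCK_MSEC / 1000.0
    return {
        # Fraction of wall-clock time the task was actually on a CPU.
        "cpus_utilized": task_clock_sec / ELAPSED_SEC,
        # Average clock frequency while the task was running.
        "ghz": CYCLES / task_clock_sec / 1e9,
        # Instructions retired per cycle.
        "insn_per_cycle": INSTRUCTIONS / CYCLES,
        # Branch throughput in millions per second of task-clock.
        "branches_m_per_sec": BRANCHES / task_clock_sec / 1e6,
        # Mispredicted branches as a percentage of all branches.
        "branch_miss_pct": 100.0 * BRANCH_MISSES / BRANCHES,
    }


if __name__ == "__main__":
    for name, value in derived_metrics().items():
        print(f"{name:>20}: {value:.3f}")
```

Feeding the raw counters from a 6.3 run into the same helper gives a like-for-like comparison of the derived columns.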