ldimm64 is not only used for loading function addresses, and
the NOPs added for padding are impacting performance, so avoid
them when not necessary.
On QEMU mac99, with the patch:
test_bpf: #829 ALU64_MOV_K: all immediate value magnitudes jited:1 167436810 PASS
test_bpf: #831 ALU64_OR_K: all immediate value magnitudes jited:1 170702940 PASS
Without the patch:
test_bpf: #829 ALU64_MOV_K: all immediate value magnitudes jited:1 173012360 PASS
test_bpf: #831 ALU64_OR_K: all immediate value magnitudes jited:1 176424090 PASS
That's a 3.5% performance improvement.