Re: [rc4-amd64] RC4 optimized for AMD64

From: Marc Bevand
Date: Mon Nov 01 2004 - 18:43:40 EST


dean gaudet wrote:
|
| [...]
| ack... it's too early on a monday morning -- i misread the documentation.
| this ZF assumption is actually defined and portable... still kind of ugly.
| how much benefit do you see?

When "dec" is placed before "ror", throughput goes up by about 5%
on my test system (Opteron 244 rev C0). I don't find it "ugly"
because the optimization no intrusive at all (only 1 moved instruction).

Concerning the "dec / sub $1" case, it makes absolutely no difference
on the Opteron, I just used "dec" because the opcode is 3 bytes length
instead of 4.

--
Marc Bevand http://www.epita.fr/~bevand_m
Computer Science School EPITA - System, Network and Security Dept.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/