Re: [PATCH] poly1305: generic C can be faster on chips with slow unaligned access

From: Jason A. Donenfeld
Date: Wed Nov 02 2016 - 17:25:10 EST

These architectures select HAVE_EFFICIENT_UNALIGNED_ACCESS:

s390 arm arm64 powerpc x86 x86_64

So, these will use the original old code.

The architectures that will thus use the new code are:

alpha arc avr32 blackfin c6x cris frv h7300 hexagon ia64 m32r m68k
metag microblaze mips mn10300 nios2 openrisc parisc score sh sparc
tile um unicore32 xtensa

Unfortunately, of these, the only machines I have access to are MIPS.
My SPARC access went cold a few years ago.

If you insist on a data-motivated approach approach, then I fear my
test of 1 out of 26 different architectures is woefully insufficient.
Does anybody else on the list have access to more hardware and is
interested in benchmarking?

If not, is there a reasonable way to decide on this by considering the
added complexity of code? Are we able to reason best and worst cases
of instruction latency vs unalignment stalls for most CPU designs?