I'm surprised... My recollection is that unaligned accesses are far
slower than cache misses. I guess accessing individual bytes isn't that
bad, though. Still, I'd be very interested to see statistics on different
computers and, most importantly, on different architectures (if the
structures aren't specific to one architecture; I can't check just now,
so ignore this if they are). That is the unfortunate thing about
optimizations like this: they're kinda architecture-dependent.
But if you're going to optimize for special cases, see the "Optimization
Manuals" on Intel's website; they give good insight into the cache and
burst-loading sequences on Intel architectures. I would also try
profiling with ints instead of chars, to see whether there is an even
faster trade-off between cache-line use and misalignment costs.
But then, I don't have the references in question handy to say whether
that's supposed to have any effect, either ;)
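
For what it's worth, here is a rough sketch in C of the kind of
comparison I mean. Everything in it is made up for illustration: the
names sum_u8_at, sum_u32_at and time_u32_passes are hypothetical, it has
nothing to do with the actual structures under discussion, and it is not
a careful benchmark. It only shows byte reads next to 32-bit reads at a
chosen byte offset, so that offset 0 (aligned) can be compared with
offset 1 (misaligned).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Sum `count` single bytes starting `offset` bytes into `buf`.
 * A byte access has no alignment requirement. */
static uint64_t sum_u8_at(const unsigned char *buf, size_t offset,
                          size_t count)
{
    uint64_t total = 0;
    for (size_t i = 0; i < count; i++)
        total += buf[offset + i];
    return total;
}

/* Sum `count` 32-bit values starting `offset` bytes into `buf`.
 * memcpy keeps the read well-defined in C even when the address is
 * misaligned; what the compiler emits for it depends on the target. */
static uint64_t sum_u32_at(const unsigned char *buf, size_t offset,
                           size_t count)
{
    uint64_t total = 0;
    for (size_t i = 0; i < count; i++) {
        uint32_t v;
        memcpy(&v, buf + offset + i * sizeof v, sizeof v);
        total += v;
    }
    return total;
}

/* CPU time in seconds for `reps` passes of sum_u32_at. The accumulated
 * sum goes out through `sink` so the work has a visible result; an
 * optimizing compiler may still simplify the repeated passes, so check
 * the generated code before trusting any timing. */
static double time_u32_passes(const unsigned char *buf, size_t offset,
                              size_t count, int reps, uint64_t *sink)
{
    assert(reps > 0 && sink != NULL);
    uint64_t acc = 0;
    clock_t start = clock();
    for (int r = 0; r < reps; r++)
        acc += sum_u32_at(buf, offset, count);
    clock_t end = clock();
    *sink = acc;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

To use it, call time_u32_passes() on the same buffer once with offset 0
and once with offset 1 (the buffer has to hold at least
offset + 4 * count bytes), and vary the buffer size around the cache
size. What the numbers look like will depend on the machine, which is
rather the point.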
-Donwulff
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/