Re: [PATCH] x86: Pack loops tightly as well
From: Borislav Petkov
Date: Fri Apr 10 2015 - 09:48:34 EST
On Fri, Apr 10, 2015 at 02:30:18PM +0200, Ingo Molnar wrote:
> And the final patch below also packs loops tightly:
>     text     data      bss       dec  filename
> 12566391  1617840  1089536  15273767  vmlinux.align.16-byte
> 12224951  1617840  1089536  14932327  vmlinux.align.1-byte
> 11976567  1617840  1089536  14683943  vmlinux.align.1-byte.funcs-1-byte
> 11903735  1617840  1089536  14611111  vmlinux.align.1-byte.funcs-1-byte.loops-1-byte
> The total reduction is 5.5%.
> Now loop alignment is beneficial if:
> - a loop is cache-hot and its surroundings are not.
> Loop alignment is harmful if:
> - a loop is cache-cold
> - a loop's surroundings are cache-hot as well
> - two cache-hot loops are close to each other
> and I'd argue that the latter three harmful scenarios are much more
> common in the kernel. Similar arguments can be made for function
> alignment as well. (Jump target alignment is a bit different but I
> think the same conclusion holds.)
So IMHO the loop alignment is coupled to the fetch window size and
alignment. I'm looking at the AMD optimization manuals, and the ones for
both fam 0x15 and fam 0x16 say that hot loops should be 32-byte aligned
because the fetch window in each cycle is 32 bytes and 32-byte aligned.
So if we have hot loops, we probably want them 32-byte aligned (I don't
know what that number on Intel is, need to look).
Family 0x16 says, in addition, that if you have branches in those loops,
the first two branches in a cacheline can be processed in a cycle when
they're in the branch predictor. So, to guarantee that, you should
align your loop start to a cacheline.
And this all depends on the uarch, so I can imagine that optimizing for
one would harm another.
Looks like a long project of experimenting and running perf counters :-)
ECO tip #101: Trim your mails when you reply.