Re: gcc feature request / RFC: extra clobbered regs

From: Vladimir Makarov
Date: Wed Jul 01 2015 - 13:57:54 EST




On 07/01/2015 11:27 AM, Andy Lutomirski wrote:
On Wed, Jul 1, 2015 at 8:23 AM, Vladimir Makarov <vmakarov@xxxxxxxxxx> wrote:

On 06/30/2015 05:37 PM, Jakub Jelinek wrote:
On Tue, Jun 30, 2015 at 02:22:33PM -0700, Andy Lutomirski wrote:
I'm working on a massive set of cleanups to Linux's syscall handling.
We currently have a nasty optimization in which we don't save rbx,
rbp, r12, r13, r14, and r15 on x86_64 before calling C functions.
This works, but it makes the code a huge mess. I'd rather save all
regs in asm and then call C code.

Unfortunately, this will add five cycles (on SNB) to one of the
hottest paths in the kernel. To counteract it, I have a gcc feature
request that might not be all that crazy. When writing C functions
intended to be called from asm, what if we could do:

__attribute__((extra_clobber("rbx", "rbp", "r12", "r13", "r14",
"r15"))) void func(void);

This will save enough pushes and pops that it could easily give us our
five cycles back and then some. It's also easy to be compatible with
old GCC versions -- we could just omit the attribute, since preserving
a register is always safe.
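
A minimal sketch of that fallback, for illustration only. The extra_clobber attribute is the proposal in this message, not something an existing GCC release is known to implement. HAVE_EXTRA_CLOBBER, EXTRA_CLOBBER and entry_helper are hypothetical names introduced here.

```c
/*
 * Hypothetical compatibility wrapper for the proposed attribute.
 * HAVE_EXTRA_CLOBBER is an assumed build-time switch that would be
 * defined only for a compiler that understands extra_clobber.
 */
#ifdef HAVE_EXTRA_CLOBBER
#define EXTRA_CLOBBER(...) __attribute__((extra_clobber(__VA_ARGS__)))
#else
/* Older compilers: expand to nothing, so the callee preserves the
 * registers as usual. Slower, but always safe. */
#define EXTRA_CLOBBER(...)
#endif

/* Hypothetical helper that would be called from asm entry code. */
EXTRA_CLOBBER("rbx", "rbp", "r12", "r13", "r14", "r15")
int entry_helper(int x);

int entry_helper(int x)
{
    return x + 1;
}
```

With the switch undefined, the declaration compiles as an ordinary function, which is the "just omit the attribute" case described above.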

Thoughts? Is this totally crazy? Is it easy to implement?

(I'm not necessarily suggesting that we do this for the syscall bodies
themselves. I want to do it for the entry and exit helpers, so we'd
still lose the five cycles in the full fast-path case, but we'd do
better in the slower paths, and the slower paths are becoming
increasingly important in real workloads.)
GCC already supports the -ffixed-REG, -fcall-used-REG and -fcall-saved-REG
options, which allow tweaking the calling conventions, but they apply per
translation unit right now. It isn't clear which of these options
you mean with extra_clobber.
I assume you are looking for a way to change this per function, so that
a caller with a different calling convention has to adjust for a callee
with a different ABI. To some extent, recent GCC versions already do
that automatically with -fipa-ra: if some call-used registers are not
clobbered by a call and the caller can analyze the callee, it can keep
values in those registers across the call.
I'd say the most natural API for this would be to allow
f{fixed,call-{used,saved}}-REG in the target attribute.


One consequence of frequently changing the calling convention or register
usage per function could be a GCC slowdown. RA calculates a lot of data,
and recalculating it after something in the register usage convention
changes takes a lot of time.
Do you mean that RA precalculates things based on the calling
convention and saves it across functions?
RA calculates a lot of info (register classes, class x class relations, etc.) based on the register usage convention (fixed regs, call-used registers, etc.). If the register usage convention has not changed since the previous function was compiled, RA reuses the info. Otherwise, RA recalculates it.
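
The reuse-or-recalculate behaviour described above can be pictured with a toy model. This is only a sketch under assumptions, not GCC's actual code: the struct, the function names and the 16-register limit are all made up for illustration.

```c
#include <stdbool.h>
#include <string.h>

#define NUM_REGS 16  /* assumed register count for this toy model */

/* Toy description of a register usage convention. */
struct reg_convention {
    bool fixed[NUM_REGS];      /* registers the allocator may never use */
    bool call_used[NUM_REGS];  /* registers clobbered by calls */
};

static struct reg_convention cached_conv; /* convention the tables were built for */
static bool cache_valid;
static int recompute_count;               /* how often the expensive step ran */

/* Stand-in for the expensive derivation of classes, relations, etc. */
static void recompute_ra_tables(const struct reg_convention *conv)
{
    cached_conv = *conv;
    cache_valid = true;
    recompute_count++;
}

/* Called once per compiled function: reuse the tables if the
 * convention is unchanged, otherwise rebuild them. */
void prepare_ra_for_function(const struct reg_convention *conv)
{
    if (cache_valid && memcmp(&cached_conv, conv, sizeof *conv) == 0)
        return;
    recompute_ra_tables(conv);
}

int ra_recompute_count(void)
{
    return recompute_count;
}
```

In this model, a run of functions sharing one convention pays the cost once, while alternating conventions from function to function pays it on every switch.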
Hmm. I don't think this
would be a big problem in my intended use case -- there would only be
a handful of functions using this extension, and they'd have very few
non-asm callers.
Good. I guess it will be rarely used and people will tolerate some extra compilation time.
Another consequence would be that RA fails to generate code in some cases,
and even worse, the failure might depend on the GCC version (I have already
seen PRs where RA worked for an asm in one GCC version because a pseudo was
replaced by an equivalent constant, and failed in another GCC version where
that did not happen).

Would this be a problem generating code for a function with extra
"used" regs, or just a problem generating code to call such a function?
I imagine that, in the former case, RA's job would be easier, not
harder, since there would be more registers to work with.
Sorry, I meant that the problem will mostly arise when the attributes describe more fixed regs. If you describe more clobbered regs, they can still be used by the allocator, which can spill/restore them around calls where they cannot hold a value. Still, I think there will be some rare and complicated cases where even describing only clobbered regs can make RA fail in a function that calls the function with additional clobbered regs.
In
practice, though, I think it would just end up changing the prologue
and epilogue.
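
As a rough illustration of the caller-side effect discussed above: an empty asm statement with a clobber list makes GCC treat the listed registers as destroyed at that point, much as a call to a function with extra clobbered registers would. This is only an analogy using an existing GCC feature, not the proposed attribute. The function name is hypothetical, and rbp is left out of the list because clobbering the frame pointer is rejected in some configurations.

```c
/*
 * Hypothetical caller: the asm statement stands in for a call to a
 * function that clobbers rbx and r12-r15. The value in `sum` is live
 * across it, so the compiler has to keep it somewhere other than the
 * clobbered registers. Because those registers are callee-saved in the
 * normal x86_64 ABI, this function also has to save and restore them
 * in its own prologue and epilogue.
 */
long call_with_extra_clobbers(long a, long b)
{
    long sum = a + b;
#if defined(__x86_64__)
    __asm__ volatile("" : : : "rbx", "r12", "r13", "r14", "r15", "memory");
#endif
    return sum * 2;
}
```

Comparing the generated assembly with and without the clobber list (for example with gcc -O2 -S) shows the extra pushes and pops.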

