well, i assume you mean in general, as the current gccs have obvious
problems in this area (eg nested inlines). Even in theory, a builtin
can potentially be more efficient than anything handwritten in C:
the compiler has more concrete info about it, whereas a macro/inline/asm
construct forces the compiler to second-guess the user and can prevent
it from knowing whether some optimization is legal or not.
> There is
> nothing 'magic' about structure copies. GCC might as well assume what
> memcpy does, and might optimize it away if it knows better. (eg. if all
> structure fields are initialized shortly after copied.)
> > [...] , simply because the compiler knows more,
> > and can use that knowledge in ways that are not possible with "normal"
> > inlined code. Think dummy store elimination (eg: there's a lot of code
> > that first clears a structure only to immediately initialize it element
> > by element. Often the copy/clear could be eliminated partially or even
> > completely). Sometimes it would make sense to merge several mem operations
> > into one. etc. Right now i can't see a clean way to provide alternative
> > implementations of the builtins that would not be very compiler (version)
> > specific and not lose any useful info (ie prevent optimizations).
> > Being able to override builtins would still be useful, eg while
> > waiting for the compiler to catch up.
[the above might not have been as clear as it could have been]
What you initially suggested was "to let users override GCC's internal
memcpy/etc. functions" and i had assumed what you actually wanted was
replacing not memcpy() itself, but the lower level code generated by
the compiler. That was what the context suggested and that is how i
interpret the below comment too -- and this is what i can't see a clean
solution for.
> no. The compiler can assume that our memcpy implementation is 'correct'
> and has no side-effects, in that case it's free to eliminate dummy stores.
Since the compiler would have to retain some knowledge about the
functions anyway (in this case it would actually have to know everything),
and only allow for source level modifications of the emitted instructions,
it seems you're suggesting letting the end user extend the compiler.
Yes, that would be cool. No, i can't see this happening anytime soon,
it's not trivial. It'd likely be (very) compiler version specific.
Also mixing internal compiler knowledge with bits provided by the user
simply isn't generic enough. It may work for memcpy, memset. But what
if i want an ip checksum? A page clear bypassing the cache? etc.
Making the interface generic enough to be able to write any builtin
from scratch would make much sense. But then the builtins could be
hidden in some internal gcc header file (and would be much simpler
to modify, one could have multiple sets of tuned versions etc). This is
however clearly compiler land.
[what does this have to do with linux-kernel? well, the mentioned code
sequences exist in the kernel, and in many cases the stores are
completely useless. Removing them by hand wouldn't work in a distributed
environment (somebody might extend a structure and not update all
the initializations/copies) so the only reasonable solution seems
to be a smarter compiler. Hence rewriting everything as asm inlines and
reducing the compiler's opportunities for optimization does not
look like the ideal direction]
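
To make the kind of sequence concrete, here is a minimal sketch. The
structure and both function names are made up for illustration, they
are not taken from the kernel. The first variant is the safe pattern
(clear, then set every field); the second is what you get when the
"useless" clear is removed by hand.

```c
#include <string.h>

/* Hypothetical structure, for illustration only. */
struct conn_info {
    int id;
    int flags;
    long timeout;
};

/*
 * Clear first, then initialize every field. Every named field is
 * overwritten right away, so the stores done by memset() to those
 * fields are dead. A compiler that treats memset() as a builtin can
 * see that; one that only sees an opaque asm block cannot.
 */
static void init_cleared(struct conn_info *c, int id)
{
    memset(c, 0, sizeof(*c));
    c->id = id;
    c->flags = 0;
    c->timeout = 30;
}

/*
 * Hand-optimized variant without the clear. It gives the same field
 * values today, but silently leaves a field uninitialized as soon as
 * somebody adds one to the structure and forgets this function.
 */
static void init_fields_only(struct conn_info *c, int id)
{
    c->id = id;
    c->flags = 0;
    c->timeout = 30;
}
```

Both leave the same values in the named fields; the difference is only
in who has to keep track of the structure layout, the programmer or
the compiler. (Padding bytes, if any, are cleared only by the first
variant.)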
> (ok.)
[just to avoid any misunderstandings -- my above comments have nothing to
do with the cld-patch, the only issues wrt that were the few minor
details that have already been addressed. I hope it gets applied, at
least i could then drop my (much more primitive) version thereof :) ]