Re: source dependencies cleanup?

Paul Flinders (ptf@datasci.co.uk)
Wed, 4 Dec 1996 13:35:12 -0000


> From: Peter T. Breuer <ptb@oboe.it.uc3m.es>

> You are right and mkdep is WRONG about genhd.c.

I'm glad we're beginning to agree :-)

> ATTENTION folks! There seems to be a bug in mkdep that needs chasing
> here. The implementation does not follow the spec.

I couldn't find a "spec" for mkdep. Is there one in the source tree? If
so, where? (I must have missed it.)

> It is supposed to form the transitive closure of the #include'd files and here
> we have a case where it does not. Don't know why yet.

As far as I can see, mkdep makes no attempt to follow nested includes,
and this is a design decision rather than a bug (that global "filename"
variable, for instance). This is supported by the following comment at
the top of depend.awk, which I presume is mkdep's predecessor:

# This is an awk script which does dependencies. We do NOT want it to
# recursively follow #include directives.

I think that not following nested includes is wrong, as I've already pointed
out in the discussion on genhd. Unfortunately, in a reasonably complex
program (such as the Linux kernel) producing dependency lists by
expanding each source file fully and then spitting out which headers were
included is slow: for N source files and M headers I'd estimate that you
process approximately 0.5*N*M*(average file length) of text, on the
assumption that each source file drags in about half the headers. It's
worse for C++, as you tend to get more "highly included" headers.
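
To put made-up but plausible numbers on that: with N = 1000 source
files, M = 500 headers, and an average length of 100 lines, the full
expansion reads roughly 0.5 * 1000 * 500 * 100 = 25 million lines,
against only about (1000 + 500) * 100 = 150,000 lines for the memoised
scheme described below.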

You can speed this up in a number of ways. You can avoid parsing
most of the code and just search for ^#include. This is faster because
it throws away most lines of input, but it still reads the same amount
of text. You get a much better speed-up by remembering, for each
header, what it includes, and then re-using that "partial dependency".
This speeds the process up a *lot*, because you only read about
(N+M)*(average file length) of text - see the sketch below.
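
Purely as an illustration (this is not depend.awk, and the file name
closure.awk is invented), a memoising script along those lines could
look roughly like this in awk. It reads every file exactly once,
records each file's direct #include "..." lines, and expands the
closure from those records; it ignores <...> includes and include
search paths entirely. Run it as, say, awk -f closure.awk *.c *.h:

    /^#include[ \t]*"/ {
            name = $0
            sub(/^#include[ \t]*"/, "", name)
            sub(/".*$/, "", name)
            incs[FILENAME] = incs[FILENAME] " " name
    }

    END {
            for (f in incs) {
                    if (f !~ /\.c$/)
                            continue
                    for (h in seen)         # fresh visited set per source
                            delete seen[h]
                    obj = f
                    sub(/\.c$/, ".o", obj)
                    printf "%s: %s%s\n", obj, f, closure(f)
            }
    }

    # Expand f's recorded includes transitively, visiting each
    # header at most once, so no file's text is ever re-read.
    function closure(f,    out, n, i, part, h) {
            out = ""
            if (!(f in incs))
                    return out
            n = split(incs[f], part, " ")
            for (i = 1; i <= n; i++) {
                    h = part[i]
                    if (h in seen)
                            continue
                    seen[h] = 1
                    out = out " " h closure(h)
            }
            return out
    }

The point of the memoisation is that each file is scanned once to
build incs[], and the closures are then computed over those in-memory
lists - which is where the (N+M)*(average file length) figure comes
from.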

mkdep appears to achieve its speed-up by only including the "first
level" of include files in the dependency output.
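
For example, with invented file names: if foo.c includes bar.h, and
bar.h in turn includes baz.h, the full closure is

    foo.o: foo.c bar.h baz.h

but a first-level-only tool emits just

    foo.o: foo.c bar.h

so editing baz.h does not cause foo.o to be rebuilt - which looks like
exactly the genhd symptom.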

However, generating the dependencies as part of the compile makes
more sense, as you have to expand everything anyway and the
additional housekeeping is very small.
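
For concreteness, a sketch of what this could look like with GNU make
and gcc (OBJS is assumed to hold the list of object files; this is
illustrative, not a patch):

    # -MD makes gcc write foo.d next to foo.o as a side effect of
    # the compile, at essentially no extra cost
    %.o: %.c
            $(CC) $(CFLAGS) -MD -c -o $@ $<

    # Pull in whatever .d files exist so far. Missing ones are
    # harmless: a missing .d implies a missing .o, which forces a
    # compile that then generates both.
    -include $(OBJS:.o=.d)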

>
> > I think this example supports my earlier statement that mkdep is fast
> > because it does half a job.
>
> No - at the moment you have (just :-) well done!) found a bug in the mkdep
> implementation. The math stands and I'll get back to you on that.

Your math as originally presented seems to be based on the flawed
assumption that using -MD-generated dependencies requires all source
files to be compiled to get their dependencies. This is not true: you don't
*need* the dependencies if you aren't going to compile a given source
file, and if you do compile it later, the fact that you need to build the
object file is enough to trigger dependency generation.

If we assume that a correctly implemented "mkdep" and a correctly
implemented -MD solution cause the same files to be re-compiled,
then you have a point, because if Y is the (total) compilation time and
X is the fraction -MD adds, we have

mkdep = %files to re-build * Y

-MD = %files to re-build * Y + %files to re-build * X * Y

However, as I've already pointed out, X is not easily measurable as it
disappears into the noise (i.e. rather less than 1%). Also, for the total
time of a build from a clean tree, where Dp is the time for the mkdep
phase, we have

mkdep = Dp + Y
-MD   = Y + X*Y

We lose only when X*Y * (average number of compiles between "make
depend"s) exceeds Dp. In fact this isn't really a fair comparison, as
using -MD is equivalent to doing "make depend; make" for each build.
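
With made-up but plausible numbers: say Y is 30 minutes, X is 1%, a
typical incremental build recompiles 10% of the files, and Dp is one
minute. Then -MD costs about 0.01 * 0.1 * 1800 = 1.8 seconds per
build, so the accumulated overhead only exceeds Dp after 30-odd
builds between "make depend"s.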

Measured over a couple of builds from a clean source tree (which is
all many "end users" ever do) we get an improvement in performance
(no "make depend" pass, or a much reduced one).

Measured over repeated builds we get better functionality (complete
dependencies which are kept automatically up-to-date) at a slight
cumulative performance penalty - one probably low enough that you
would have to do 20 or 30 builds between "make depend"s to actually
lose.

Regards

Paul.