Re: source dependencies cleanup?

Paul Flinders (ptf@datasci.co.uk)
Wed, 4 Dec 1996 19:02:05 -0000


> From: Peter T. Breuer <ptb@oboe.it.uc3m.es>
> The spec is: compute (at least) the dependencies of all given files.
> Oh oh! I am dumb! It only needs to compute the primary dependencies!
> Make itself will do the recursive bit! So we are fussing about nothing
> here at least.

Yes, I've just noticed that myself. It appears that we both made the
same incorrect assumption about what mkdep is supposed to do.

> > depend.awk which I presume is mkdep's predecessor
>
> Mkdep.c is a C program. There is no preprocessor. Depend.awk is
> the original version of mkdep that preceded the perl script that
> (briefly and optionally) preceded the C version.

Yes - that's what I said (you slightly misread my statement).

> It appears to be correct in principle. Are you sure that it is wrong
> here? I just decided that I was wrong in crying wolf on the differences
> between mkdep and gcc dependency output on genhd.c ...

OK, mkdep's apparently skimpy output is a red herring. Linus took
the second suggestion (only process each header+source file
once). When I did that I generated output which had the full list
for each .o file (the .c file plus all headers), so I'd become locked
into thinking that was how the output should look.

> > Your math as originally presented seems to be based on the flawed
> > assumption that using -MD generated dependencies requires all source
> > files to be compiled to get their dependencies.
>
> Yes. The assumption is not flawed. It does so require.
>
See below - I don't agree that you need to do all the files.

> > This is not true -
>
> It is true.
"And my dad's bigger than your dad" :-) (sorry - can't remenber what
that 'tis/'taint bit referred to)

>
> > you don't
> > *need* the dependencies if you aren't going to compile a given source
>
> That is altogether _another_ question. Define "need"!
>
> I tell you that I "need" (leaving aside what it means) all the
> dependency information for the whole kernel at the moment, even though I
> am not presently using all of it. I may be wrong on that, but we can
> argue about it once we decide what "need" means. There is a fair chance
> that you are right and that we can get away without all the dependency
> info, but I need a proof of that, and I need a statement of what the
> words mean before we can decide the truth.
>
> > file and later, if you do, the fact that you need to build the object
> > file is enough to trigger dependency generation.
>
> This is where I think it is by now too late to find out what the
> dependencies really are. This is the point we have to examine. It
> could be OK.
>
> > If we assume that a correctly implemented "mkdep" and a correctly
> > implemented -MD solution cause the same files to be re-compiled
> > then you have a point because if Y is the (total) compilation time and X
> > is the fraction added for -MD we have
>
> Well, I was taking X to be the _absolute_ time added for -MD of all files,
> but OK ...
>
> > mkdep = %files to re-build * Y
>
> This appears to be the time required to rebuild all files multiplied by
> the percentage of files that are actually rebuilt. That should be the
> time required to actually do the compilation of the kernel, not
> considering the time used to calculate the dependencies. But that is not
> what I calculated. I calculated
>
> i] Time required to calculate ALL dependencies and make the kernel.
>
> You have calculated
>
> ii] Time required to make kernel and make SOME dependencies.
> (in the case of your mkdep calc above, SOME=NONE)
>
> For the gcc -MD method, these are not the same thing either. Your point is
> therefore that we only need to make some dependencies, and I remain to be
> convinced about that. It may be so.
>
> > -MD = %files to re-build * Y + %files to re-build * X * Y
>
> This is time to make kernel + time to recompute dependencies for the
> files that were recompiled. I.e. ii] above.
>
> What I calculated was time to compute all dependencies plus time to
> make kernel. That goes as follows:
>
> i] with mkdep (Y is time to recompile all files)
>
> one pass with mkdep costs just about nothing in time.
> + %files to rebuild according to mkdep * Y
>
> I think I suggested that the percentage here was 60% (as I remember),
> because mkdep should in principle give an overestimate of the set of
> dependencies (it does not figure out #if's; not sure if gcc does
> either!), and that probably the "right" number of files to rebuild is 50%.
>
> So we get
>
> 0 + 0.6 * Y = 0.6 * Y
>
> ii] with gcc -MD. Again Y is time to recompile all files.
>
> one pass with gcc -MD over ALL files for dependencies and .o
> + extra cost of dependency building as we go
>
> I assume that the extra cost is nil. The problem here is that we have
> to compile all files. The answer here is
>
> Y = 1.0 * Y
>
> iii] with a prior pass by gcc -M.
>
> %time per file scanning for deps in a first pass * Y
> +%files to rebuild according to gcc *Y
>
> Here I suggested that the cost was 20% extra per file on the extra
> pass to find and chase its dependencies, but that gcc gets the more
> exact answer of 50% to rebuild:
>
> 0.2 * Y + 0.5 * Y
> = 0.7 * Y
>
>
> So all that is above board. What we have to settle is what "need" means
> in terms of dependencies. Clearly, in order to do the task that is
> presently being done - calc .o's and ALL deps - mkdep is better.
>
> > However, as I've already pointed out, X is not easily measurable as it
> > disappears into the noise (i.e. rather less than 1%). Also the total
> > time for a build, from a
>
> That's OK. I agree. It's nothing.
>
> >
> > Measured over repeated builds we have better functionality (complete
> > dependencies which are kept automatically up-to-date) at a slight
>
> I disagree - at least I don't agree. Prove to me that you don't "need"
> to calculate all dependencies at every upgrade and I'll believe you!

I'll try to deal with all of the above points together - especially the last.

I'm afraid that I have to disagree that the time taken for "make depend"
is short enough to be discounted - it certainly isn't on the machines
that I have at work.

I'm concerned with a number of things:
1) the time taken for a full build from a clean source tree - this
is all that many people do
2) the time taken for a build after a configuration change
3) the time taken for a development build after modifying some
files
4) the correctness of the dependency info: neither too inclusive
nor too exclusive, so that files which need to be re-compiled
after some change are, and those which do not aren't
5) the ease of keeping things up-to-date

Using -MD should shorten 1, shorten 3 for most people (those who
do a make depend more frequently than every 20 or so builds), and
improve 4 & 5. Shortening 3 requires finer-grained configuration files.

The times that I quoted were intended to measure 3, assuming that most
people don't do a make depend before each compile (if they do, -MD is
clearly faster). I don't think the fact that you give the time for all
dependencies whereas my times are only for some is important. We both
include times for the amount of dependency calculation which we believe
to be required for correct re-builds.

I'm measuring "need" in terms of 4.

For your times

i) I disagree with the 0 for "make depend" - on my machines it appears
to be 5-10% of the time needed for a full build. Therefore the time
should be 0.65Y to 0.7Y.

ii) I disagree with the need to re-build all files. If 50% need to be
re-built then only 50% will be, so this time should be 0.5Y.

iii) I'm not suggesting that we do a separate pass based on gcc -M;
I think you'll find the performance worse than you suggest.
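
Putting i) and ii) together as a rough tally (same Y as before, and
using my 5-10% figure for make depend):

    mkdep + make depend:  (0.05 to 0.10)Y + 0.6Y  =  0.65Y to 0.70Y
    gcc -MD as we build:  0.5Y + noise            ~=  0.5Y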

I'm glad that you're beginning to see that you don't need to have full
dependencies before starting to build things. I'll try to explain again,
starting with a virgin source tree just after a make *config.

We know we want to build the kernel; we (should) have a list of
object files which go to make up the kernel and, for each object,
we know the corresponding source file. The fact that we don't know
which header files the source file includes is immaterial - the fact
that the object file doesn't exist means that we must compile it.
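
Something like the following sketch is what I have in mind. It assumes
GNU make's -include and a gcc whose -MD writes a dependency file
(main.d) alongside each object; the file names are made up for
illustration:

    # One-pass compilation with dependency capture (sketch only)
    OBJS = main.o genhd.o

    %.o: %.c
    	$(CC) $(CFLAGS) -MD -c $< -o $@

    # Pull in whatever .d files exist so far. A missing .d is harmless:
    # the missing .o forces the first compile, which then creates it.
    -include $(OBJS:.o=.d)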

The act of compilation creates a dependency file for each object
which records *all* of the headers and source files which were
part of the build.
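
For example, after compiling genhd.c the generated genhd.d might read
something like this (paths abbreviated, and purely illustrative):

    genhd.o: genhd.c ../include/linux/config.h \
      ../include/linux/autoconf.h ../include/linux/genhd.h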

We can now make a number of different changes.

a) We could edit some source or header file (without changing
the dependencies or needing new object files).

In this case make will re-compile those files which need to
be compiled according to the current dependencies.

Of course the same happens with the current set-up.

b) We could edit a source file or header such that we introduce
a new dependency. In this case we *must* have edited one
of the currently listed dependencies, so the object file will be
out of date according to current knowledge. The re-compilation
will update the dependencies to include the new header.

Dependencies generated by the current mkdep *may* cause
files to be re-compiled unnecessarily in this situation.

Whilst the file will be re-compiled in the existing set-up, the
new dependency will not be recorded. If I subsequently
edit the newly added header without re-doing the make depend,
the source file will not be re-compiled as it should be.
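
To make (b) concrete, here is a hypothetical before/after of the
recorded dependencies (foo/bar/baz are made-up names):

    # Before: foo.d reads
    foo.o: foo.c bar.h
    # We edit foo.c to add '#include "baz.h"'. foo.c is already a
    # listed dependency, so foo.o is out of date and re-compiles;
    # -MD then rewrites foo.d as
    foo.o: foo.c bar.h baz.h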

c) We could make a configuration change which means that
we must include new object files into the kernel. The fact that
we don't have dependencies for the new object file is not
important; the fact that it doesn't exist is sufficient, and the
compilation will cause a new dependency file to be generated.

In any case the fact that config.h and autoconf.h are touched
means that half of the kernel will be re-compiled whether it
needed to be or not (but that isn't a mkdep-related problem).

d) We could edit a makefile to change the compile flags such
that a dependency is wrong (by changing/adding -D, -U or
-I flags).

OK, well this gets harder. Really the command line used
to compile a source file is one of the dependencies of
the object file and should be treated as such. Some
build tools allow for this (the .KEEPSTATE stuff in Sun's
make, or the automatic tracking in the ClearCase make).
However GNU make doesn't do this (unless 3.75 does - I'm
still on 3.74).

Fortunately we have edited a file, so every object should
depend on the Makefile(s) which could influence their
command lines.

I don't think that the current set-up deals with this, does it?
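
In the absence of command-line tracking, a crude approximation (a
sketch, not something I've tested against the kernel makefiles) is to
list the makefile itself as a prerequisite, reusing OBJS from the
sketch above:

    # Approximate "the command line is a dependency" by making every
    # object depend on the Makefile that sets its flags.
    $(OBJS): Makefile

This over-rebuilds when an unrelated part of the Makefile changes,
but it errs on the safe side.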

e) We could supply command-line flags to make to achieve
the above without editing any files.

OK, this is a poser without support in make for remembering
command lines. However the files won't be re-compiled in the
current set-up anyway, so it's not as if we lose anything.

f) I edit a header which is *potentially* included by a source
file, then change the compile flags so that the header is
actually included by supplying arguments to make.

OK, I concede this point; however we're being perverse here,
aren't we (unless anyone will admit to doing this sort of thing
frequently).

I count that as 1-1 for features/safety; however I think that using
-MD fails only under rather odd conditions whereas the current
system can fail under rather common conditions. Of course this
is just my opinion. If you can think of any other ways to cause
re-compilation, let me know.

Even if using -MD only causes the re-compilation of the same files,
it has better performance than a separate make depend pass unless
you do "make depend" very infrequently. If your 50% vs 60% estimate
is correct then it is a win anyway.

I've just read the comment in the GB vs MB thread about picking
battles carefully. I didn't/don't want this to become a battle. My
original question was triggered by the fact that mkdep appeared
to be causing problems in some circumstances, and since I knew
of (what I believed to be) a good and reliable alternative I wondered
whether its use had been considered. I *think* that we could knock
5, maybe 10%, off compile times, which may not be much for those
who have machines capable of compiling a kernel in 2 or 3 minutes
but might be welcome for others who are compiling on 486s with
plain IDE drives.

I think I'll make this my first kernel project; the Linux community
(or, rather, Linus) may then accept or reject it depending on whether
it is perceived to be of benefit. However the current system is
a) fairly good and b) quite complex, so I will take my time to
understand it further. Also I don't know when I will have enough spare
time to have a go.

TTFN

Paul