Re: [RFC/PATCH 1/1] format-patch: add an option to record base tree info
From: Fengguang Wu
Date: Tue Feb 23 2016 - 21:30:33 EST
Hi Eric,
On Tue, Feb 23, 2016 at 01:56:07PM -0600, Eric W. Biederman wrote:
>
> Fengguang Wu, Xiaolong Ye, have you attempted to use the truncated
> sha1 of the file the patch applies to? Git already places a file sha1
> at the top of a patch. See the index line?
>
> > diff --git a/fs/namespace.c b/fs/namespace.c
> > index eccd925c6e82..3c3f8172c734 100644
Yes, we've evaluated making use of that index line. The conclusion is
that it helps make a better guess, but it is still guesswork and far
from perfect.
A simple count shows that only about 1/5 of the files change between
two major kernel releases:
wfg /c/linux% git ls-files |wc -l
52915
wfg /c/linux% git diff --name-only v4.3 v4.4|wc -l
10606
Since the other 4/5 of the files typically keep the same blob ID across
the commits in between, a huge number of candidate base tree IDs match
the given blob IDs.
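As a rough check of that ratio, here is the arithmetic on the two counts
above. This is only a sketch; the helper name `percent` is made up for
illustration.

```shell
# percent N D: print N as a percentage of D, with one decimal place.
percent() {
    awk -v n="$1" -v d="$2" 'BEGIN { printf "%.1f\n", n * 100 / d }'
}

total=52915     # files tracked in the tree (git ls-files | wc -l)
changed=10606   # files that differ between v4.3 and v4.4

percent "$changed" "$total"               # share of files that changed
percent "$((total - changed))" "$total"   # share with an unchanged blob ID
```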
> > --- a/fs/namespace.c
> > +++ b/fs/namespace.c
>
> As I understand it you are aiming for making a good guess what the patch
> or patches apply to, having a set of file hashes looks like it would
> give you that.
>
> All it should take is to iterate over a patchset and for each file in
> the patchset capture the first file hash. Then in the smallish set of
> maintainer trees see if that set of file hashes matches any of their
> recent commits. You should be able to prune the set of possible
> maintainer trees even more by looking at the mailing list or lists
> the patch was submitted to.
We actually started with the above thinking half a year ago. Yes, it'll
help narrow down the list of candidate maintainer trees. The chances
improve if the patchset modifies multiple files, and because some
files are modified more frequently than others.
However, it's still fundamentally guesswork. The best choice is to
ask for an explicit "base tree ID".
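For illustration, the blob-matching approach discussed above could be
sketched roughly as follows. This is not the script the 0day robot
actually uses. The function names `preimage_blobs` and `matches_base`
are hypothetical, and the sketch assumes paths without spaces and a
local clone that contains the candidate commit.

```shell
# Print "<path> <pre-image blob id>" for the first "index" line of each
# file in the patch(es) given as arguments, or read from stdin.
preimage_blobs() {
    awk '
        /^diff --git / { path = $3; sub(/^a\//, "", path); next }
        /^index [0-9a-f]+\.\.[0-9a-f]+/ {
            split($2, ids, /\.\./)
            if (!(path in seen)) { seen[path] = 1; print path, ids[1] }
        }
    ' "$@"
}

# Succeed if every pre-image blob in the patch(es) is present at the same
# path in the candidate commit. Runs in a subshell so "exit" stays local.
matches_base() (
    commit=$1; shift
    preimage_blobs "$@" | while read -r path blob; do
        # An all-zero ID marks a newly created file: nothing to compare.
        case $blob in *[!0]*) ;; *) continue ;; esac
        full=$(git rev-parse --verify --quiet "$commit:$path") || exit 1
        # The patch carries an abbreviated ID, so compare by prefix.
        case $full in "$blob"*) ;; *) exit 1 ;; esac
    done
)

# Hypothetical usage: matches_base v4.4 0001-some-change.patch
```

Even when such a check succeeds for one commit, it also succeeds for
every other commit in which those files are unchanged, which is why
the result remains a guess.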
> Before we talk about adding anything more I think we need a clear
> picture of what you have tried with what already exists. A decade ago
> part of the problem was that not everyone used git. At best it will
> take a little while before everyone upgrades to a version of git diff
> containing your changes, and possibly even longer if they have to
> start specifying an additional option when a diff is generated.
That's a valid concern. It may take a year or so before the new
feature reaches a reasonable share of developers.
To speed up the process, we could advocate the new git option in the
0day robot's error reports. Since we catch errors in ~10 LKML patches
each day, within months most kernel developers should get the tips on
how to set it up and enable the feature by default.
Thanks,
Fengguang