Re: [PATCH RFC] Rough draft document on merging and rebasing

From: Theodore Ts'o
Date: Sat Jun 01 2019 - 11:46:43 EST


On Thu, May 30, 2019 at 01:53:17PM -0600, Jonathan Corbet wrote:
> +Rebasing
> +========
> +
> +"Rebasing" is the process of changing the history of a series of commits
> +within a repository. At its simplest, a rebase could change the starting
> +point of a patch series from one point to another. Other uses include
> +fixing (or deleting) broken commits, adding tags to commits, or changing
> +the order in which commits are applied. Used properly, rebasing can yield
> +a cleaner and clearer development history; used improperly, it can obscure
> +that history and introduce bugs.

Rebasing is being used in two senses here. The first is where the
diffs don't change (much), but the starting point of the changes is
being changed. The second is where a broken commit is dropped, adding
a signed-off-by, etc.

Both have the property that they can used for good or ill, but the
details when this is an issue can change a bit. More below...

> +There are a few rules of thumb that can help developers to avoid the worst
> +perils of rebasing:
> +
> + - History that has been exposed to the world beyond your private system
> + should not be rebased. Others may have pulled a copy of your tree and
> + built on it; rebasing your tree will create pain for them. If work is
> + in need of rebasing, that is usually a sign that it is not yet ready to
> + be committed to a public repository.

That seems to be a bit too categorical. It's been recommended, and
some people do, push patches to a branch so the zero-day bot will pick
it up and test for potential problems. And broken commits *do* get
dropped from candidate stable kernel releases. And, of course,
there's linux-next, which is constantly getting rebased.

And there have been people who have pushed out RFC patche series onto
a git branch, and publicized it on LKML for the convenience of
reviewers. (Perhaps because there is a patch which is so big it
exceeds the LKML size restrictions.)

I think it's more about whether people know that a branch is
considered unstable from a historical perspective or not. No one
builds on top of linux-next because, well, that would be silly.

Finally, I'm bit concerned about anything which states absolutes,
because there are people who tend to be real stickler for the rules,
and if they see something stated in absolute terms, they fail to
understand that there are exceptions that are well understood, and in
use for years before the existence of the document which is trying to
codify best practices.

> + - Realize the rebasing a patch series changes the environment in which it
> + was developed and, likely, invalidates much of the testing that was
> + done. A rebased patch series should, as a general rule, be treated like
> + new code and retested from the beginning.

In this paragraph, "rebasing" is being used in the change the base
commit on top of which a series of changes were based upon. And it's
what most people think of when they see the word "rebase".

However "git rebase" is also used to rewrite git history when dropping
a bad commit, or fixing things so the branch is bisectable, etc. But
in the definitions above, the word "rebase" was defined to include
both "changing the base of a patch stack", and "rewriting git
history". I wonder if that helps more than it hinders understanding.

At least in my mind, "rebasing" is much more often the wrong thing,
where as "rewriting git history" for a leaf repository can be more
often justifable. And of course, if someone is working on a feature
for which the review and development cycle takes several kernel
releases, I'd claim that "rebasing" in that case always makes sense,
even if they *have* exposed their patch series via git for review /
commenting purposes.

Regards,

- Ted