Re: [PATCH v3] Add a document on rebasing and merging

From: Dmitry Vyukov
Date: Tue Jun 25 2019 - 01:35:41 EST


On Fri, Jun 14, 2019 at 4:25 PM Jonathan Corbet <corbet@xxxxxxx> wrote:
>
> On Fri, 14 Jun 2019 11:59:03 +0200
> Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote:
>
> > I will appreciate if you elaborate a bit on this "scale of the
> > project". I wondered about reasons for having the current hierarchy of
> > trees and complex merging for a while, but wasn't able to find any
> > rationale. What exactly scale do you mean? I know a number of projects
> > that are comparable to Linux kernel, with the largest being 2 orders
> > of magnitude larger than kernel both in terms of code size and rate of
> > change, that use single tree and linear history.
>
> I'm not sure what projects you're talking about, so it's hard to compare.
>
> During the 5.2 merge window, Linus did 209 pulls, bringing in just over
> 12,000 changesets, from on the order of 1600 developers. Even if, at the
> beginning of the window, each of those pulls was set up to be a
> fast-forward, they would no longer be positioned that way once the first
> pull was done.
>
> Are you really saying that subsystem maintainers should be continuously
> rebasing their trees to avoid merges at the top level? Do you see how
> much work that would take, how badly it would obscure the development
> history, and how many bugs it would introduce? Or perhaps I misunderstood
> what you're arguing for?


I mean projects like Chromium, which seems to be comparable to the
kernel in code size and rate of change. LLVM and Android are several
times smaller, but on the other hand have hundreds of times fewer
trees (1). And in particular the large monorepos at companies like
Google, Facebook, and Microsoft. E.g. the Google codebase sees the
v5.2 number of changesets in a few hours. It's not an
apples-to-apples comparison with the kernel, but it shows that scale
per se does not require multiple trees/non-linear history.
So for the kernel it must be a combination of scale + something else
(in the process, ownership model, ...). I am trying to understand what
that something else is, how inherent it is, and what would degrade if
the kernel switched to a single tree/linear history. It would obviously
require some adjustments to other parts of the process as well. E.g.
you asked what maintainers should do with their trees, but if there is
a single tree, they don't have a tree at all. In most other scalable
processes that I am aware of, as much work as possible is pushed down
to individual contributors, and they do any required rebasing. The
closest analog of maintainers only does review and approval. The idea
is to remove bottlenecks and distribute the process as much as
possible to increase scalability. I have heard about "maintainer
scalability" in the context of the kernel process multiple times.