Kernel development process (was: [PATCH] fs: ratelimit __find_get_block_slow() failure message.)

From: Dmitry Vyukov
Date: Tue Jan 22 2019 - 10:28:13 EST


On Mon, Jan 21, 2019 at 9:37 AM Jan Kara <jack@xxxxxxx> wrote:
>
> On Thu 17-01-19 14:18:56, Dmitry Vyukov wrote:
> > On Wed, Jan 16, 2019 at 5:28 PM Greg Kroah-Hartman
> > <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
> > >
> > > On Wed, Jan 16, 2019 at 12:48:41PM +0100, Dmitry Vyukov wrote:
> > > > On Wed, Jan 16, 2019 at 12:03 PM Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote:
> > > > I wanted to provide a hash/link to this commit but, wait, you want to
> > > > say that this patch for a security bug was mailed, recorded by
> > > > patchwork, acked by the subsystem developer and then dropped on the
> > > > floor for 3+ years? Doh!
> > > >
> > > > https://lore.kernel.org/patchwork/patch/599779/
> > > >
> > > > There are known ways to make this a non-issue. Like the list of open
> > > > pull requests on github:
> > > > https://github.com/google/syzkaller/pulls
> > > > or, some projects even build their own dashboard for this:
> > > > https://dev.golang.org/reviews
> > > > because this is important. Especially for new contributors, drive-by
> > > > improvements, good samaritan fixes, etc.
> > > >
> > > > Another example: a bug-fixing patch was lost for 2 years:
> > > > "Two years ago ;) I don't understand why there were ignored"
> > > > https://www.spinics.net/lists/linux-mm/msg161351.html
> > > >
> > > > Another example: a patch is applied to a subsystem tree and then lost
> > > > for 6 months:
> > > > https://patchwork.kernel.org/patch/10339089/
> > >
> > > I don't understand the issue here. Are you saying that sometimes
> > > patches that have been submitted get dropped? Yes, that's known, it is
> > > up to the submitter to verify and ensure that the patch is applied.
> > > Given our rate of change and the large workload that some maintainers
> > > have, this is the best that we can do at the moment.
> > >
> > > Putting it all in a github dashboard would not scale in the least (other
> > > projects smaller than us have tried and ended up abandoning it as it
> > > fails horribly).
> > >
> > > Yes, we can always do better, but remember that the submitter needs to
> > > take the time to ensure that their patches are applied. Heck, I have
> > > patches submitted months ago that I know the maintainers ignored, and I
> > > need to remember to send them again. We put the burden of development
> > > on the thing that scales, the developer themselves, not the maintainer
> > > here.
> > >
> > > It's the best we know how to do at the moment, and we are always
> > > trying to do better. Examples of this are where some subsystems are now
> > > getting multiple maintainers to handle the workload, and that's helping
> > > a lot. That doesn't work for all subsystems as not all subsystems can
> > > even find more than one maintainer who is willing to look at the
> > > patches.
> >
> > The issue here is that patches are lost and "up to the submitter" is
> > not fully working.
> > It may be working reasonably well when a developer has an official
> > assignment at work to do thing X, and then they can't miss/forget
> > about "is thing X merged yet". But it fails for new contributors,
> > drive-by improvements, good samaritan fixes, etc. Things that we need
> > no less than the first category (maybe more).
> > Machines are always better than humans at such scrupulous tracking
> > work. So if humans can do it, machines will do even better.
> > The dashboard definitely needs to be sharded in multiple dimensions.
> > E.g. "per subsystem", "per assigned reviewer", and even "per author".
> > Because, e.g., how many of mine are lost? Only this one, or more? How
> > many of yours are lost? Do you know?
> > I am sure this is doable and beneficial. I don't know why other
> > projects failed with this, maybe that's something with github. But
> > there are also codebases that are 100x larger than the kernel, that
> > absorb the amount of changes the kernel receives in a year in less
> > than a week, and nothing gets lost thanks to scalable processes and
> > automation.
>
> Out of curiosity which ones?

I mean in particular the Google codebase [1], but I think the Facebook
[2], Chromium [4], Rust [3] and Go processes share many of the same
principles. The overall idea is process unification and automation, and
building more complex functions on top of lower-level functions. This
allows moving very fast at very large scale while at the same time
preserving very high code quality (as required by and proven by
continuous delivery).

I feel that perhaps I failed to explain the larger picture, assuming
that it's common knowledge, but perhaps it's not, so I drew this
one-page diagram of how the functions build on top of each other and
all fit together:

https://docs.google.com/presentation/d/e/2PACX-1vRq2SdmiP-wqUb3Xo2drgn48bw2HbyGqFPP-ebfTfn6eNZkHSRwKZKRBAT6K3E3Ra9IJ218ZqRxvmfG/pub
(also attached if you prefer a download)

The goal is not to say that this is the only true way of doing things
or that we need all of this, but to show that higher-level nice things
can't be built without a proper lower-level foundation. We all agree on
a few of the lowest-level things (like git and C), which is good and
already brings tremendous benefits. But it really feels to me that at
the kernel's current scale and foundational importance we need the next
layer of common building blocks in the process: things like change
tracking (in particular, patches that can be reliably applied) and
tests (that are easy to add, discover, run locally and on CI). And to
really work as a foundation these things need to be agreed on as being
"the solution" (e.g. "all kernel changes go through patchwork") rather
than "being allowed to be used by fragmented groups if they want".
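
To make the "how many of mine are lost?" question concrete: Patchwork 2.x
already exposes a REST API, so a small script can list a submitter's
patches that never left the "new" state. A rough sketch, not an existing
tool; the /api/1.1/ endpoint, the field names and the numeric submitter
id below are assumptions to be checked against the instance:

#!/usr/bin/env python3
# Sketch: list my patches on patchwork.kernel.org that are still in the
# "new" state, i.e. candidates for having been dropped on the floor.
# Assumes the Patchwork 2.x REST API; SUBMITTER_ID is a hypothetical
# placeholder (look yours up under /api/1.1/people/).
import json
import urllib.request

API = "https://patchwork.kernel.org/api/1.1/patches/"
SUBMITTER_ID = 12345  # placeholder
QUERY = "?submitter=%d&state=new&archived=false&per_page=100" % SUBMITTER_ID

with urllib.request.urlopen(API + QUERY) as resp:
    patches = json.load(resp)

for p in patches:
    print("%s  %s  %s" % (p["date"][:10], p["web_url"], p["name"]))

Sharding the same query per subsystem or per reviewer would just be a
different filter on the same data, so the raw material for such a
dashboard already exists. (And for the "reliably applied" part,
git format-patch --base already records the base commit in the patch
itself.)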

[1] https://cacm.acm.org/magazines/2016/7/204032-why-google-stores-billions-of-lines-of-code-in-a-single-repository/fulltext
[2] https://framethink.wordpress.com/2011/01/17/how-facebook-ships-code/
[3] https://www.youtube.com/watch?v=dIageYT0Vgg
[4] https://www.chromium.org/developers

> > > Please, resubmit your mount patch again, that's a crazy bug :)
> >
> > That's the problem. It now requires way more additional work and it's
> > even unclear if the problem is still there or not; the code has
> > radically changed. It could have been merged back then as is with 0
> > additional work. I could have updated it a week after the original
> > submission. But now it's all completely paged out: I am looking at
> > that file and I don't recognize anything, have no idea how the patch
> > should be updated, and no idea what tree such a patch should be based
> > on, etc.
>
> Yeah, it's painful at times... I know it as well.
>
> > Also a good question is how many of my other patches were lost that I
> > now have no idea about. I discovered this one by pure accident.
>
> Well, I do keep track of my own submitted patches in my git tree and
> occasionally sweep through it and resubmit / ping about lost ones. But yes,
> it requires some motivation and self-discipline which is not always present
> for drive-by contributions.
>
> > To make things more constructive: say, if somebody offers to build
> > such a system for the kernel, in accordance with kernel-specific
> > requirements (it would also enable presubmit testing, recording the
> > base commit, not losing review comments and other nice things), would
> > you, Linus (not sure who is in charge of such decisions), be willing
> > to integrate it into the official kernel development process so that
> > everybody uses it for all kernel changes?
>
> Well, we do have patchwork and some subsystems use it. It doesn't have
> pre-submit testing or other fancy features but it is good enough so that
> patches don't get lost. And Konstantin (kernel.org admin) has recently done
> quite some good work automating lots of tedious patchwork tasks. So at this
> point I think it's more that some maintainers prefer to work differently
> than a lack of tooling as such.
>
> Honza
> --
> Jan Kara <jack@xxxxxxxx>
> SUSE Labs, CR

Attachment: devtools.pdf
Description: Adobe PDF document