Re: [MAINTAINER SUMMIT] Folios as a potential Kernel/Maintainers Summit topic?

From: James Bottomley
Date: Thu Sep 16 2021 - 13:20:00 EST

Next message: Greg Kroah-Hartman: "[PATCH 5.14 112/432] scsi: fdomain: Fix error return code in fdomain_probe()"
Previous message: Greg Kroah-Hartman: "[PATCH 5.14 106/432] NFSv4/pnfs: The layout barrier indicate a minimal value for the seqid"
In reply to: Chris Mason: "Re: [MAINTAINER SUMMIT] Folios as a potential Kernel/Maintainers Summit topic?"
Next in thread: Theodore Ts'o: "Re: [MAINTAINER SUMMIT] Folios as a potential Kernel/Maintainers Summit topic?"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Thu, 2021-09-16 at 16:46 +0000, Chris Mason wrote:
> > On Sep 15, 2021, at 3:15 PM, James Bottomley <
> > James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > My reading of the email threads is that they're iterating to an
> > actual conclusion (I admit, I'm surprised) ... or at least the
> > disagreements are getting less. Since the merge window closed this
> > is now a 5.16 thing, so there's no huge urgency to getting it
> > resolved next week.
> >
>
> I think the urgency is mostly around clarity for others with out of
> tree work, or who are depending on folios in some other way. Setting
> up a clear set of conditions for the path forward should also be part
> of saying not-yet to merging them.
>
> > > * What process should we use to make the overall development of
> > > folio sized changes more predictable and rewarding for everyone
> > > involved?
> >
> > Well, the current one seems to be working (admittedly eventually,
> > so achieving faster resolution next time might be good) ... but I'm
> > sure you could propose alternatives ... especially in the time to
> > resolution department.
>
> It feels like these patches are moving forward, but with a pretty
> heavy emotional cost for the people involved. I'll definitely agree
> this has been our process for a long time, but I'm struggling to
> understand why we'd call it working.

Well ... moving forwards then.

> In general, we've all come to terms with huge changes being a slog
> through consensus building, design compromise, the actual technical
> work, and the rebase/test/fix iteration cycle. It's stressful, both
> because of technical difficulty and because the whole process is
> filled with uncertainty.
>
> With folios, we don't have general consensus on:
>
> * Which problems are being solved? Kent's writeup makes it pretty
> clear filesystems and memory management developers have diverging
> opinions on this. Our process in general is to put this into patch
> 0. It mostly works, but there's an intermediate step between patch 0
> and the full lwn article that would be really nice to have.

I agree here ... but problem definition is supposed to be the job of
the submitter and fully laid out in the cover letter.

> * Who is responsible for accepting the design, and which acks must be
> obtained before it goes upstream? Our process here is pretty similar
> to waiting for answers to messages in bottles. We consistently leave
> it implicit and poorly defined.

My answer to this would be the same list of people who'd be responsible
for ack'ing the patches. However, we're always very reluctant to ack
designs in case people don't like the look of the code when it appears
and don't want to be bound by the ack on the design. I think we can
get around this by making it clear that design acks are equivalent to
"This sounds OK but I won't know for definite until I see the code".

> * What work is left before it can go upstream? Our process could be
> effectively modeled by postit notes on one person's monitor, which
> they may or may not share with the group. Also, since we don't have
> agreement on which acks are required, there's no way to have any
> certainty about what work is left. It leaves authors feeling
> derailed when discussion shifts and reviewers feeling frustrated and
> ignored.

Actually, I don't see who should ack being an unknown. The MAINTAINERS
file covers most of the kernel and a set of scripts will tell you based
on your code who the maintainers are ... that would seem to be the
definitive ack list.

I think the problem is the ack list for features covering large areas
is large and the problems come when the ackers don't agree ... some
like it, some don't. The only deadlock breaking mechanism we have for
this is either Linus yelling at everyone or something happening to get
everyone into alignment (like an MM summit meeting). Our current model
seems to be every acker has a foot on the brake, which means a single
nack can derail the process. It gets even worse if you get a couple of
nacks each requesting mutually conflicting things.

We also have this other problem of subsystems not being entirely
collaborative. If one subsystem really likes it and another doesn't,
there's a fear in the maintainers of simply being overridden by the
pull request going through the liking subsystem's tree. This could be
seen as a deadlock breaking mechanism, but fear of this happening
drives overreactions.

We could definitely do with a clear definition of who is allowed to
nack and when that can be overridden.

> * How do we divide up the long term future direction into individual
> steps that we can merge? This also goes back to consensus on the
> design. We can't decide which parts are going to get layered in
> future merge windows until we know if we're building a car or a
> banana stand.

This is usual for all large patches, though, and the author gets to
design this.

> * What tests will we use to validate it all? Work this spread out is
> too big for one developer to test alone. We need ways for people to
> sign up and agree on which tests/benchmarks provide meaningful
> results.

In most large patches I've worked on, the maintainers raise worries about
various areas (usually performance) and the author gets to design tests
to validate or invalidate the concern ... which can become very open
ended if the concern is vague.

> The end result of all of this is that missing a merge window isn't
> just about a time delay. You add N months of total uncertainty,
> where every new email could result in having to start over from
> scratch. Willy's do-whatever-the-fuck-you-want-I'm-going-on-vacation
> email is probably the least surprising part of the whole thread.
>
> Internally, we tend to use a simple shared document to nail all of
> this down. A two page google doc for folios could probably have
> avoided a lot of pain here, especially if we’re able to agree on
> stakeholders.

You mean like a cover letter? Or do you mean a living document that
the ackers could comment on and amend?

James
