Re: [GIT PULL] bcachefs fixes for 6.12-rc2

From: Kent Overstreet
Date: Sun Oct 06 2024 - 15:30:14 EST


On Sun, Oct 06, 2024 at 12:04:45PM GMT, Linus Torvalds wrote:
> On Sat, 5 Oct 2024 at 21:33, Kent Overstreet <kent.overstreet@xxxxxxxxx> wrote:
> >
> > On Sun, Oct 06, 2024 at 12:30:02AM GMT, Theodore Ts'o wrote:
> > >
> > > You may believe that yours is better than anyone else's, but with
> > > respect, I disagree, at least for my own workflow and use case. And
> > > if you look at the number of contributors in both Luis and my xfstests
> > > runners[2][3], I suspect you'll find that we have far more
> > > contributors in our git repo than your solo effort....
> >
> > Correct me if I'm wrong, but your system isn't available to the
> > community, and I haven't seen a CI or dashboard for kdevops?
> >
> > Believe me, I would love to not be sinking time into this as well, but
> > we need to standardize on something everyone can use.
>
> I really don't think we necessarily need to standardize. Certainly not
> across completely different subsystems.
>
> Maybe filesystem people have something in common, but honestly, even
> that is rather questionable. Different filesystems have enough
> different features that you will have different testing needs.
>
> And a filesystem tree and an architecture tree (or the networking
> tree, or whatever) have basically almost _zero_ overlap in testing -
> apart from the obvious side of just basic build and boot testing.
>
> And don't even get me started on drivers, which have a whole different
> thing and can generally not be tested in some random VM at all.

Drivers are obviously a whole different ballgame, but what I'm after
is more:
- tooling the community can use
- some level of common infrastructure, so we're not all rolling our own.

"Test infrastructure the community can use" is a big one, because
enabling the community and making it easier for people to participate
and do real development is where our pipeline of new engineers comes
from.

Over the past 15 years, I've seen the filesystem community get smaller
and older, and that's not a good thing. I've had some good success with
giving ktest access to people in the community, who then start using it
actively and contributing (small, so far) patches (and interestingly, a
lot of the new activity is from China). This means they can do
development at a reasonable pace and I don't have to look at their code
until it's actually passing all the tests, which is _huge_.

And a full filesystem test run takes all night on a single machine, so
having something that gets people results back in 20 minutes is also huge.

The other thing I'd really like is to take the best of what we've got
for a test runner/CI dashboard (and opinions will vary, but of course I
like ktest the best) and make it available to other subsystems (mm,
block, kselftests), because not everyone has time to roll their own.

That takes a lot of facetime - getting to know people's workflows,
porting tests - so it hasn't happened as much as I'd like, but it's
still an active interest of mine.

> So no. People should *not* try to standardize on something everyone can use.
>
> But _everybody_ should participate in the basic build testing (and the
> basic boot testing we have, even if it probably doesn't exercise much
> of most subsystems). That covers a *lot* of stuff that various
> domain-specific testing does not (and generally should not).
>
> For example, when you do filesystem-specific testing, you very seldom
> have much issues with different compilers or architectures. Sure,
> there can be compiler version issues that affect behavior, but let's
> be honest: it's very very rare. And yes, there are big-endian machines
> and the whole 32-bit vs 64-bit thing, and that can certainly affect
> your filesystem testing, but I would expect it to be a fairly rare and
> secondary thing for you to worry about when you try to stress your
> filesystem for correctness.

But - a big gap right now is endian /portability/, and that one is a
pain to cover with automated tests, because you either need access to
both big- and little-endian hardware (at a minimum for creating test
images), or you need to run qemu in full-emulation mode, which is pretty
unbearably slow.
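
(To make the portability point concrete: here's a minimal, generic
sketch - not bcachefs's actual on-disk format, the field names and
magic are made up - of the kind of bug that only cross-endian testing
catches. Decoding on-disk metadata with the host's native byte order
looks fine on x86 and only falls over once the image comes from, or is
read on, a big-endian machine.)

  import struct

  # Hypothetical 16-byte superblock: magic (u32), block_size (u32),
  # nr_blocks (u64), defined to be little-endian on disk regardless of
  # the host CPU.
  SB_LE     = "<IIQ"   # correct: explicit little-endian decode
  SB_NATIVE = "=IIQ"   # buggy: host-native order, wrong on big-endian

  def write_sb(magic, block_size, nr_blocks):
      return struct.pack(SB_LE, magic, block_size, nr_blocks)

  def read_sb(buf, fmt):
      magic, block_size, nr_blocks = struct.unpack(fmt, buf)
      return hex(magic), block_size, nr_blocks

  sb = write_sb(0xB0C4F515, 4096, 1 << 20)

  # On a little-endian host these two agree, so the bug is invisible
  # there; only an image created or decoded on a big-endian machine -
  # real hardware or qemu full-system emulation - exposes the missing
  # byte swap.
  print("explicit LE:", read_sb(sb, SB_LE))
  print("native     :", read_sb(sb, SB_NATIVE))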

> But build and boot testing? All those random configs, all those odd
> architectures, and all those odd compilers *do* affect build testing.
> So you as a filesystem maintainer should *not* generally strive to do
> your own basic build test, but very much participate in the generic
> build test that is being done by various bots (not just on linux-next,
> but things like the 0day bot on various patch series posted to the
> list etc).
>
> End result: one size does not fit all. But I get unhappy when I see
> some subsystem that doesn't seem to participate in what I consider the
> absolute bare minimum.

So the big issue for me has been that with the -next/0day pipeline, I
have no visibility into when it finishes, which means it has to go onto
my mental stack of things to watch for and becomes yet another thing to
pipeline - and the more I have to pipeline, the more I lose track of
things.

(Seriously: when I am constantly tracking 5 different bug reports and
talking to 5 different users, every additional bit of mental state I
have to remember is death by a thousand cuts).

All of which would be solved with a dashboard - which is why adding
build testing to ktest (or ideally, stealing _all_ the 0day tests for
ktest) is becoming a bigger and bigger priority.

> Btw, there are other ways to make me less unhappy. For example, a
> couple of years ago, we had a string of issues with the networking
> tree. Not because there was any particular maintenance issue, but
> because the networking tree is basically one of the biggest subsystems
> there are, and so bugs just happen more for that simple reason. Random
> driver issues that got found resolved quickly, but that kept happening
> in rc releases (or even final releases).
>
> And that was *despite* the networking fixes generally having been in linux-next.

Yeah, the same thing has been going on in filesystem land, which is why
we now have fs-next that we're supposed to be targeting our testing
automation at.

That one will likely come slower for me, because I need to clear out a
bunch of failing tests in the CI before I'll want to look at it, but
it's on my radar.

> Now, the reason I mention the networking tree is that the one simple
> thing that made it a lot less stressful was that I asked whether the
> networking fixes pulls could just come in on Thursday instead of late
> on Friday or Saturday. That meant that any silly things that the bots
> picked up on (or good testers picked up on quickly) now had an extra
> day or two to get resolved.

Ok, if fixes coming in on Saturday is an issue for you, that's something
I can absolutely change. The only _critical_ one for rc2 was the
__wait_for_freeing_inode() fix (which did come in late); the rest
could've waited until Monday.

> Now, it may be that the string of unfortunate networking issues that
> caused this policy were entirely just bad luck, and we just haven't
> had that. But the networking pull still comes in on Thursdays, and
> we've been doing it that way for four years, and it seems to have
> worked out well for both sides. I certainly feel a lot better about
> being able to do the (sometimes fairly sizeable) pull on a Thursday,
> knowing that if there is some last-minute issue, we can still fix just
> *that* before the rc or final release.
>
> And hey, that's literally just a "this was how we dealt with one
> particular situation". Not everybody needs to have the same rules,
> because the exact details will be different. I like doing releases on
> Sundays, because that way the people who do a fairly normal Mon-Fri
> week come in to a fresh release (whether rc or not). And people tend
> to like sending in their "work of the week" to me on Fridays, so I get
> a lot of pull requests on Friday, and most of the time that works just
> fine.
>
> So the networking tree timing policy ended up working quite well for
> that, but there's no reason it should be "The Rule" and that everybody
> should do it. But maybe it would lessen the stress on both sides for
> bcachefs too if we aimed for that kind of thing?

Yeah, that sounds like the plan then.