Re: RFC: starting a kernel-testers group for newbies

From: Arjan van de Ven
Date: Thu May 01 2008 - 01:04:39 EST

Next message: Willy Tarreau: "Re: Slow DOWN, please!!!"
Previous message: Linus Torvalds: "Re: Slow DOWN, please!!!"
Next in thread: Andrew Morton: "Re: RFC: starting a kernel-testers group for newbies"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Thu, 1 May 2008 03:31:25 +0300
Adrian Bunk <bunk@xxxxxxxxxx> wrote:

> On Wed, Apr 30, 2008 at 01:31:08PM -0700, Linus Torvalds wrote:
> >
> >
> > On Wed, 30 Apr 2008, Andrew Morton wrote:
> > >
> > > <jumps up and down>
> > >
> > > There should be nothing in 2.6.x-rc1 which wasn't in 2.6.x-mm1!
> >
> > The problem I see with both -mm and linux-next is that they tend to
> > be better at finding the "physical conflict" kind of issues (ie the
> > merge itself fails) than the "code looks ok but doesn't actually
> > work" kind of issue.
> >
> > Why?
> >
> > The tester base is simply too small.
> >
> > Now, if *that* could be improved, that would be wonderful, but I'm
> > not seeing it as very likely.
> >
> > I think we have fairly good penetration these days with the regular
> > -git tree, but I think that one is quite frankly a *lot* less scary
> > than -mm or -next are, and there it has been an absolutely huge
> > boon to get the kernel into the Fedora test-builds etc (and I
> > _think_ Ubuntu and SuSE also started something like that).
> >
> > So I'm very pessimistic about getting a lot of test coverage before
> > -rc1.
> >
> > Maybe too pessimistic, who knows?
>
> First of all:
> I 100% agree with Andrew that our biggest problems are in reviewing
> code and resolving bugs, not in finding bugs (we already have far too
> many unresolved bugs).

I would argue instead that we don't know which bugs to fix first.
We're never going to fix all bugs, and to be honest, that's ok.
As long as we fix the important bugs, we're doing really well.
And at least for the kerneloops.org reported issues, we're doing quite ok.

For me, 'important' is a combination of effect of the bug and the number of people
it'll hit. A compiler warning on parisc is less important than easy to trigger filesystem corruption
in ext3 that way; more people will hit it and the effect is more grave.

For oopses and WARN_ON()'s were getting to the hang of this now with kerneloops.org,
at least for the oopses that aren't really hard fatal. One thing I learned at least is that
lkml is a poor representation of what people actually hit; it's a very very selective
audience.
oopses/warnons are only a subset of the bugs of course... but still.

So there's a few things we (and you / janitors) can do over time to get better data on what issues
people hit:
1) Get automated collection of issues more wide spread. The wider our net the better we know which
issues get hit a lot, and plain the more data we have on when things start, when they stop, etc etc.
Especially if you get a lot of testers in your project, I'd like them to install the client for easy reporting
of issues.
2) We should add more WARN_ON()s on "known bad" conditions. If it WARN_ON()'s, we can learn about it via
the automated collection. And we can then do the statistics to figure out which ones happen a lot.
3) We need to get persistent-across-reboot oops saving going; there's some venues for this

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Willy Tarreau: "Re: Slow DOWN, please!!!"
Previous message: Linus Torvalds: "Re: Slow DOWN, please!!!"
Next in thread: Andrew Morton: "Re: RFC: starting a kernel-testers group for newbies"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]