Regression tests, benchmark histories (Was: (reiserfs, ext2 resizing patches, etc.))

From: Warren Young (tangent@cyberport.com)
Date: Tue Jun 20 2000 - 07:49:55 EST


tytso@mit.edu wrote:
>
> Andreas Dilger has asked us to integrate a patch which makes it easier
> for him to maintain his on-line resizing patch. It moves a lot of code
> around from ext2_read_super() to a new function. It supposedly
> doesn't change any lines of code, but it is a very large patch, and so
> it's painful to integrate. Given that it had no functional changes, I
> gave it a low priority.

Disclaimer: The following is a theoretical question, and has nothing to
do with the controversial nonsense that preceded this post.

Would it not be helpful to create some regression tests for core
subsystems like ext2? Then you could refactor it at will: as long as it
still runs the regression tests, you know the change didn't break
anything. (Of course, if a change does break something that the tests
didn't catch, the regression tests need to be extended, not condemned.)

Proponents of software refactoring claim that this allows you to work
faster, even after accounting for the time to write and run the tests.
With the tests to back you up, you're more confident about applying
changes, and can quickly verify that they're compatible. Just run the
test suite to be sure that everything still works; if it does, you can
then bolt new functionality onto the newly-reorganized code, which
should also not change existing behavior.

This idea isn't meant to get around code freezes. It's to allow you to
work more confidently within a development cycle. It would, however,
allow an intermediate stage: full development, then _behavior_ freeze,
then full code freeze. The last N months have taught us that the kernel
apparently enjoys a long second cycle, where the code isn't so much
frozen as in a state of tweaking for performance and correctness.

You write a new regression test any time you're about to change code
whose existing behavior the test suite doesn't yet cover. First, write
a test that exercises the existing functionality. Then make your
change and run the test again, to make sure the existing behavior
didn't change.
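
To make that concrete, here is a rough sketch in Python of the
"record the behavior, change the code, compare" cycle. Everything in
it is made up for illustration: the function names, the tiny workload,
and the choice of sizes plus SHA-256 hashes as the thing to compare.
It also runs against whatever filesystem backs the temporary
directory; a real ext2 suite would presumably point at a scratch ext2
volume instead.

```python
import hashlib
import os
import tempfile


def exercise_filesystem(root):
    """Run a fixed sequence of file operations under root.

    Hypothetical workload: create a directory, write, rename, append.
    """
    os.mkdir(os.path.join(root, "dir"))
    first = os.path.join(root, "dir", "a.txt")
    second = os.path.join(root, "dir", "b.txt")
    with open(first, "wb") as f:
        f.write(b"hello" * 1000)
    os.rename(first, second)
    with open(second, "ab") as f:
        f.write(b"tail")


def snapshot(root):
    """Return {relative path: (size, sha256 hex)} for each file under root."""
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            with open(full, "rb") as f:
                data = f.read()
            digest = hashlib.sha256(data).hexdigest()
            result[os.path.relpath(full, root)] = (len(data), digest)
    return result


def record_behavior():
    """Run the workload in a fresh directory and return its snapshot."""
    with tempfile.TemporaryDirectory() as root:
        exercise_filesystem(root)
        return snapshot(root)


def differences(expected, actual):
    """Return the sorted paths whose presence, size, or hash differs."""
    paths = set(expected) | set(actual)
    return sorted(p for p in paths if expected.get(p) != actual.get(p))
```

You would call record_behavior() once before the change and save the
result, then call it again afterwards; an empty list from
differences() means the behavior this workload covers did not move.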

For ext2, the regression suite could also test things like speed. This
would allow maintainers of cooperating subsystems to make changes
(cache, VM, disk interface drivers...) and then verify that they haven't
increased the run time of the ext2 test suite. This would eliminate user
reports claiming that kernel X.yy is slower on their systems than X.(yy
- 1) was. Or if performance goes down, you would know ahead of time, so
you could tell users about it when you sent out the kernel update
announcement. This would be for changes that must be spread across
several releases, where the kernel is temporarily slower, and over the
next few releases speed comes back up to a /previously measured level/.
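
A benchmark history along these lines could be as simple as a list of
(release, seconds) pairs checked against the best time measured so
far. The sketch below is only that, a sketch: the function names, the
5% tolerance, and the best-of-three timing are all my assumptions, and
the release labels and timings in the usage note are invented.

```python
import time


def time_workload(workload, repeats=3):
    """Return the best wall-clock time in seconds over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        workload()
        best = min(best, time.perf_counter() - start)
    return best


def find_regressions(history, tolerance=0.05):
    """Flag releases slower than the best previously measured level.

    history is a list of (release, seconds) pairs in release order.
    Returns (release, seconds, baseline) for each entry whose time
    exceeds the best earlier time by more than the given tolerance.
    """
    regressions = []
    best = None
    for release, seconds in history:
        if best is not None and seconds > best * (1 + tolerance):
            regressions.append((release, seconds, best))
        if best is None or seconds < best:
            best = seconds
    return regressions
```

With an invented history of [("X.40", 10.0), ("X.41", 12.0),
("X.42", 10.2)], only X.41 is flagged: it is temporarily slower, and
X.42 is back within tolerance of the previously measured level.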

The tests need to be shipped along with the kernel code, much like the
GCC package works. If something breaks, you can ask the user to run
test case 42 and report the behavior. For that matter, we might be able
to use Cygnus' DejaGnu regression test suite.

The main problem with this approach with respect to the Linux kernel is
that it requires stable interfaces. Any interface that doesn't remain
stable ends up causing more work maintaining the regression tests than
the tests are worth. Stability is only required at the source level,
not at the binary level; this is not a stealth call for petrifying Linux
kernel binary interfaces. It's worth considering whether some
interfaces are stable by design.

-- 
= Warren -- ICBM Address: 36.8274040 N, 108.0204086 W, alt. 1714m
