Brown CS: CSCI 1730: Programming Languages: Testing

Testing: General Principles

All functions must have tests that exercise non-trivial cases. We require this to encourage good software development skills, but, more importantly, to force you to show us that you know what your code is supposed to do and that you’re not getting a correct answer by guessing. We will not give full credit to untested functionality, even if it is correct!

Good testing:

Strikes a good balance between testing individual features and testing the program as a whole.
Actually tests the program. That means not only executing the program but also checking the output to ensure it is what you expected.
Includes comments to explain the purpose of each test.
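As a minimal sketch of the last two points, here are two candidate tests for a hypothetical function, `add_all`. The function and the Python syntax are assumptions made for illustration; the actual test-suite format for the course is specified separately. The first test only executes the code, while the second group checks the output and explains why each case is there.

```python
def add_all(numbers):
    """Hypothetical function under test: sum a list of numbers."""
    total = 0
    for n in numbers:
        total += n
    return total


def weak_test():
    # Only executes the function; a wrong answer would go unnoticed.
    add_all([1, 2, 3])


def good_tests():
    # Boundary case: the sum of an empty list should be 0.
    assert add_all([]) == 0
    # Mixed signs: checks that negative numbers are neither dropped nor negated.
    assert add_all([5, -2, -3]) == 0
    # A typical multi-element case.
    assert add_all([1, 2, 3]) == 6


weak_test()
good_tests()
```

If `add_all` returned the wrong sum, `weak_test` would still finish silently, whereas `good_tests` would stop with an assertion failure.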

War-Grading

As an innovation this year, we are creating a system of war-grading, where you will test one another's programs, and will get credit for the errors you find (and lose credit for false accusations).

You will turn in two files for each assignment: one is the program and the other is the test suite, as a series of (test ...) expressions with perhaps some definitions at the top. The precise format will be specified later.

We will write a very good test suite, a buggy program, and a gold-standard implementation.

We will then run every program against every other test suite.

You get points for programming and points for testing. Your total score is the sum of these two.

You start with zero testing points.

You get points every time one of your tests exposes a bug in some program.

You lose points every time one of your tests falsely accuses some program.

Here are some consequences of the above, as well as design considerations, and their resolution.

If the same test is in every test suite, and your program fails it, you lose points each time. Isn't this unfair? To the contrary, it means that it's a really common test (it showed up in numerous independent test suites!), and you should especially get the common case right, so you deserve to lose a lot of points.

If you create a really obscure test that every program fails, you get points for each program's failure. Isn't this unfair? To the contrary, you're clearly a very smart tester—you can think out the intended system behavior in greater detail than anyone else. You deserve to be given credit for that.

If you write a mediocre program you'll lose lots of points; if you write a weak test suite, you will not recapture enough of those points. Thus, if your programming is going poorly, you have an incentive to concentrate on testing. (And if you really did understand what the assignment wanted, you can demonstrate that by writing good tests, which will mitigate your inability to solve the task in your program.) This is a good principle: it's better to prevent errors than to introduce them, and if you're having trouble writing a program, you should find a different way to be productive. Unlike most homeworks, this assignment gives you such a way (just as the real world has both programmers and testers).

You seem to get no reward for writing tests that don't expose errors, even though these tests matter. Your reward is in fact for your own program: those tests have helped make your program better and thus less likely to run afoul of the tests of others.

If everyone writes a perfect program, then it appears to not matter how much testing you do: you can never catch any flaws. This is not true. This is where our buggy program comes in: you will always get some credit for finding the bugs in it. This is therefore an incentive for you to provide a non-empty test suite.

If you make a mistake, perhaps someone else will make the same mistake. If you wrote a regression test to make sure you didn't repeat the error, and someone else fell afoul of it, you will earn credit for finding that bug in their program. Therefore, this rewards regression testing. (It would really infuriate you if you found a bug, fixed it, failed to make a regression test out of it, and ended up reintroducing it.)
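A small sketch of how a regression test pays off twice. Everything here is hypothetical: the function `average`, its convention of returning 0.0 for an empty list, and the helper `passes_empty_list_regression` are invented for illustration and do not come from any course assignment.

```python
def average(numbers):
    """Hypothetical function: mean of a list, defined here as 0.0 when empty."""
    if not numbers:  # the fix; an earlier draft divided by zero on []
        return 0.0
    return sum(numbers) / len(numbers)


def others_average(numbers):
    """Another author's version that still contains the original mistake."""
    return sum(numbers) / len(numbers)


def passes_empty_list_regression(impl):
    """Regression test for the empty-list bug; True if `impl` passes."""
    try:
        return impl([]) == 0.0
    except ZeroDivisionError:
        return False


print(passes_empty_list_regression(average))         # True
print(passes_empty_list_regression(others_average))  # False
```

The same test that keeps the bug from creeping back into your own program also catches it in a program whose author made the same mistake.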

It would appear that there's a perverse incentive for you to include many redundant tests. It turns out this isn't really a serious problem; if you're curious why, ask us.