Testing ParselTongue

You are the proud CEO of BuildHub, a company that provides 3D printing solutions for any device on the market. Unfortunately, every device on the market has a slightly different implementation of the scripting language ParselTongue, which is the high-level language of choice for describing 3D designs. In order to write software for all these devices, you need to figure out how the implementations on different machines differ.

Your task is clear: you need to design a test suite of ParselTongue programs that can find the bugs in these implementations. You have at your disposal:

  1. the ParselTongue specification and tutorial
  2. an executable that bundles the correct interpreter with a collection of intentionally broken interpreters
  3. a sample test suite to get you started

Getting Started

Go to the links for the executable downloads and find the one that matches your platform. Unzip it and find the executable inside; the exact path depends on which distribution you downloaded (the example commands below show the Debian, OS X, and Windows paths).

The program takes a few commands:

--interp

  Run standard input as a ParselTongue program, using the correct
  implementation.

  > echo "+(40, 2)" | ./debian-dist/bin/assignment1-debian --interp
  42

--test-interps <directory>

  Run the tests in <directory> against all the broken interpreters.
  This will yield an error if the tests don't pass the standard
  implementation.  It will yield output describing which interpreters
  your test suite catches (if any), and which it doesn't.

  > .\win32-dist\assignment1.exe --test-interps path\to\my\tests

--brief <directory>

  Run the tests, as with --test-interps, but provide much briefer output
  (just one line of pass/fail for each broken interpreter).
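
  For example, using the Debian executable and the same kind of test
  directory as above (the paths are placeholders for your own):

  > ./debian-dist/bin/assignment1-debian --brief path/to/my/tests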

--single <interp-name> <directory>

  The output from --test-interps has nicknames for the broken
  interpreters.  With the --single option, only that interpreter is
  run.  E.g.

  > ./osx-dist/bin/assignment1-osx --single objects1 path/to/my/tests

  This also gives much more detailed feedback on how the output from that
  interpreter differed from what your tests indicate was expected.

--report <directory>

  (described in the "Handing In" section)

To get started, download and unzip the sample test suite, and run the executable on it with the --test-interps option.
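
For example, using the Debian executable (the sample-test-suite path is just a placeholder for wherever you unzipped the suite):

> ./debian-dist/bin/assignment1-debian --test-interps sample-test-suite

(Part of) the output should look like this: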

if-then-else1:
Bug not found!
if-then-else2:
Differences in:
/Users/Jonah/Documents/fall2012/cs173ta/repo/parseltongue-lang/../doc/assignment1/sample-test-suite/if1.psl

This indicates that your test 'if1.psl' found an error in the broken interpreter named 'if-then-else2', but none of your tests identified the problem with 'if-then-else1'. The labels if-then-elseN simply indicate that the problem with the interpreter lies somewhere in ParselTongue's if-then-else construct. There are interpreters that have incorrect behavior on objects, functions, variables, scope, and more, and these labels should guide you towards which features you need to test more.

NOTE: Unfortunately, due to a mistake there are two different interpreters that are both named operators2. They have different bugs to find; they just happen to have identical names. Since they are different interpreters, you have to catch both of them.

Writing your own tests

The best way to get a feel for writing tests is to examine the existing tests in our sample test suite. Every test case consists of up to three files:

  1. <testname>.psl - the ParselTongue program to interpret
  2. <testname>.psl.expected - the expected result of interpreting the program (output to stdout)
  3. <testname>.psl.error - the expected error output of interpreting the program (output to stderr)

An omitted expected or error file is treated as an empty string---e.g. if you create a test with only a .expected and .psl file, you are telling the testing script that you expect no error output at all.
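
For instance, a minimal test exercising addition (reusing the program from the --interp example above; the name add1 is just a placeholder) consists of two files:

  add1.psl           containing:  +(40, 2)
  add1.psl.expected  containing:  42

with no add1.psl.error file, since no error output is expected.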

Caveat regarding timeouts: If your test causes the interpreter to run for more than about 3 seconds, it will be stopped. Since there is no way to tell what output should have occurred by then (because of file buffering and general OS nondeterminism), the testing script flags the run as a timeout and assumes empty standard output and empty standard error. The upshot is that the testing script treats timeouts as different from non-timeouts, so if your test, for example, runs forever on the correct interpreter but not on a broken interpreter, you will get credit for having detected that interpreter's bug. Conversely, if it times out on both, you don't detect the bug.

You can generate the expected and error outputs by running the correct interpreter on the test program like so:

> ./debian-dist/bin/assignment1-debian --interp < mytest.psl 1> mytest.psl.expected 2> mytest.psl.error

On Windows, the redirection works a little differently. The type command will output the file you want to run, which you can then pipe to the interpreter command. Inside a command prompt:

> type mytest.psl | .\win32-dist\assignment1-win32.exe --interp > mytest.psl.expected 2> mytest.psl.error

Windows seems finicky about redirecting output; we've had to run the output redirection a few times to get it to work in some cases. Let us (and the course community) know if you run into any problems.

Each test should target a specific feature of the language. A good test suite consists of many small test cases that each exercise a single facet of the language using as few other language constructs as possible. Explain what each test is testing in a comment at the top of the test file.

All of your tests should pass our definitionally correct interpreter. No matter how legitimate you think your test is, if it fails on our interpreter, it is wrong. Like any language in the real world (think JavaScript), it is the implementation of ParselTongue that you need to worry about when running your code, not the specification. The spec should function as an extremely detailed guide for your testing.

How to Proceed

Start with the tutorial; it will give you a tour of ParselTongue's features and syntax. It also has a ton of helpful hints about particularly interesting features to test. If you try all the recommended examples in the tutorial, you'll be well on your way.

Armed with the tutorial, you should be able to start building up your own test suite, using the instructions above. You should be able to catch a number of the broken interpreters with fairly straightforward programs. Others will seem much harder.

When you've gotten to some interpreters that don't seem easy to detect, use their labels to figure out what kind of feature is broken, and go to the spec. If it's functions, for example, check the spec for a detailed look at when, how, and where arguments are evaluated. We'll be honest; some are quite devious, but we promise that we have a test suite that catches each and every one of them.

Handing in

Before you hand in your assignment, you'll need to grade it using the grade report option built into the assignment executable:

--report <directory>

  Run the tests in <directory> against all the broken interpreters,
  and generate a grade report.  This report is what you'll hand in
  when you're ready to submit your assignment, and contains your
  grade and a signature of its authenticity.

  > ./osx-dist/bin/assignment1-osx --report path/to/my/tests > my-grade-report.txt

Once you've generated your grade report, upload the following here:

  1. Your grade report file in txt format (you must include the .txt extension)
  2. A zip file containing your test suite (and nothing more)
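
On Linux or OS X, one way to produce that zip (assuming your tests live in path/to/my/tests) is:

> zip -r my-tests.zip path/to/my/tests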