Background

When he's not teaching CS32, Tim enjoys playing all kinds of sports during his free time. From frisbee golf and basketball to ice hockey and foosball, Tim has a wide range of sports on his resume. However, Tim has had a difficult time trying to distinguish a "real sport" from a "fake sport." He has looked at many different factors (including but not limited to strength, endurance, coordination, and speed) but is still having trouble making the distinction. He decides to set up an automated testing service that distinguishes a "real sport" from a "fake sport" but has forgotten how to set up such a testing service. Help Tim with his very specific problem by learning about our testing infrastructure (in addition to Git)!

Overview

Throughout the labs this semester, you will be developing various parts of an autocorrection program that provides suggestions given an incomplete or incorrect input word, similar to what Google does with its search. Your program will accept a corpus: a text file to analyze in order to build its understanding of the words and phrases users might be typing. Then, it will read the user input, for which it should offer autocorrect suggestions. We will be providing all of the stencil code you need for the labs. You'll get access to our stencil code by using Git, which is conveniently also one of the purposes of this lab! Today, you'll be learning both the basics of Git and how our testing infrastructure works. This lab will be the foundation for much of your workflow this semester, so refer back any time you get stuck!

Git-ting Started

Note: You must read and sign the Collaboration Policy before starting this lab or the first project. When signing, we'll ask you to set up a GitHub account, and we'll collect your GitHub username. Please also fill out this survey so that we can have more insight into the background and demographics of different students in the class. This survey is completely anonymous.

Before starting this lab, make sure you've gone through the Getting Set Up lab. This lab walks you through setting up working from home, the Java IDE, Maven, Git, Checkstyle, and other useful information. In addition to installing these, read our Explaining Our Tools Doc to understand why these tools are important.

This lab may seem lengthy, but most of it is just reading to set up a workflow and infrastructure that you will use throughout the semester.

What is Git?

Git is a version control system for tracking changes to files and allowing multiple people to make changes to those files concurrently. It is primarily used for source code management in software development, but it can be used to keep track of changes in any set of files. Imagine this: you need to work together with other developers and all of you will be using the same codebase. Rather than sharing code on Google Drive, or Dropbox, or in a shared filesystem, with Git you can see what others are working on, view your previous changes, roll back to your previous code, and do a lot more than all this!

How does Git do all of this? Each time you record your changes, Git takes a snapshot (called a commit, but don't worry about that yet) of the project's entire directory. It then stores those snapshots as unique versions of the entire project. If you later decide that some changes you made were not good, you can just roll back to the last good snapshot and continue working from there. Git is also hugely collaborative: when someone sends you changes, you can merge those changes into your codebase, and then your collaborator can grab the merged version and continue working. Git isn't magic, so conflicts do occur ("You changed the signature of this method, but I deleted the whole method; how should we resolve that?"), but on the whole, Git is very good at tracking and backing up your work as you and any collaborators make changes.

Part 1: Cloning a repo

We will be using GitHub Classroom this semester to distribute stencil code for each lab and project. You can get the stencil for this lab here. This link will allow you to join our GitHub organization.

When you accepted the GitHub Classroom link for this lab, GitHub implicitly forked our stencil code (from one of our staff's private repositories) into your own private repo. What does that mean? Forking a repo just means making a copy that is disconnected from the original. We have an original copy of our lab stencil, but we want you to have your own, private copy which you can modify without affecting ours. Forking our repo gives you that private copy: what's stored on your private repo is now completely disconnected from our original copy.

Your forked repo (e.g. "lab-git-testing-<YOUR GITHUB USERNAME>") is now known as your remote repo. A remote repo is the master copy, and is usually stored on a server (and not your local machine). You never edit your remote repo directly (think about why this might be bad practice). Instead, you'll clone the remote repo into a local copy on your machine. You edit this local copy and push ("upload") changes from the cloned version of the repo to the remote one. You can later pull ("download") changes stored in the remote repo to your local repo. Your remote repo is sometimes also known as your origin repo. The origin repo describes the repo from which your local copy is cloned.

Now that you understand forking (which already happened to your lab repo) and cloning (which hasn't happened yet), and now that you're in your CS 32 directory, you'll clone (i.e. download) the Autocorrect repository by running git clone <url>. You can find <url> by clicking the green "Clone or download" button on the GitHub page of your repo and copying it from there. The clone command will create a new directory called lab-git-testing-<YOUR GITHUB USERNAME> and download the repository into that directory.

Part 2: Branching and Pulling in Changes

If you look at your local repo, you'll notice it's pretty empty. In fact, there should only be a README there. But we promised you stencil code! To get our additional stencil code for this lab, you'll need to pull from a separate branch. But what is a branch?

In Git, pieces of work are generally centered around branches. A branch is a working set of changes that will eventually be merged back into the application's codebase. When just one person is working in a codebase, as you will be for the first project, you may not need to branch. However, using branches is a good way to keep changes organized (and know which changes broke what later on!).

You can list all the branches on your local repo with the command git branch. When you do this at first, you should just see master, which is the main branch of our repo. You can list the branches on your remote repo along with your local ones with the command git branch -a. One of the branches for the remote repo should be remotes/origin/stencil, which is where the stencil is! You can switch between branches with the command git checkout <branch name>, so run git checkout stencil and then ls, and you should now have several files and directories besides the README. Yay!

Since the stencil branch holds the code we need, we should bring it into our main master branch. First, run git checkout master to get back to the master branch, where we should just have a single README. Now, run git pull origin stencil to bring the stencil into master.

We will be making some changes to some of the tests in the stencil code, so let's make these changes in a separate branch. We are going to create a new branch with the command git checkout -b <new branch name>. Create a new branch now, with a name that reflects the changes you're about to make. You will now begin modifying the code in your repo on your new branch!

Part 3: Autocorrect, who?

Now that you've gotten our stencil forked, cloned, and pulled, it's time to play around with Autocorrect!

The autocorrect service generates suggestions with a “trie” (pronounced like “try”). Tries are a specific type of tree and are frequently used to represent a dictionary of words in an efficient way. Each node is associated with a letter and a set of child nodes representing potential suffixes to this node. Much more information about tries can be found online, for example on Wikipedia. You can view our Trie implementation in src/main/java/edu/brown/cs/student/main/Trie.java.
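To make the letter-per-node idea concrete, here is a minimal sketch of a trie with prefix lookup. It is not the stencil's Trie.java, which may be organized quite differently; the class name TrieSketch and the methods insert and withPrefix are hypothetical names chosen only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; the stencil's Trie.java may differ.
class TrieSketch {
  // Each node maps a letter to the child node that continues the word.
  // A TreeMap keeps the letters sorted, so results come out alphabetically.
  private final Map<Character, TrieSketch> children = new TreeMap<>();
  // True when the path from the root to this node spells a complete word.
  private boolean isWord = false;

  // Adds a word by walking (and creating as needed) one node per letter.
  public void insert(String word) {
    TrieSketch node = this;
    for (char c : word.toCharArray()) {
      node = node.children.computeIfAbsent(c, k -> new TrieSketch());
    }
    node.isWord = true;
  }

  // Returns every stored word that starts with the given prefix.
  public List<String> withPrefix(String prefix) {
    List<String> results = new ArrayList<>();
    TrieSketch node = this;
    for (char c : prefix.toCharArray()) {
      node = node.children.get(c);
      if (node == null) {
        return results; // no stored word starts with this prefix
      }
    }
    node.collect(new StringBuilder(prefix), results);
    return results;
  }

  // Depth-first walk that records each complete word at or below this node.
  private void collect(StringBuilder path, List<String> results) {
    if (isWord) {
      results.add(path.toString());
    }
    for (Map.Entry<Character, TrieSketch> entry : children.entrySet()) {
      path.append(entry.getKey());
      entry.getValue().collect(path, results);
      path.deleteCharAt(path.length() - 1); // backtrack before the next letter
    }
  }

  public static void main(String[] args) {
    TrieSketch trie = new TrieSketch();
    for (String w : new String[] {"program", "prograde", "project", "cat"}) {
      trie.insert(w);
    }
    System.out.println(trie.withPrefix("progra")); // [prograde, program]
  }
}
```

Words that share a prefix share the nodes for that prefix, which is why a lookup only has to walk as many nodes as the prefix has letters before collecting suggestions.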

Here is an overview of the app, with some instructions for running it. The app, when you first run it, reads in a number of text files and creates a corpus of "known" words. Then, you may type text into the application, and it will suggest words from its corpus that look like what you typed in. Autocorrect will find similar words to the text you entered based on three flags listed below, aggregating these suggestions and printing the first five suggestions alphabetically. The specific suggestions are governed in part by the flags you provide to the application:

./run --data <file1,file2,...> [--whitespace] [--prefix] [--led num]
Runs Autocorrect from the command line and creates a corpus based on the comma-separated list of files provided as a string. Corpus files can be found in the data folder.
--whitespace mode will look for missing spaces in your input. For example, "mousecheese" may return the suggestion "mouse cheese"
--prefix mode will look for words in the corpus that begin with the provided prefix. For example, "compl" will return the suggestions "complex" and "compliment" among many others.
--led num mode will return words that have a Levenshtein Edit Distance less than or equal to num. This is probably what you think of as "traditional" autocorrect: typing "compoter" with a LED setting of 1 will return suggestions including "computer".
./run --gui [--port [number]]
Launches GUI at the specified port number or 4567 if no port number is given. Further options may be specified in the GUI.
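Of the three modes above, Levenshtein Edit Distance is the least self-explanatory, so here is a small sketch of how the distance between two words can be computed with the standard dynamic-programming approach. This is an illustration under our own assumptions, not the stencil's implementation; the class LedSketch and the method editDistance are hypothetical names.

```java
// Hypothetical illustration of Levenshtein Edit Distance; not the stencil's code.
class LedSketch {
  // Returns the minimum number of single-character insertions, deletions,
  // and substitutions needed to turn a into b.
  public static int editDistance(String a, String b) {
    // prev[j] is the distance between the first i-1 characters of a
    // and the first j characters of b; curr is the row being filled in.
    int[] prev = new int[b.length() + 1];
    int[] curr = new int[b.length() + 1];
    for (int j = 0; j <= b.length(); j++) {
      prev[j] = j; // building the first j characters of b from "" takes j insertions
    }
    for (int i = 1; i <= a.length(); i++) {
      curr[0] = i; // reducing the first i characters of a to "" takes i deletions
      for (int j = 1; j <= b.length(); j++) {
        int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
        int substitution = prev[j - 1] + cost;
        int deletion = prev[j] + 1;
        int insertion = curr[j - 1] + 1;
        curr[j] = Math.min(substitution, Math.min(deletion, insertion));
      }
      // The row just filled becomes the previous row for the next character.
      int[] tmp = prev;
      prev = curr;
      curr = tmp;
    }
    return prev[b.length()];
  }

  public static void main(String[] args) {
    System.out.println(editDistance("compoter", "computer")); // 1
    System.out.println(editDistance("kitten", "sitting"));    // 3
  }
}
```

With an LED setting of 1, "computer" qualifies as a suggestion for "compoter" because a single substitution (o to u) turns one into the other.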

For instance, if I wanted to load words from the dictionary and search for similar words to "progra" using only prefix matching, I would enter ./run --data data/dictionary.txt --prefix. The terminal will now have a blank line and I can type progra and press enter. I should then see some suggestions like those below. Neat!

    prograde
    program
    programed
    programer
    programers

Part 4: JUnit Testing

Now that we have our stencil code and understand how Autocorrect works, let's take a break from Git and write some code! We will begin with unit testing.

Unit testing is very important! Good unit tests will test individual components of your program independently of any other components, ensuring that each component is functioning correctly. As components are tested hermetically, unit tests can be very useful for debugging your programs. In industry, the more the codebase grows (and thus the more impact it has), the more time will be spent testing. Even brief failures can cause big losses!

Building and Testing the Lab

To compile any Maven project, you will need to run mvn package. Run mvn package to build your lab. You will notice that building your project will also run the JUnit tests. That way, if a change to your code breaks previously passing tests, you will notice. You can also run mvn test to run only the unit tests. If you run mvn surefire-report:report, then a nicer, generated HTML report of your test results will be created and stored at target/site/surefire-report.html.

Note that Maven expects your unit tests to be in src/test/ and to mirror the package and directory structure found in src/main/. Moreover, it will run files that end in "Test.java". You will need to use this directory structure in future projects.

Task

When you run the JUnit tests, you will notice that three of the six completed tests in TrieTest.java fail. Fix these tests and rerun them to make sure all tests pass. You should not be changing the source code; only the test cases themselves.

Now that all the JUnit tests pass, fill in the missing test case at the end of TrieTest.java. Try to be thorough when testing. Keep in mind that a test case can have more than one assertion statement. It can also be helpful to break different types of cases into different methods with informative names.

When you have in total seven working test cases, continue to the next section of the lab.

Helpful Features: Set Up and Tear Down

JUnit includes two special methods called setUp() and tearDown(). All the code in the setUp() method is run once before each of your test cases, and all the code in the tearDown() method is run once after each of your tests. This allows you to save time and avoid writing repetitive code when creating tests. For instance, imagine that you are testing all of the non-static methods for a new class you created. Without using setUp/tearDown, you would need to instantiate the class in every test to use the methods. Instead, you can write the instantiation once in the setUp() method and have access to a newly created object in each of your tests. tearDown() is useful if there is something you need to do at the end of each of your tests, for example closing a connection to a database.
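The call order described above can be imitated in plain Java. The sketch below is not JUnit and does not use any JUnit API: the class LifecycleSketch, its runAll() method, and the check() helper are hypothetical stand-ins that only show the order in which a test runner would call setUp(), each test, and tearDown(). It is also simplified in that one object is reused for all tests.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java imitation of the per-test lifecycle; in real tests JUnit does this for you.
class LifecycleSketch {
  private final List<String> log; // records what ran, in order
  private List<String> words;     // fixture rebuilt before every test

  LifecycleSketch(List<String> log) {
    this.log = log;
  }

  // Runs before each test: builds a fresh fixture.
  void setUp() {
    words = new ArrayList<>(Arrays.asList("cat", "dog"));
    log.add("setUp");
  }

  // Runs after each test: releases whatever setUp() created.
  void tearDown() {
    words = null;
    log.add("tearDown");
  }

  void testAdd() {
    words.add("emu");
    check(words.size() == 3);
    log.add("testAdd");
  }

  // Passes only because setUp() rebuilt the list after testAdd() modified it.
  void testStartsWithTwoWords() {
    check(words.size() == 2);
    log.add("testStartsWithTwoWords");
  }

  static void check(boolean condition) {
    if (!condition) {
      throw new AssertionError("test failed");
    }
  }

  // Runs each test between a fresh setUp() and a tearDown().
  public static List<String> runAll() {
    List<String> log = new ArrayList<>();
    LifecycleSketch t = new LifecycleSketch(log);
    Runnable[] tests = {t::testAdd, t::testStartsWithTwoWords};
    for (Runnable test : tests) {
      t.setUp();
      try {
        test.run();
      } finally {
        t.tearDown(); // runs even if the test throws
      }
    }
    return log;
  }

  public static void main(String[] args) {
    System.out.println(runAll());
  }
}
```

Running it prints setUp, testAdd, tearDown, setUp, testStartsWithTwoWords, tearDown, which is the interleaving to keep in mind when deciding what belongs in each method.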

Additionally, it is possible to write a setUp() method that is run once before any of your unit tests are run, and a tearDown() method that is run once after all your unit tests are finished. Click here for more info.

Part 5: System Testing

Software engineering relies not only on the parts functioning correctly individually, but also that the application as a whole is working as expected. In CS 32, we will focus this holistic testing (or "system" testing) on the command-line parts of your applications. Testing GUIs is a much more complicated (and, to an extent, still-unsolved) problem!

The CS 32 Test Harness

For Stars and every assignment thereafter, we will be supplying you with a system testing harness called cs32-test so you can test your code and make sure it matches our specifications. However, we will not be giving you too many test cases, so you will be expected to write your own! An assignment must pass the test harness and provided test cases to be considered a working assignment. To use our system tester, run ./cs32-test <test suite>. The default executable to be tested is ./run, but you can use the -e flag to test a different one. When you want to run multiple tests at once, you should put them all in one directory, say src/test/system/, and then pass src/test/system/*.test as the test suite to cs32-test. Or, if you prefer, you can pass the specific .test files you want to run. You can read more about the system tester here.

In Autocorrect, for example, to test your executable on the provided tests, run ./cs32-test src/test/system/*.test from the top directory of the project. Then each test will be run against the given executable and the output will be printed. You should find it passes all but one test! To add more tests, make a new file for each one in src/test/system/ and put an ARGS, INPUT, and OUTPUT tag in each. You can look at how we structure our files as a guide to making your own. After the ARGS tag, you should put whatever the desired command line arguments are on one line. Then, list the desired inputs and outputs of your program under their respective tags.

Here is an example of a test file which runs Autocorrect with the prefix option turned on and an LED of 1, and which expects the word "nortoa" to be corrected to "norton".

ARGS
--data data/norton.txt --prefix --led 1
INPUT
nortoa
OUTPUT
norton
END
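To see how the tags divide a test file, here is a rough sketch that groups the lines of such a file under the tag that precedes them. It is not how cs32-test is implemented; the class TestFileSketch and the method sections are hypothetical, and treating END as the point where reading stops is our assumption based on the example above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical reader for the ARGS/INPUT/OUTPUT/END layout; not cs32-test's code.
class TestFileSketch {
  // Groups the lines of a test file under the tag that precedes them.
  public static Map<String, List<String>> sections(String fileText) {
    Map<String, List<String>> result = new LinkedHashMap<>();
    List<String> current = null;
    for (String line : fileText.split("\n")) {
      String trimmed = line.trim();
      if (trimmed.equals("END")) {
        break; // assumption: nothing after END belongs to the test
      } else if (trimmed.equals("ARGS") || trimmed.equals("INPUT") || trimmed.equals("OUTPUT")) {
        current = new ArrayList<>(); // start collecting lines for this tag
        result.put(trimmed, current);
      } else if (current != null) {
        current.add(line);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    String example = "ARGS\n--data data/norton.txt --prefix --led 1\n"
        + "INPUT\nnortoa\nOUTPUT\nnorton\nEND\n";
    System.out.println(sections(example));
  }
}
```

For the example file this yields one line of arguments, one line of input, and one line of expected output, which is a useful mental checklist when writing your own tests.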

An important part of system testing is catching malformed inputs from the user. This will be especially important for projects in CS32. Check out the Expecting an Error section in the system-tester guide.

Task

In this part of the lab, you will be testing the functionality of your Autocorrect program as a whole. The supplied tests should contain two simple tests to get you started. First, you should figure out how the system tests are laid out so our script can understand them. There is a third test that is failing, which you need to fix. Fix this broken test first.

Once you've fixed the broken test, you'll write some of your own which are simple and hopefully demonstrate that you have a fully functioning Autocorrect. Make sure to check for edge cases! Write one more system test from scratch.

Part 6: JaCoCo

Maven supports a plugin called JaCoCo that lets us see how well our JUnit tests cover the lines of source code we have written. JaCoCo creates a "coverage" report, so you can see which lines of code and branches your tests exercise. First, to generate your coverage report, run mvn site. This command may take some time to complete, but once it does, a website containing different reports on your code will be generated in your target folder, one of these reports being JaCoCo. Open target/site/index.html. A website titled "About Autocorrect" should be displayed. Now click "Project Reports" on the left-hand side, then click "JaCoCo." This is the test coverage report for your code. Tinker with it to see which classes are tested and which are not. Test coverage in general will be discussed in much more depth in lecture.

Part 7: Making and Committing Changes

Now that you've made some changes and your tests pass, it's time to commit those changes using git. Committing changes means you check the changes into Git's history, so they're tracked and can be examined or even reversed. Commits should always include a descriptive message describing what changes they contain, so you-from-the-future (or, more importantly, other people you work with) can identify exactly what happened to the codebase as a result of your commit.

Git keeps track of all the files that were created, modified, or removed since the last commit. To see what was changed, use the command git status. Do this now. Your output should look something like this:

$ git status
On branch my-branch
Changes not staged for commit:
  (use "git add/rm <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

        modified:   ./labs-student/src/main/java/edu/brown/cs/student/main/TrieTest.java
        modified:   ./labs-student/src/test/system/ranking_order_alphabetical.test

Untracked files:
  (use "git add <file>..." to include in what will be committed)

        ./labs-student/src/test/system/<YOUR NEW SYSTEM TEST>.test

Before you can commit changes, you need to add the files that were changed, so Git knows what files to track. To add files, use the command git add -A to add all the files you changed. To commit changes, use the command git commit -m "<commit message>". Add and commit your changes now with a descriptive message. Then, push them to your custom branch on your remote repo using git push --set-upstream origin <YOUR NEW BRANCH>.

Part 8: Merging changes

You've been working on the branch you created at the end of Part 2. However, when you've completed a set of changes, you need to merge them into your master branch. Your master branch reflects the master copy of your codebase. It's the one that you'd use to package and release your application, but it's also the one we will use to grade!

Changes are merged from one branch into another by first switching to the branch into which the changes will be merged (in this case, master), and then executing git merge <branch to merge in>. Merge your changes into the master branch now: first get back to the master branch using git checkout master, then merge the changes from your branch using the merge command above.

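Now we can push to the master branch, similar to how we pushed to our custom branch. A merge without conflicts already records the merged changes, so git status should report nothing left to commit; if it does list changes, enter git add -A and git commit -m "<commit message>" to add and commit them. Now just run git push to push to the master branch!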

When you did this, you shouldn't have run into any conflicts. Sometimes, though, you will. When you have a merge conflict (meaning executing the merge will overwrite somebody else's changes), you will need to manually go through each conflict file and choose which changes you want to keep (yours, or theirs). You probably won't run into this problem in CS 32, but if you do, feel free to reference our merge conflict guide or ask a TA for help.

Getting Checked Off

For this lab you are allowed to come and get checked off at any lab hours between January 23 and January 29. For all labs after this, you will be assigned to a two-hour weekly lab section.

To get checked off for this lab, first show a TA that you have filled out the Collaboration Policy Form that contains a quick quiz on scenarios discussed in the Collab Policy. Then, show and explain to the TA your methodology for fixing previous tests and writing new tests as well as your JaCoCo report. Also, show the TA that you have merged in code from the git branch you created and pushed it to master. Be sure to ask the TA any questions you may have about Git, Testing, Autocorrect or CS32 in general!

Additional Resources

There's TONS of information about Git online, but here we've collected a few resources that we as a staff have found particularly helpful:

CS32 Git Reference: A quick Git guide written by the staff!

So what is Git?: A basic introduction to the concept of Git, as well as a primer on many of the same commands we described here. Useful if you'd like another perspective!

Undoing changes in Git: more in-depth descriptions of the various ways to undo (and redo!) changes in Git.

Learn Git Branching: an interactive, visual guide to understanding exactly what happens to the branch structure of a Git repo when you make changes. A great reference before doing practically anything on Git!