Congrats, you've just made it to the final round of the national dodgeball championship! In order to maintain an edge over your competitors, you need a way to hit as many as you can in a single throw. To do so, you must determine which competitors are closest to you. Luckily, you're taking CS32 and you've heard of a nifty algorithm that can do just that: a KD-tree search! In this project, you will implement a KD-tree to find the nearest dodgeball "stars" that you need to eliminate! Good luck!
You will be implementing a multi-dimensional nearest neighbors algorithm to search for stars in space. Your program will be able to do two things efficiently:
- Find the k-nearest neighbors: Take in a location in space (specified as either an x,y,z-coordinate or the name of a star) and the desired number of neighbors to be found, a positive integer k. Returns the closest k stars to the given point.
- Radius search: Take in a location in space and a distance. Return all of the stars within the given distance to the point.
main()to be in a Main class in edu.brown.cs.<username>.stars. However, you should be thinking about making parts of your code general purpose. Therefore, you use package names that reflect the general nature of any particular class. If the code is useful beyond reading and querying a star database, it doesn't belong in a stars package.
The details of your Stars implementation will, as always in this course, be up to you. However, here is a general roadmap towards a functional project. Some of these may not make sense until you read the rest of this page! If you want more help getting started, go to the project gear up or talk to a TA.
Read this entire page before starting.
Clone your GitHub repository
Start your Test Suite
Write some code!
Pay attention to the design questions, as they will help you think abstractly about structuring your project.
Our official advice is to write your system tests before you begin your implementation, and your JUnit tests once you have outlined your classes. Writing tests helps you understand the expected functionality of your code, define the API of your objects, and promotes incremental testing from the get-go. However, actually coding the project can provide further insight on the program itself and what tests to write. So, don't be worried if your testing plan evolves as you go.
You can complete these steps in any order you want. For example, you can start with your command line implementation (REPL) without filling in your search functions, or you can start with the CSV parser since it's fairly independent. Try to make all these individual peices abstract and reusable!
Make a GUI
Answer the Design Questions in your README
Check your Code style and functionality
Turn in your project using GitHub
Don't underestimate the final Handin step! It involves generating a github pull request and is accompanied by a seperate Project Handin Page.
Tips for a stellar stars project and beyond
In Software Engineering, although the functionality of your code is important, the quality is also.
Imagine you created an application, but at some point, it was too much to maintain by yourself. How readable and workable is your code for others? Say users wanted a new feature in your app. Is your code generic, modular, and extensible so you can easily add new features?
One of the key objectives of CS32 is to teach students to write code that not only works, but also follows strong coding paradigms and best practices. That is why we designed the projects to build upon each other and also why you will have a partner & group project. So, with this in mind, try to make your Stars code:
- Generic: think about how your code can handle multiple types of objects not just one
- Modular: try to maintain abstraction and group related functionality in packages
- Readable: use consistent style (32 Style Guide), comment descriptively, and avoid redundant code
- Efficient: consider space and time complexity
- Tested: test your code incrementally to make sure it works for general and edge cases
In order to find nearby stars, you will write and use a k-d tree data structure. A k-d tree is a special form of binary tree in which each node represents some k-dimensional subspace and all the items contained in that space. Although there is far more information about k-d trees online than we will provide here (which you are encouraged to read about, but not to use any online code), the basic idea is that the space corresponding to each non-leaf node of the tree is split between its two children. The dimension used to split each node's space in half depends on the depth of the node (it alternates through each dimension is descends). Leaf nodes may contain one or a few items, depending on your design. Internal nodes may or may not contain items, also depending on your design. Tell us about your choices and why you made them in your README.
Finding a single nearest neighbor can be done much more quickly in a k-d tree than in a list because large portions of the search space can be eliminated quickly. The expected runtime for nearest neighbor lookups is logarithmic for balanced k-d trees, because each level of the tree should eliminate half of the remaining stars. You should probably begin by writing code to find the single star closest to a given point.
Once you can find one nearest neighbor, you should extend your program to find an arbitrary number of nearest neighbors. We do not expect you to support insertions and deletions after initial construction.
For more information on searching k-d trees and some high level pseudocode for the nearest neighbors search function, check out this document:
We provide several CSV (Comma Separated Value) files that contain star data for you to load into your program for testing. The main file is data/stars/stardata.csv, which contains the locations of thousands of stars from the HYG database. This is an example of a good input file. You cannot use a CSV library to parse the file -- you must write your own!
What does a valid input file look like?
A valid input file will have a header line at the top containing 'StarID', 'ProperName', 'X', 'Y' and 'Z'. After a header line, each line in the file contains an ID, a name if one exists, and x, y, z coordinates for the star separated by commas. If a file does not conform with the said format, your program should print an informative error message.
What can I assume about the input files?
You may assume there will be no duplicate star names and that no star name contains a comma.
Command Line/REPL Specification
Running ./run in a terminal should start your program. It should accept two command-line arguments: --gui and --port <PORT>. We've included these for you in the stencil.
Your Java program will always have a command-line interface. It will have a graphical interface if specified with a --gui argument (explained in GUI Specification). In the command line, your program should run a 'REPL' (Read, eval, print loop), which should loop indefinitely, reading one line at a time, interpreting each line as a command (available commands specified below). It should then print the results of each command, in order. It should exit when it hits "end-of-file", which you can supply at the terminal by hitting Ctrl-D. To implement the REPL, see Java's BufferedReader class.
Your REPL should handle the following five commands:
stars filenamewhere filename is the path to a file containing stars information. This command loads in star position information. If you run this command multiple times, it should overwrite the k-d tree each time, rather than adding new elements to the existing k-d tree
neighbors k x y zwhere k is the nonnegative, integral number of nearest neighbors to find, and x, y, z represents the position from which to find the nearest neighbors.
neighbors k "name"where name is the name of a star, which must be given in quotes. For both of these neighbors commands, if there is a tie between stars (two stars being in the same location or two stars being equidistant from the given location), you may return one arbitrarily if choosing both would push you above the limit k.
radius r x y zwhere r is the nonnegative radius around the position to search, and x, y, z represents the position. A radius of 0 includes a star if and only if it is exactly at the specified position.
radius r "name"where name is the quoted name of a star, as in the neighbors command described above.
Read n stars from <filename> should be printed in response
to a successful
stars command. For easier testing on our part, please do not
worry about whether or not "stars" should be plural. It should always be plural in this
Read 119617 stars from data/stars/stardata.csv
Read 1 stars from data/stars/just_one_star.csv
the stars should be printed in sorted order,
from nearest to furthest away. If a star is specified by name in
the query, it should not be counted or printed during the
Your output should only print star IDs, not names. Here are some example input/output when your program is run against the supplied data file (though we will try it on many other commands, and many other files. Your tests should as well!)Input:
neighbors 5 "Sol"Output:
70667 71454 71457 87666 118721Input:
neighbors 5 0 0 0Output:
0 70667 71454 71457 87666
Be sure the format of your output looks like what we have above. Do not actually print "Input" and "Output"!
If the line read cannot be interpreted in a meaningful way (an incorrect command, a non-existent star name, etc), your program should print a reasonable error message and continue running. All error messages should begin with "ERROR:" (without the quotes). For example:
radius -4 7 3 2
ERROR: Radius must be non-negative.
ERROR: File does not exist.
The "ERROR:" tag allows us to automatically test your code against error conditions without specifying exactly what your messages should contain (or even even hinting what all the errors you should detect are). We do award more credit for “better” error messages.
With every project we will also have 2-3 design questions that you must answer in your README. These questions are supposed to help you make good design decisions and get you to look at how your program can be used beyond the scope of this project. We highly recommend reading the design questions and taking some time to design your system before you start coding so that you could easily add the functionality we specified in the future. Good design is one of the most important skills of being a software engineer, so taking the time to figure out a good design will greatly help you in the future.
You should think about your answers to these questions before you start programming. Nonetheless, you may find once you've finished your project that the way you designed your program is not ideal. In that case, you can explain how you would have redesigned your code so that you can give the best answer possible. We will grade these answers on how strong the design you described is and not what was implemented in your handin. (You could still lose points for your design, but not simply because your design is different from what you describe here.)
Here is an example design question and answer for Boggle:
How could you change your code to handle board that can be arbitrarily sized instead of 4x4?
In the Board class, I would add another board
Board(int size) that changed the board
construction loops to loop from 0 to size instead of 0 to 4. The
default constructor would remain the same, and another constructor
could be added to allow for both a particular string set and board
size. I would remove the static SIZE constant in favor of a
size instance variable, associated with each instance, that would be
set in the constructor.
Design questions for Stars:
- Suppose that in addition to "neighbors" and "radius" you wanted to support 10+ more commands. How would you change your code - particularly your repl parsing - to do this? Don't worry about algorithmic details with the k-d tree, we're interested in the design.
- What are some problems you would run into if you wanted to use your k-d tree to find points that are close to each other on the earth's surface? You do not need to explain how to solve these problems.
- Your k-d tree supports most of the methods required by the
Collection interface. What would you have to add/change to make
Collection<Star> db = new KDTree<Star>()compile and work properly?
If your program is invoked with "--gui" as an argument, it should start a web interface in addition to reading command-line input. If you haven't done so yet, you should look at Project 0 - Boggle for an introduction to Spark and sample code to help you begin. Our first lab will also help you get started!
If your program is invoked with "--gui --port 4567", http://localhost:4567/stars should display a page that allows a user to conveniently make all of the same types of queries that the CLI offers, but it should do so in a friendly, interactive manner. As will be the norm in 32, you have considerable leeway in designing your graphical interfaces, but we will be looking for convenient, clear interfaces. Since this is the first project, we will only expect a basic GUI. The CSV files can still be loaded on the command line, but you should create an HTML form that allows you to conduct k-nearest neighbors and radius searches, then displays the results. We encourage you to make the output more user-friendly than your command-line interface (don't just print IDs!).
We require thorough JUnit tests as a part of
your testing plan. This project will be very
challenging if you do not incrementally test your implementation as
you create it. Hint: compare the results of your implementation with
a simple, slow, and known correct implementation. We covered
testing in the first lab. Also, be sure to use Jacoco
mvn site to evaluate how well the code you have
written is covered by your JUnit tests!
We require that you pass the basic system tests that are provided by us in your repository (they're in your tests/ta/stars/ directory). These system tests are checking for basic functionality. We will run more comprehensive tests on your projects when grading, so don't be complacent about correctness. We'll also be testing your system tests on our own implementations of Stars to see if they catch various bugs in our code. Please include your own system tests in the tests/student/stars/ directory of your repository. Your system tests should look at basic functionality, edge cases, and malformed commands. Check out our cs32-test guide for more info.
A portion of your grade in this course will be on code style.
The standards which we will grade by are outlined in our official
CS32 Style Guide.
Definitely read over the overview and keep it in mind as you begin
coding. More importantly, read your Checkstyle report, as generated
mvn site (the Boggle project explains how to find
it). Do this early, and often, until you understand the rules
well. It is sad to lose points on this, but we feel it is important.
If Checkstyle raises an error that you find unreasonable, note the
error and why you think it is unreasonable in your README. We
expect you to have no unexplained Checkstyle errors in your final
handin. Configuring Eclipse as suggested in the
Working From Home Guide can save you literally hours of work
when fixing style errors.
Graduate Student Requirements
Grad students must additionally complete the following requirements:
- Add a
switch <naive|kdtree>command that changes the data structure that subsequent commands use
- Benchmark the difference between a naive O(n) implementation and a k-d tree
For benchmarking, run
time cs32-test tests/ta/stars/benchmark_naive.test
time cs32-test tests/ta/stars/benchmark_kdtree.test
and write the results in your
README.md (we only care about the real)
Setting up your development environment
We're providing the config files for Checkstyle (config/cs32_checks.xml) and Maven (pom.xml) and an executable to run your program (run). We'll also provide you with a stencil Java project to get started. We will cover how to run and build your program in lab 1.
If you haven't already, you should have signed the collaboration policy. When signing, we'll ask you to set up a Github account, and we'll collect your Github username.
Once you've done that, click on this GitHub Classroom link
to retrieve the stencil code. This will create your own, personal repository (aka "repo") that contains your code. The repository will be hosted somewhere like:
Now you'll make a local clone of your repo. Click the green
"Clone or download" button and copy the URL. On the computer
you'll be working on, open a terminal and navigate to where you'd like to store your projects. Once you're there, run
git clone <URL>. This copies the repo to your computer in a new directory, named "projects".
To update the remote repo (on Github) after editing your code, you'll first
add the files you updated, then make a
commit of your changes (with a message summarizing your changes), and finally
push your changes to the remote repo.
These commands, along with other helpful setup information can be found on the
Getting Set Up lab
as well as the Git+Testing lab.
(We've also copied them below.)
git add -Ato add all changed files. Or, you can selectively add files with
git add /path/to/file.
git commit -m "<DESCRIPTIVE SUMMARY>"to save your changes locally. Think of this as setting a checkpoint that you'll be able to return to whenever you'd like. You'll need to add any files you want to save before making a commit. Be sure to include a short summary of your changes.
git pushto upload your changes to the remote repo on Github. You can log on to Github and navigate to the repository to see your changes.
Git is a powerful version control system, and we've only cracked the surface. Since you're the only one editing this repo, you won't have to worry about what happens when someone else wants to make a change to the same file your working on. We'll explore other features of Git later, but you're encouraged to learn more here.
As specified in the course missive you must meet minimum functionality for each assignment in order to pass CS32. If you hand in a project that misses minimum functionality, you have until the end of semester to implement minimum functionality and get it checked off at TA hours.
For Stars, minimum functionality requires several things. First, your project must pass the basic system tests provided. Next, you need a GUI which supports all four search types and displays results without crashing. Finally, your implementation must use a KDTree.
Many of you may not have a lot of experience in building and working with large scale projects like Stars, so only for this project, we are giving you a brief overview of what grading looks like for CS32.
README: This is one of the first things we will be looking at. A big part of software engineering is to provide explanations on your project design details and how to run your program for users (in our case, your grader). Furthermore, your design questions will all be graded.
Functionality: Along with the test suite that we provide you, we will also be running your implementation on our hidden test suite. This test suite ensures accuracy as well as effective error handling, so make sure to consider all possible edge cases.
Testing: This is another important part of the rubric. We grade both unit tests and system tests. For unit tests, we want to see that there is comprehensive test coverage on the majority of your classes (with an emphasis on your core classes, like K-d tree). Feel free to check out the Getting Set Up lab, which has information about JaCoCo that you can use to check your test coverage. Note that we are not grading you on JaCoCo, but it is a nice tool to have. And for system tests, you should be throughly testing the specs of the project, ensuring that any sort of error or bad input won't crash the program. Keep in mind that testing is quality over quantity!
GUI: As mentioned in the GUI Specification section, your GUI should efficiently allow for neighbors and radius commands to run and display results (it's fine if you need to run stars command in the REPL). We only expect a basic GUI for this project, however, it should still be intuitive to use for users.
Good code/Bad code: This section evaluates your code and class design for things that were done above and beyond, and for things that should be improved for the future. Some of the points are for general good/bad code styles, like having thorough and informative comments. Other points are for how organized and intuitive your project's structure is, and how reusable and extensible your classes and data structures are.
(Optional) Design Check
For this project, there is an optional design check component. For this, you must s how a TA at hours a functioning REPL. By functioning, we mean that it satisfies the constraints: (1) it opens and prompts the user, (2) the user can type in some input, hit enter, and the REPL will return to a state where the user can type more input, and (3) the REPL exits cleanly (meaning it closes upon some command). You can get checked off for completing this design check by coming to any TA (not lab) hours Saturday, Feb 1st to Monday, Feb 10th, inclusive and showing your sufficient REPL. If a TA notes that your REPL meets our standards for this design check, you will recieve one extra late day. Note: a REPL that satisfies these constraints is by no means a perfect REPL, but a good jumping off point for developing a modular and extensible REPL.
As with every project, we expect you to update the README.md file, describing:
- Known bugs
- Design details specific to your code
- Any runtime/space optimizations you made beyond the minimum requirements
- How to run any tests you wrote/tried by hand
- How to build/run your program
- Answers to design questions
- Explanations for any Checkstyle errors your code has (hopefully none)
Include these subsections under the "Stars" section in the README.md file.
Very Important Note: TAs should be able to run your program by executing ./run from the root project directory. Make sure that this works for your program before handing your project in or there will be significant point deductions. Writing "run in eclipse" in your README is not acceptable.
Most Important Note: You will be using Pull Requests to hand in your projects. Follow the Project Handin Guide and be sure to read all of it!! If you encounter any issues the TAs will be more than happy to help in hours.
Check out the FAQs for Stars.