Background

Tim has recently developed an addiction to geocaching, and wants to start training for the Brown competitive geocaching championships. However, the GPS on his handheld device has stopped working, so it's up to you to implement a new one. Help him out by creating a mapping system allowing him to find the shortest path from point A to point B so he can win the upcoming geocaching tournament!

Overview

In this assignment, you will be providing directions on a map using ideas and code from all of your previous assignments, coupled with a new user interface that you will be creating. This project, as you can guess, contains one main component: the REPL of the map program. The maps command line interface will be due on Friday, April 10th at 6:00PM ET.

Just as your tIMDb submission required functional Stars code, your Maps submission should have functional Stars and tIMDb code. This entails that your REPL should support commands from all three of the projects in this class.

A note on partners: Your partner for this project cannot be in your term project group or your tIMDb partner. Each partner must supply code from one of their previous assignments to use in this project (i.e. one person’s Stars, and the other person’s tIMDb). You should explain whose code was used for which parts in your README. In addition to this, we expect the work for this assignment to be divided up fairly. A fair division and contribution of labor will be part of your grades.

Setting Up

You should have recieved an inviation to join a GitHub repo with your partner. If you do not see one (check your email!), please reach out to the HTAs.

As with tIMDb, you'll notice that there's no code provided in the repo - you will need to provide your own run script, testing folders, and more (feel free to take it from your past projects!). You and your partner should use this repo to develop and hand in your project.

The public TA tests are in /course/cs0320/data/maps/tests/*, and the data in /course/cs0320/data/maps/. Please copy these, somewhere in your project folder. As always, you will need to tell us how to build and run your program, and how to run your tests in your README.



REPL (due April 10th at 6:00PM ET)

Data Format

Like in tIMDb, there is an SQL database you will be interacting with to get data. There are two tables, Node and Way. There is no joint table in this assignment because the data being represented is not a many-to-many relationship. Each way is only associated with two nodes. This allows us to store the nodes associated with each way directly in the Way table.

The data is in /course/cs0320/data/maps/maps.sqlite3 and a small dataset is in /course/cs0320/data/maps/smallMaps.sqlite3.

Node

This table describes the location of every node in the map of Rhode Island. There are three columns: id, latitude, and longitude. Each row specifies one node's ID, and its corresponding latitude and longitude.

Way

Ways are paths between nodes. A way has an id, start, end, and potentially a name and type. Ways are directed, you may only travel from the start node to the end node. Two separate ways are used to represent two-way streets. IDs are unique, but names are usually not unique. If you're planning to use road types to do something interesting, more information is included here.

Ways have one (or none!) of several types. These types are determined by OpenStreetMap, which is where our map data comes from. If a way's type is "unclassified" or the empty string, we consider the way nontraversable. These ways represent the walls of buildings or other lines on the map that are not streets or walkways. These ways should not be used to generate routes. All other ways are considered to be traversable and should be included in the route command.

Similarly, we consider nodes that lie at the start or end of a traversable way to be traversable. This means that your nearest and route commands should only select nodes that are traversable. After all, it wouldn't make much sense to route a path along the perimeter of a building!

Command Line Specifications

For ease of testing (ours and yours), you will provide a way to run your project from the command line. Write a REPL that takes the following commands:

  • map <path/to/database> : load map data from specified path/to/database, a SQLite3 database that contains the tables that will be used for maps. See Data Format for more info.
  • ways <lat1> <lon1> <lat2> <lon2> : show all ways that lie inside the bounding box with northwest point (lat1, lon1) and southeast point (lat2, lon2).
    • Note: "Inside" means that there is at least one end point for each way that lies within the bounding box.
  • nearest <latitude> <longitude> : get the nearest traversable node to the input point (latitude, longitude).
  • route <lat1> <lon1> <lat2> <lon2> : get the shortest path from the traversable node closest to (lat1, lon1) to the traversable node closest to (lat2, lon2).
    • If there is no path between two nodes, output <nodeID1> -/- <nodeID2>.
    • Regarding the case of the same nodes as start and end, as long as you have an implementation that does not break, you're good. We will not be testing for that case.
  • route "Street 1" "Cross-street 1" "Street 2" "Cross-street 2" : get the shortest path from the intersection of Street 1 and Cross-street 1 to the intersection of Street 2 and Cross-street 2.
    • Note that the street names are encased in quotes.
    • If there is no path between two nodes, output <nodeID1> -/- <nodeID2>.
    • Regarding the case of the same nodes as start and end, see the note above.

Output Specification

The output for the map command should be map set to <path/to/database>.

The output for the ways command should be a list of way IDs ordered alphabetically, with one way ID per line.

wayID1
wayID2
.
.
.
wayIDn

The output for the nearest command should be the node ID that represents the nearest traversable node to the input coordinates.

The output for the route commands should roughly follow the output spec from tIMDb. However, in order to get more specific details, use the IDs of the nodes and ways your path contains instead of human-readable names.

nodeID1 -> nodeID2 : wayID1
nodeID2 -> nodeID3 : wayID2
.
.
.
nodeIDn-1 -> nodeIDn : wayIDn-1

Because these IDs are unique, there will be no ambiguity.

Finally, if you receive any malformed input, print an error.

Adapting Existing Code

What is perhaps the most vital part of this project, and what is more or less the point of this course, is that you should have written your code in past projects to be generic and modular. This is helpful because you would like to avoid writing the same code over and over again for very similar purposes. Think of these past two projects as an exercise in writing useful libraries, with small wrappers around them to demonstrate their effectiveness.

You and your partner must each contribute code from one of the previous projects:

  • Stars for the nearest-neighbor search
  • tIMDb for the shortest-path search

We don’t expect that your code will be already be perfectly suited for this task, you will likely have to do some adaptation to make the pieces fit together. Try to make your new code flexible enough to handle this new job and its old job, rather than breaking old functionality for new. Improvements and fixes are always welcome – don’t be complacent in your project’s performance!

Optimization

Here are some tips to make your projects faster and more responsive. (You don't want to miss that CS32 lecture!):

A* Search

Since we are now working with a graph in roughly Haversine space, we have some guarantees about the minimum path length between two points. Therefore, we can optimize the shortest path search to try to search more in the general direction of the destination. This will cause it to search fewer nodes and return results much faster.

A* is a simple extension of Dijkstra’s algorithm. Normally, Dijkstra’s works by repeatedly extending known shortest paths until the destination is reached, and there are no more shorter candidates to explore. A* recognizes that a candidate path must be extended by at least the direct distance to the goal in order to reach it. During the search, that means that paths ending closer to the goal should be considered before paths leading away from the goal even if their current length is the same. It also means the search can be ended earlier. If a candidate is shorter than the path you've found so far, but it is far way, then it can never "win". You may find you can implement A* simply by adding the direct distance to the goal to your path's "weight". You will simultaneously be considering nodes which optimize the lowest cost path found so far with the lowest expected path to the goal; this means you are more likely to move in the general direction of the destination at each step. Feel free to look up more information up A* on the web.

Distance Metrics

You will find that you need to calculate the distance between two points in multiple areas of your project. Mapping the Earth is mad hard. People have spent centuries trying to portray the Earth's curvature, just skim through this list for dozens of examples. Thus, we don't expect you come up with your own formula, the Haversine formula is perfectly fine.

SQL Query Performance

You may be writing more involved SQL queries for this assignment. We don't expect you to be able to sit down and right out the perfect, optimized query the first time you try. Rather, we welcome you to experiment with different queries to see the tradeoff between simple queries (i.e. a few short queries and some extra Java code to tie their results together) and more complicated queries (i.e. one or two longer queries that don't require extra computation in Java). In general, it's faster to do more computation in the SQL query itself, but overly complex queries can slow down your program. You may want to look at SQL's UNION and BETWEEN operators for inspiration.

KD Tree Caching

Like most design choices in this class, there are are many different approaches and they all have advantages and disadvantages. You may want to consider creating one large KDTree when you first run your project and then using the large tree for your queries. If you have an efficient implementation of a KDTree, this shouldn't be a problem. Consider the advantages and disadvantages of this approach (your program may take a while at startup but could be quick for queries) and choose this design if you think it's best.

Alternatively, you can use SQL to load in a small portion of the data (around the location you're searching for), then putting that in a k-d tree to find the closest point to the click.

There are benefits to both approaches. This is for you to evaluate, and there may be ways to improve upon this design via caching.

Handing In

For Maps, minimum functionality requires passing the basic system tests provided.

Ensure that your REPL is extensible and can process both Stars and tIMDb commands in addition to the new commands for Maps. We will be testing for this!

Note that we will be grading you on the efficiency of your code as well as its correctness, so make sure you write tests for large data sets!

When you are done, hand in your project with README.md containing the following:

  • Which partner’s Stars and tIMDb were used (Remember that each partner must supply one of the two)
  • Partner division of labor (we will also send out an individual feedback form for your partner, as with tIMDb)
  • Known bugs
  • Design details specific to your code, including how you fit each of the prior project’s codebases together
  • Any runtime/space optimizations you made beyond the minimum requirements
  • How to run your tests
  • Any tests you wrote and tried by hand
  • How to build/run your program from the command line
  • What browser you used to test your GUI (we prefer Chrome, but we'll accept other common web browsers)

Remember that testing is worth a significant portion of your grade, so you should favor tests over non-required features if you are running low on time. When you are ready to hand in, follow the instructions on the handin guide to create a pull request to the handin and review branches, then submit the Google Form that has been used for past projects. Only one of you and your partner needs to hand it in.