Programming with
Data Structures and Algorithms

Nile

You and your friends are creating Nile, which you hope will be the next big thing in on-line bookstores. You know that you can save money by anticipating what books people will buy; your brainstorm is that you will pass these savings on to your users by offering a discount if they buy books that Nile recommends.

To do this, you offer an incentive for people to upload their lists of recommended books. From their lists, you can establish suggested pairs. A pair of books is a suggested pair if both books appear on one person's recommendation list. Of course, some suggested pairs are far more popular than others; and for a given book, it is paired with some books much more frequently than with others.

You need to organize the list of recommended books to support two tasks:

The recommendation lists your program needs as input will be in files stored in a directory specified by the first command-line argument (accessible as args[0] in your main method). Each list will be stored in its own file, and every file in the directory will be an input list. Inside the lists every line will contain the description of a single book. Each book has a unique and unambiguous description; that is, if two lines in two different input lists are identical, they refer to the same book. Otherwise they refer to different books. Input lists will always contain at least two books, and they will never contain duplicates.

Support Code

To retrieve the input lists, you should use the NileIO class we provide. It has two public static methods:

We do not provide you with the source code (.java file) for NileIO, because we do not want you to concern yourself with its implementation. Instead, you may obtain it online either as a compiled .class file or in a .jar file. If working on departmental machines, you may also directly access either version from the /course/cs019/pub/ directory.

You will need to make sure the NileIO class is in your classpath. To be precise: if you use the .class file, the directory containing that file must be in your classpath; if you use the .jar, the jar itself must be in your classpath. Due to the abundance of Java development environments, we can only offer instructions for how to do this for the one with which we're familiar, Eclipse:

  1. If you are on your own computer, download the jar.
  2. In your Eclipse project, select File → Import. Select File System from the General section, and click Next.
  3. If on your own machine, click Browse…, and navigate to the directory/folder where you saved the jar. If on a Sunlab machine, enter /course/cs019/ and click once on the arrow to the right of the text field.
  4. If the leftmost pane, click (but don't check) the directory. In the rightmost pane, click NileIO.jar. Click Finish.
  5. In the Package Explorer tab on the left, right-click (control-click on OSX) your project, and select Properties.
  6. Select Java Build Path, then the Libraries tab. Click Add JARs…. Select your project, select NileIO.jar. Click OK, then OK.

(Can you see why we don't like Java development yet?)

Part 1

Write a program in Java that performs the above tasks. Your program should consist of a file, Nile.java, containing a class with a main method, along with any other files you may need. Your program will be run in one of two ways:

  1. When run with one argument (the directory path), your program should print a list of the most popular pairs of books. The count should be first, followed by a newline, and each pair should be printed on its own line with only a plus sign separating the two books in the pair. For example,
        12
        Lewis Carroll, Alice in Wonderland+Douglas Hofstadter, Godel, Escher, Bach
        Frank Herbert, Dune+Arthur C. Clarke, Childhood's End
        Jane Austen, Emma+Emily Bronte, Wuthering Heights
            
    Each pair of books should only appear once, and order is irrelevant. Note that
         Lewis Carroll, Alice in Wonderland+Douglas Hofstadter, Godel, Escher, Bach
            
    and
        Douglas Hofstadter, Godel, Escher, Bach+Lewis Carroll, Alice in Wonderland
            
    count as the same pair.
  2. Your program may also be run with two arguments, accessable as args[0] and args[1] in your main method. The first argument will once again be the directory path and the second argument will be a book description string in the same format as the input lists. Your program should print a list of books most frequently paired with it, preceded by the count as above. For example,
        7
        Lewis Carroll, Alice in Wonderland
        Larry Niven, Ringworld
        N.K. Stouffer, The Legend of Rah and the Muggles
            
    If the given book description cannot be found in any input list, your program should print an error message and exit.

No matter which way your program is run, it should print an error message and exit if the given directory is inaccessible or empty. If one of the input list files in the directory is inaccessible, your program should ignore that file and continue.

Note: Java automatically splits command line arguments at spaces, so if you gave as arguments

     /some/path Edwin A. Abbott, Flatland
you would get the args array
    ["/some/path", "Edwin", "A.", "Abbott,", "Flatland"]
To avoid this, you must put quotes around your book descriptions, like so:
    /some/path "Edwin A. Abbott, Flatland"
resulting in the correct args array
    ["/some/math", "Edwin A. Abbott, Flatland"]

Your goal in this assignment is to produce a well-designed, working implementation. You do not need to focus on efficiency. As with all code you write, make sure to include comments.

You may use any of the Java standard libraries, and the NileIO class we provide, but no outside code.

Part 2

Informally analyze the big-O time complexity of your implementation of the two operations. You do not need to rigorously prove your bounds, but you do need to explain your reasoning.

Part 3

Suppose you wanted your program to be interactive and process several requests in a short period of time. In particular, if your program is part of the back-end of a popular server with lots of data, it should respond in as close to constant time as possible. How would you modify your solutions to Part 1 to obtain this? You don't need to provide code; just discuss the algorithms and data structures you would use.

Part 4 – Honesty Policy

Before we will grade this assignment or any of your other assignments, you must read the honesty policy and sign off that you understand and agree to abide by it. To do this, you must copy the text in the following image into a plain text file and hand it in with your project.

Honesty Statement

What to turn in: