Skip to main content

Lab 6: Databases

due in-class Monday, March 8th at 10:50AM, accepting submissions until Friday, March 12th at 10:00AM

Setup

Overview

In this lab you will have the opportunity to explore one of two popular database technologies: MongoDB and MySQL.

We have a large database of CD's deployed using both MongoDB and MySQL and in this lab we will ask you to connect to one of the two and then to query the database for interesting facts about CD's and Artists!

Make sure to thoroughly read through the contents of the lab before starting.

MySQL

If you are going to use MySQL, we recommend that you use the package Node MySQL 2 to connect to the server with Node. We have already installed this package for you. Additionally, if using MySQL, you must be connected to the Brown VPN to connect to the database.

MySQL databases are hosted by a server. The database you will be using is the cdquery1 database hosted at bdognom.cs.brown.edu . You will be accessing it as the user names <cslogin> with the password <banner id> . Therefore to create a pool with the database you can connect like this (provided in the stencil):

                  
const mysql = require('mysql2');

const pool = mysql.createConnection({
  host: 'bdognom-v2.cs.brown.edu',
  user: '<cslogin>',
  password : '<banner id>',
  database: 'cdquery1',
});
                  

For more information regarding the connection method, please refer to prelab 6 or official documentation .

If you have mysql in your terminal, you can also run the following command to connect to MySQL database:
mysql -h bdognom-v2.cs.brown.edu -u cslogin -p bannerid

Below you can see the schema for the database, which will be very useful in designing your SQL queries.

                      
mysql> show tables;
+--------------------+
| Tables_in_cdquery1 |
+--------------------+
| artist             |
| disk               |
| shared_disk        |
| track              |
+--------------------+
                  
                      
artist(id CHAR(12) PRIMARY KEY,
  name CHAR(255) NOT NULL);

disk(id CHAR(12) NOT NULL,
   cddbid CHAR(12) NOT NULL,
   title TEXT,
   artistid CHAR(12),
   length CHAR(12),
   genre CHAR(12),
   year INT(11));

shared_disk(
    artist1 CHAR(12) PRIMARY KEY,
    artist2 CHAR(12),
)

track(id CHAR(12) PRIAMRY KEY,
    name CHAR(255),
    diskid CHAR(12),
    artistid CHAR(12),
    length INT(11),
    number SMALLINT(6),
    offnum INT(11));

Make sure to read through the SQL section of the prelab. For more information on writing SQL queries you can visit the resources listed in the prelab.

You might find async and await very useful to write procedural functions. Moreover, when you need to wait upon multiple promises being fulfilled, you can use Promise.all.

This tutorial covers Promises, async, await. Feel free to read over it to get a deeper understanding about modern asynchronous Javascript.

Here is an example of a query using mysql2 and async/await to get the record from the artist table for "The Beatles":


let artist = "The Beatles";
let [rows, fields] = await pool.promise().query(`SELECT * FROM artist WHERE name='${artist}'`);

MongoDB

If you are going to use MongoDB, we recommend that you use the mongoDB package that was discussed in the prelab.

The database can be found at mongodb://bdognom-v2.cs.brown.edu/cdquery1 and in order to succesfully connect you will be using the following login credentials user: <cslogin> and password: <Banner ID> . The database has two collections: cds, and shared_disk.
Below we have provided you with some code to connect to the database.

                          
const { MongoClient } = require('mongodb');

const databaseUrl = `mongodb://${username}:${bannerID}@bdognom-v2.cs.brown.edu/cdquery1`;
let client = new MongoClient(databaseUrl, {
    useNewUrlParser: true,
    useUnifiedTopology: true,
});
client.connect(function (err, client) {
    if (err) throw err;
    // print whether client is connected to database
    if (client.isConnected()) {
        console.log('UPDATE: Successfully established connection with server');
    }
    let db = client.db("cdquery1");
    related(db).then(() => 
        client.close()
    );
});

                      

If you have mongo in your terminal, you can also run the following command to connect to Mongo database:
mongo bdognom-v2.cs.brown.edu/cdquery1 -u cslogin -p bannerid

Furthermore, there are many helpful functions that you may want to use for constructing mongoDB queries. Make sure to read through the MongoDB section of the prelab for a tutorial with mongoDB.

Requirements

Concentrator Requirements

You may have previously heard of The Six Degrees of Kevin Bacon . It is a popular game where movie buffs challenge each other to find the shortest path between any actor and Kevin Bacon using their filmograhy as stepping stones.

You can consider two actors to be related based on whether they were in a movie together. However, this idea can be abstracted to be used outside of film and with music instead!

In this lab you will work to discover the appropriate "Degrees" of relation for a certain artist (the K degrees of a particular given artist). Similar to how actors are related by their movies, musicians will be related by their CDs. You will write a NodeJS program that finds the transitive closure of this graph of CDs using a Breadth-First Search (I know this may sound intimidating, but the pseudocode below should clarify things!). Below are the requirements for this lab:

  • The program needs to be able to connect to either the MongoDB or MySQL database.
  • Your main task for this lab is to create a method `related()` that will determine the K degrees of "Beyonce" and the number of artists that are related to her. Here is some pseudocode for this task:

    
    let all_reachable_artists = { }
    let working = { artist0 }
    let to_add = { }
    
    let degrees = 0 
    while ( working not empty ) 
       for each artist in working
          add artist to all_reachable_artists
          let related = set of all artists related to artist (query database using shared_disk) 
          for each ar in related
             if ar in all_reachable_artists or ar in working or ar in to_add continue
             add ar to to_add
           end
        end
        output degrees and sizeof(all_reachable_artists)
        degrees++
        working = to_add
        to_add = { }  
    end
    
    

    You should implement the method related() that outputs k degrees of Beyonce and the total number of related artists.

  • You need to make sure that after you have finished with the database you correctly close your database connection. After your program has logged its output, the program should terminate and not hang because the database connection is still open.

What it means for Artists to be related

In the Six Degrees of Kevin Bacon, a connection can be made between two actors if they acted in a movie together. Similarly, we can define artists to be related based on collaboration, or even similar cd titles, genres, years, and more!

For this lab, two artists are related if they are on the same CD. You should use the shared_disk table/collection. You should not be determining whether or not artists are related with your own queries because this will exhaust the server!

Given initial artist X, there are a certain number of artists who directly share an album with X (artist group #1), there are a certain number of artists who share an album with X or with someone in group #1 (group #2), there are a certain number of artists that are within 3 links of X (group #3), and so on… This eventually stops when no more artists can be added to this unique set of artists. The length of the longest link from X is of interest (this is the degrees mentioned above).

Error Handling

You are not responsible for data mistakes in the database. You may also notice when running your queries that there are duplicate elements in the database, certain values are NULL/missing, query results may change, or that the MongoDB and MySQL database have inconsistent data (ex: one db has more tracks than the other). You may decide how to handle this in your output, but we are not asking you to clean the database in any way.

In fact, the scenarios where you have a messy database to deal with might be more frequent than you believe.

Designer Requirements

If you have a designer on your team or you are in the design track and are working on your own, your task for this lab will be to design the front end for an application that interacts with a database of CD's.

You are in charge of designing a search page and a results page that is meant to allow users to easily search for related artists.

You do not have to connect the page to a server or database, only design it as if it was. For example, you can fake a database by copying information from database into a local variable.

  • The search page must contain a search field as well as a submit button that allows the entry of an artists name in order to search for their discography and other artist suggestions.
  • Make sure to design the page in such a way that it is straightforward to understand what the user interaction should be. It should be clear to users what they are searching for and how to search for it.

You may use HTML, CSS, JavaScript, and any other front end technology you like for designing the web page. However, you must write a substantial portion of the code.

As always, if you choose to use any sources you should properly cite that you did in your README! You should also consider what kind of information users will be searching for when creating your design. To help you understand what type of data users may be searching for you can look above to the database schema's provided for concentrators.

Hand in

Handin on Gradescope preferably using your cloned git repo. Please make sure to delete the node_modules/ folder. It makes the handin size huge, and we do not need it to recreate your dev setup. .

Side Note: cdquery1 vs cdquery (Optional)

By default, you are connecting to cdquery1 database which is a much smaller version of the cdquery database (around a 1% random sample). We will grade you on cdquery1 database but you are welcome to run your program against the cdquery database to test out how optimized your queries are.

You may find that some queries may take a long time, especially when performing complicated operations. This is because the cdquery database is very large.

Optionally, when you are done, you can test your queries on the cdquery database to check how well optimized your queries are.