Tables 2

    1 Introduction

    2 Exercise: Social Network

      2.1 Plundr Table Format

      2.2 Plundr Questions

    3 Exercise: Providence Survey

      3.1 Providence Table Format

      3.2 Providence Support Code

      3.3 Providence Task

      3.4 Observing Correlations

    4 Helpful Hints

    5 Code Template

    6 Handing In

1 Introduction

In Tables 1, you were introduced to methods of filtering, ordering, and selecting within tables in Pyret. While these operations are powerful, they can be limiting as your data analysis needs grow. For more flexibility, Pyret lets you extract tabular data as lists and process those lists with list functions.
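Here is a minimal sketch of that idea. The snippet builds a small table with made-up values (the names scores-table, all-scores, and high-scores are illustrative, not from the stencil). It uses extract to turn one column into a List and then processes that list with an ordinary list function.

```pyret
# Illustrative table with made-up values
scores-table = table: name :: String, score :: Number
  row: "alice", 3
  row: "bob", 5
  row: "carol", 8
end

# extract turns the score column into a List<Number>, in row order
all-scores = extract score from scores-table end

# From here on, any list function applies, e.g. keeping scores above 4
high-scores = filter(lam(s): s > 4 end, all-scores)
```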

In addition to letting you work with tabular data converted into lists, Pyret also provides several useful mathematical and statistical functions. The documentation for the functions used in this assignment, as well as relevant textbook chapters, can be found at the links below:
(The textbook refers to outdated functions over lists. Look instead at the libraries referenced above.)

For this assignment, write code that assigns your answers to the identifiers in the provided template. Once you run your program, you can view the results in the interactions area; you can also print out the full tables for both exercises. Submit the code that generates your answers, not only the answers themselves!

2 Exercise: Social Network

One of the most exciting new tech startups of the year is Plundr: the social network for pirates. Plundr has given you access to their database of users and tasked you with answering several questions about the data.

2.1 Plundr Table Format

The data are in a table with the following columns and their types:
  • id :: String

  • username :: String

  • password :: String

  • posts :: Number

  • followers :: Number

  • following :: Number

  • favorites :: Number
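For reference, here is a hypothetical two-row table with the same columns and types, written as a Pyret table literal. The rows are made up, and plundr-sample is an illustrative name. In the stencil, the real data are already loaded under the name plundr.

```pyret
# Hypothetical rows that follow the Plundr column layout; not real data
plundr-sample = table: id :: String, username :: String, password :: String, posts :: Number, followers :: Number, following :: Number, favorites :: Number
  row: "u1", "blackbeard", "arr1", 12, 340, 25, 80
  row: "u2", "annebonny", "arr2", 4, 95, 110, 17
end

# Number columns extract to List<Number>; String columns to List<String>
follower-counts = extract followers from plundr-sample end
usernames = extract username from plundr-sample end
```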

2.2 Plundr Questions
3 Exercise: Providence Survey

We have conducted a survey of the Brown CS department. Now it’s your job to find interesting relationships in the survey data.

3.1 Providence Table Format

Every column in this table contains String values.

There are two data sets associated with this survey: the first is a subset of the responses, and the second is the full data set. Both are linked in the support code.

3.2 Providence Support Code

In the support code for this assignment, we have extracted all of the data from this survey into string-dict objects. If you need help working with objects, please come to hours! Make sure you stick to the functional operations; for example, avoid any method with -now in its name.
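Here is a minimal sketch of the functional style, using only the standard string-dict library (the dictionary contents are made up). The set method returns a new dictionary and leaves the original untouched. By contrast, set-now belongs to mutable string-dicts and changes a dictionary in place.

```pyret
import string-dict as SD

# An immutable string-dict; keys and values are made up for illustration
answers = [SD.string-dict: "vegan", "NO"]

# set returns a NEW dict and leaves `answers` unchanged (functional style);
# set-now, which mutates in place, is the kind of operation to avoid
more-answers = answers.set("vegetarian", "YES")
```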

There are two string-dicts that you will be using. The first maps a String (the name of a column in the table) to a List<TableEntry>, where each TableEntry has two members: an ID, identifying the user who answered a particular question, and an entry, which is that user’s answer to the question. Each List<TableEntry> therefore represents the column of answers to one survey question.

The second string-dict maps a String (the name of a survey question/column in the table) to a List<String> containing every answer that could have been chosen for that survey question.
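To make the shapes of the two dictionaries concrete, here is a small hypothetical version of each. The TableEntry definition below is a stand-in. The support code defines its own, and its constructor and field names may differ from table-entry, id, and entry, so adapt the field access accordingly. The respondents and answers are made up.

```pyret
import string-dict as SD

# Hypothetical stand-in for the stencil's TableEntry; the real constructor
# and field names in the support code may differ
data TableEntry:
  | table-entry(id :: String, entry :: String)
end

# First dict: column name -> List<TableEntry> (made-up respondents)
columns = [SD.string-dict: "vegan", [list: table-entry("p1", "NO"), table-entry("p2", "YES")]]

# Second dict: column name -> every answer that could have been chosen
possible-answers = [SD.string-dict: "vegan", [list: "YES", "NO"]]

# Pull out just the answers in one column, dropping the IDs
vegan-answers = map(lam(e): e.entry end, columns.get-value("vegan"))
```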

Additionally, we have provided the function chi-squared(lst :: List<List<Number>>) -> Number. It takes a two-way table of counts (described below) and returns its chi-squared value, which measures how strongly the answers to two questions are associated.
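For intuition about what such a value measures, here is a sketch of Pearson's chi-squared statistic, the standard statistic for a two-way table of counts. It is not the provided implementation, which may differ (for example, in how it treats all-zero rows or columns). The names pearson-chi-squared and sum-list are hypothetical. For each cell, the expected count under independence is (row total * column total) / grand total. The statistic sums (observed - expected)^2 / expected over all cells.

```pyret
# Sum a list of numbers with the built-in fold
fun sum-list(nums :: List<Number>) -> Number:
  fold(lam(acc, n): acc + n end, 0, nums)
end

fun pearson-chi-squared(observed :: List<List<Number>>) -> Number:
  doc: "Sketch of Pearson's chi-squared statistic for a non-empty, rectangular table of counts"
  row-totals = map(sum-list, observed)
  col-totals = for map(j from range(0, observed.get(0).length())):
    sum-list(map(lam(counts): counts.get(j) end, observed))
  end
  grand-total = sum-list(row-totals)
  if grand-total == 0:
    0
  else:
    # One (observed - expected)^2 / expected term per cell
    cell-terms = map2(lam(counts, row-total):
        map2(lam(obs, col-total):
            expected = (row-total * col-total) / grand-total
            # Cells with zero expected count (an all-zero row or column) contribute nothing
            if expected == 0: 0
            else: ((obs - expected) * (obs - expected)) / expected
            end
          end, counts, col-totals)
      end, observed, row-totals)
    sum-list(map(sum-list, cell-terms))
  end
end
```

A table whose answers are perfectly associated, such as [list: [list: 10, 0], [list: 0, 10]], gives a large value (20). A table whose rows are proportional, such as [list: [list: 5, 5], [list: 5, 5]], gives 0.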

3.3 Providence Task

Using the two string-dicts that we have provided, fill in the observed-table function. This function takes two Strings, each a different column name in the survey table; for example, "vegan" and "vegetarian". First, obtain the lists of all possible answers to these two survey questions. Then, for each pair of answers (e.g., "NO" to "vegan" and "YES" to "vegetarian"), count the number of users who gave that particular pair of answers to the two questions.

The List<List<Number>> that you generate represents a two-way table (also called a contingency table), which is commonly used in statistics to analyze categorical data. For the example above, your program should produce the following on the full data set:

observed-table("vegan", "vegetarian") is [list: [list: 6, 40], [list: 0, 246]]

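Here, each inner list corresponds to one answer to "vegetarian", and each position within an inner list corresponds to one answer to "vegan". There are:
  • 6 people who said "YES" to being vegan and "YES" to being vegetarian

  • 40 people who said "NO" to being vegan and "YES" to being vegetarian

  • 0 people who said "YES" to being vegan and "NO" to being vegetarian

  • 246 people who said "NO" to being vegan and "NO" to being vegetarian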
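The counting step can be sketched as follows. The sketch assumes you have already paired up each respondent's two answers by matching IDs across the two columns; that joining step is not shown. AnswerPair, answer-pair, count-pair, and two-way-table are hypothetical names, not part of the stencil. Following the layout of the vegan/vegetarian example, the outer list runs over the second question's answers, and each inner list runs over the first question's answers.

```pyret
# Hypothetical record of one respondent's answers to the two questions
data AnswerPair:
  | answer-pair(q1 :: String, q2 :: String)
end

fun count-pair(pairs :: List<AnswerPair>, a1 :: String, a2 :: String) -> Number:
  doc: "How many respondents answered a1 to question 1 and a2 to question 2"
  filter(lam(p): (p.q1 == a1) and (p.q2 == a2) end, pairs).length()
end

fun two-way-table(pairs :: List<AnswerPair>, answers-1 :: List<String>, answers-2 :: List<String>) -> List<List<Number>>:
  # Outer list: one entry per answer to question 2; inner list: one count per answer to question 1
  for map(a2 from answers-2):
    for map(a1 from answers-1):
      count-pair(pairs, a1, a2)
    end
  end
end
```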

3.4 Observing Correlations

After you have completed the observed-table function, its output for a particular pair of questions will be used as the argument to chi-squared to measure the correlation between the answers to those questions. We have provided the function print-all-correlations, which prints the chi-squared value for every pair of questions whose chi-squared value is particularly high (a high chi-squared value corresponds to a strong correlation between the answers).

After you have successfully run print-all-correlations on the survey subset, come up with a plausible explanation for why each of the reported correlations occurs.

Then, run print-all-correlations again on the entire data set, and observe how the set of high correlations changes (which pairs are gained or lost).

To run your code on the entire data set, the only change you need to make is the URL from which the survey table is loaded. (For this change, and only this change, you may disregard the “Do not edit this section” comment in the stencil.)

4 Helpful Hints

The code template given below already contains commands to import the Plundr and Survey data under the names plundr and survey, respectively.

Unlike Tables 1, this assignment requires the use of extract to obtain lists from tables. In addition to extract, you will use functions from the math and statistics packages.
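Here is a short sketch of the math and statistics libraries applied to an extracted column. It assumes the standard Pyret imports import math as M and import statistics as S; the table and its values are made up.

```pyret
import math as M
import statistics as S

# Made-up table standing in for real data
sample-table = table: username :: String, followers :: Number
  row: "blackbeard", 300
  row: "annebonny", 90
  row: "calico-jack", 120
end

# extract gives a List<Number> that the library functions accept
follower-list = extract followers from sample-table end

most-followers = M.max(follower-list)       # largest value in the list
average-followers = S.mean(follower-list)   # arithmetic mean
median-followers = S.median(follower-list)  # middle value once sorted
```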

Additionally, for the survey section, you will need to either use functions like map, map2, fold, and fold2, or write them yourself.
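If you choose to write these helpers yourself, here is one possible shape. The sketch shows a hand-written two-list fold, named my-fold2 (a hypothetical name chosen so it cannot clash with a built-in fold2), used to compute a dot product. The built-in map2 appears alongside it. This version stops at the end of the shorter list; that is a design choice of the sketch, not necessarily how the built-in behaves.

```pyret
fun my-fold2<A, B, Acc>(f :: (Acc, A, B -> Acc), base :: Acc, l1 :: List<A>, l2 :: List<B>) -> Acc:
  doc: "Fold over two lists in lockstep, stopping when either list runs out"
  cases (List) l1:
    | empty => base
    | link(x, rest1) =>
      cases (List) l2:
        | empty => base
        | link(y, rest2) => my-fold2(f, f(base, x, y), rest1, rest2)
      end
  end
end

# Dot product: sum of element-wise products
dot-product = my-fold2(lam(acc, x, y): acc + (x * y) end, 0, [list: 1, 2, 3], [list: 4, 5, 6])

# Built-in map2 walks both lists the same way but builds a new list
sums = map2(lam(x, y): x + y end, [list: 1, 2, 3], [list: 4, 5, 6])
```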

5 Code Template

Stencil

6 Handing In

Captain Teach

Please use the filename tables-2-code.arr (note that the stencil is called tables-2-stencil, so you need to rename it).