Tables 2
1 Introduction
In Tables 1, you were introduced to methods of filtering, ordering, and selecting within tables in Pyret. While these operations are powerful, they can also be limiting as your data analysis needs grow. Therefore, Pyret lets you extract the columns of a table as lists and process the data that way.
For this assignment, you should write code that assigns your answers to the identifiers in the provided template files. Once you run your program, you can use the interactions area to view the results. You can also print out the full tables for both exercises. We expect you to submit the code you can use to generate your answers (not only the answers themselves!).
2 Exercise: Social Network
One of the most exciting new tech startups of the year is Plundr: the social network for pirates. Plundr has given you access to their database of users, with the task of answering several questions about the data.
2.1 Plundr Table Format
id :: String
username :: String
password :: String
posts :: Number
followers :: Number
following :: Number
favorites :: Number
2.2 Plundr Questions
What is the total number of posts?
What is the average (mean) number of posts per user?
What is the median number of favorites per user?
Extend the table to add a column named ratio, defined as a user's number of followers divided by the number of users they are following. Which user has the highest ratio?
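As a rough illustration of the operations these questions call for, here is a minimal sketch on a made-up table. The table crew and all of its values are hypothetical, and it has fewer columns than the real plundr table; only extract, extend, order, and the math and statistics functions are the point.

```pyret
import math as M
import statistics as S

# Hypothetical toy table; the real plundr table has more columns and rows.
crew = table: username, posts, followers, following
  row: "anne", 4, 10, 5
  row: "bart", 10, 3, 6
  row: "cora", 1, 8, 2
end

# extract turns one column into a list
post-counts = extract posts from crew end

total-posts = M.sum(post-counts)      # 15
mean-posts = S.mean(post-counts)      # 5
median-posts = S.median(post-counts)  # 4

# extend computes a new column from existing ones, row by row
with-ratio = extend crew using followers, following:
  ratio: followers / following
end

# Ordering by the new column puts the largest ratio in the first row
by-ratio = order with-ratio: ratio descending end
names-by-ratio = extract username from by-ratio end
top-user = names-by-ratio.first       # "cora"
```

Note that dividing by 0 raises an error in Pyret, so it may be worth checking whether any user in the real data is following nobody.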
3 Exercise: Providence Survey
We have conducted a survey of the Brown CS department. Now, it’s your job to find interesting relationships between the data from the survey.
3.1 Providence Table Format
Every column in this table contains String values.
There are two data sets associated with this survey. In the first, we’ve given you a subset of the data; in the second, you operate over the whole dataset. Both are linked in the support code.
3.2 Providence Support Code
In the support code for this assignment, we have extracted all of the data from this survey into string-dict objects. Make sure you stick to the functional operations; e.g., avoid anything with -now in its name. If you need help working with these objects, please go to hours!
There are two instances of string-dict that you will be using. The first will map a String (the name of a column in the table) to a List<TableEntry>, where TableEntry has two members: an ID, representing the user who answered a particular question, and an entry, which is that user’s answer to the question. This List<TableEntry> represents a column of answers for a particular survey question.
The second instance of string-dict will map a String (the name of a survey question/column in the table) to a List<String>, containing all of the possible answers that could have been chosen for a particular survey question.
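To make the shape of these two dictionaries concrete, here is a small sketch built by hand. The TableEntry definition below is a hypothetical stand-in: the constructor name entry and the field names id and answer are assumptions, and the support code may name them differently. The dictionary names answers and choices are also made up for this example.

```pyret
import string-dict as SD

# Hypothetical stand-in for the support code's TableEntry;
# the real constructor and field names may differ.
data TableEntry:
  | entry(id :: String, answer :: String)
end

# First dictionary: column name -> that column's answers
answers = [SD.string-dict:
  "vegan", [list: entry("u1", "NO"), entry("u2", "YES")],
  "vegetarian", [list: entry("u1", "YES"), entry("u2", "YES")]]

# Second dictionary: column name -> the possible answers
choices = [SD.string-dict:
  "vegan", [list: "YES", "NO"],
  "vegetarian", [list: "YES", "NO"]]

# get-value is a functional lookup: it returns the stored value
# and leaves the dictionary unchanged
vegan-column = answers.get-value("vegan")
vegan-choices = choices.get-value("vegan")

# Pull just the answers out of a column
vegan-answers = map(lam(e): e.answer end, vegan-column)
```

Here get-value raises an error if the key is missing, so it only suits column names you know are present.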
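Additionally, we have provided the function chi-squared(lst :: List<List<Number>>) -> Number. Given a table of observed counts (a list of rows), this function computes a chi-squared value, which we use to measure how strongly the answers to two questions are correlated.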
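The implementation of chi-squared is not shown in this handout. For intuition only, the sketch below computes Pearson's chi-squared statistic, the textbook statistic for a table of observed counts; the provided function may compute something different. The name pearson-chi-squared is hypothetical and is not part of the support code.

```pyret
import math as M

fun pearson-chi-squared(observed :: List<List<Number>>) -> Number:
  doc: "Sum of (observed - expected)^2 / expected over every cell"
  row-totals = map(M.sum, observed)
  # Add the rows together element-wise to get one total per column
  col-totals = fold(
    lam(acc, r): map2(lam(a, b): a + b end, acc, r) end,
    map(lam(_): 0 end, observed.first),
    observed)
  total = M.sum(row-totals)
  # Contribution of a single row, given that row's total
  fun row-score(r :: List<Number>, row-total :: Number) -> Number:
    M.sum(map2(
        lam(obs, col-total):
          # Count we would expect in this cell if the two answers were unrelated
          expected = (row-total * col-total) / total
          num-sqr(obs - expected) / expected
        end,
        r, col-totals))
  end
  M.sum(map2(row-score, observed, row-totals))
end

check:
  # Rows are proportional, so the answers look unrelated
  pearson-chi-squared([list: [list: 1, 2], [list: 2, 4]]) is 0
  # Answers line up perfectly, so the value is large
  pearson-chi-squared([list: [list: 10, 0], [list: 0, 10]]) is 20
end
```

This sketch divides by the expected count, so it raises an error if an entire row or column of the table is 0.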
3.3 Providence Task
Using the two string-dicts that we have provided, fill in the observed-table function. This function takes two Strings, each of which is a different column name in the survey table. For example, the two strings could be "vegan" and "vegetarian". First you must generate the list of all possible answers for these two survey questions. Then, for each pair of answers (e.g., "NO" to "vegan", "YES" to "vegetarian"), you must count the number of users who gave that particular pair of answers to the two questions.
observed-table("vegan", "vegetarian") is [list: [list: 6, 40], [list: 0, 246]]
6 people who said "YES" to being vegan, and "YES" to vegetarian
40 people who said "NO" to being vegan, and "YES" to vegetarian
0 people who said "YES" to being vegan, and "NO" to vegetarian
246 people who said "NO" to being vegan, and "NO" to vegetarian
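Each number in that result counts the respondents who gave one particular pair of answers. One way to count a single pair is sketched below on made-up data. As before, the TableEntry definition (constructor entry, fields id and answer) is a hypothetical stand-in for the one in the support code, and the helpers ids-with and count-pair are illustrative names, not part of the stencil.

```pyret
# Hypothetical stand-in for the support code's TableEntry
data TableEntry:
  | entry(id :: String, answer :: String)
end

vegan = [list: entry("u1", "NO"), entry("u2", "YES"), entry("u3", "NO")]
vegetarian = [list: entry("u1", "YES"), entry("u2", "YES"), entry("u3", "NO")]

fun ids-with(col :: List<TableEntry>, ans :: String) -> List<String>:
  doc: "IDs of the respondents who gave ans in this column"
  map(lam(e): e.id end, filter(lam(e): e.answer == ans end, col))
end

fun count-pair(col1 :: List<TableEntry>, ans1 :: String,
    col2 :: List<TableEntry>, ans2 :: String) -> Number:
  doc: "Number of respondents who gave ans1 in col1 and ans2 in col2"
  ids2 = ids-with(col2, ans2)
  # Keep only the respondents who appear in both groups
  filter(lam(uid): ids2.member(uid) end, ids-with(col1, ans1)).length()
end

check:
  count-pair(vegan, "YES", vegetarian, "YES") is 1
  count-pair(vegan, "NO", vegetarian, "YES") is 1
  count-pair(vegan, "YES", vegetarian, "NO") is 0
  count-pair(vegan, "NO", vegetarian, "NO") is 1
end
```

Matching on IDs rather than on list position means the sketch does not assume that both columns list respondents in the same order.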
3.4 Observing Correlations
After you have completed the observed-table function, its output for a particular pair of questions will be used as the argument for chi-squared to calculate the correlation between answers to different questions. We have provided the function print-all-correlations, which prints the chi-squared value for pairs of questions with particularly high chi-squared values (high chi-squared corresponds to high correlation between question answers).
After you have successfully run print-all-correlations for the survey subset, generate a plausible explanation for why each of those correlations occurs.
Then, run print-all-correlations again for the entire data set, and observe how the high correlations changed (you either gained or lost pairs).
In order to run your code for the entire data set, the only change to the code that must be made is which URL the survey table is loaded from. (For this, and only this, are you allowed to disregard the “Do not edit this section” comment in the stencil.)
4 Helpful Hints
The code template given below already contains commands to import the Plundr and Survey data under the names plundr and survey, respectively.
Unlike Tables 1, this assignment requires the use of extract to obtain lists from tables. In addition to extract, you will use functions from the math and statistics packages.
Additionally, for the survey section, you will need to either use functions like map, map2, fold, and fold2, or write them yourself.
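If these functions are unfamiliar, the short sketch below shows what each one does on two made-up lists of numbers (nums and weights are hypothetical names chosen for this example).

```pyret
nums = [list: 1, 2, 3]
weights = [list: 10, 20, 30]

check:
  # map applies a function to every element
  map(lam(n): n * 2 end, nums) is [list: 2, 4, 6]
  # map2 walks two lists in step
  map2(lam(n, w): n * w end, nums, weights) is [list: 10, 40, 90]
  # fold threads an accumulator through the list;
  # the accumulator is the function's first argument
  fold(lam(acc, n): acc + n end, 0, nums) is 6
  # fold2 does the same over two lists at once
  fold2(lam(acc, n, w): acc + (n * w) end, 0, nums, weights) is 140
end
```

The 2 variants are a natural fit whenever two lists need to be processed in step, element by element.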
5 Code Template
6 Handing In
Please use the filename tables-2-code.arr (note that the stencil is called tables-2-stencil, so you need to rename it).