Syllabus:
-
Background
Unless you've been living in a cave, you know that many fields are faced with mountains of data that could present a unique opportunity if only it were possible to efficiently process all this information. This leads to the so-called "Big Data" challenge that has been getting lots of press, including a call to arms from the President.
The database field has been addressing problems of scale for some time, but the exponential growth in data volumes over the last decade has even the traditional database providers stumped. In other words, the products that the big database firms have been selling are not up to the task. If you couple this with some interesting new hardware opportunities, you get a moment in time in which many new and radical approaches to database management systems have been proposed but the winners have not yet been determined.
These modern database engines typically each address one particular common workload, thereby bucking the conventional wisdom that you can build a DBMS for which "one size fits all". This course will attempt to survey the wide array of modern database systems and will try to understand them better by comparing and contrasting features and approaches. We will be guided by many of the results that have been published in the research literature. Of course, the best way to understand this material is to get our hands dirty by doing a project that involves one or more of the new architectures.
-
Course Layout
This course is a graduate-level seminar. Thus, the mode of learning will be through reading, discussion, and independent projects. It will also rely on shared learning. That is, we will all participate in the activity of discovery. We will typically work in small teams, and those teams will share what they have learned with the rest of the class. Also, because this material is so new, there is no single textbook that can provide us with a syllabus. We will begin with the syllabus that you can find on the course web pages, but this might need to be adjusted as we proceed. As an active participant in the project, you may feel free to make suggestions about making the course more effective. Below are a few of the main milestones and requirements you will complete throughout the semester:
- Lightning Talk (5 mins)
- Project Proposal (1-2 pages written + talk)
- Paper Presentation (20 min talk)
- Project Status Update
- Project Demo
-
Lightning Talks
Each student will be expected to give a 5-minute lightning talk on a paper/system relevant to the class, a list of which will be provided. The goal of the lightning talk is simply to introduce the system to the class and cover some of the main points/features. It is not intended to be an in-depth review of the system; that will be done in the (longer) paper presentations. We are doing the lightning talks simply to introduce a large number of systems in a short amount of time. We want the presenter to be able to pull out only the relevant information and present it. Here are a few of the main points to consider:
- What problem is this system trying to solve? Think about both the data and the workloads and possibly give an example (no demos!).
- What makes this system different from previous/other systems (if anything)?
- What are the high-level key architectural characteristics?
Again, the purpose of the lightning talks is simply to introduce the system, and only 5 minutes will be allocated for each presentation. Make sure you practice your presentation and are right at the 5-minute mark, as it will be a hard cutoff.
Note that we will not break up into groups / topics until the lightning talks are over. That way each student will have a better sense of what the topics are all about and will be better able to pick an area that interests them.
WARNING: It is acceptable for students to use information and content (e.g., images and graphics) found on the Internet but the original source must be properly attributed/cited. No credit will be given for presentations without proper citations.
-
Paper Presentations
Similar to the lightning talk, for the paper presentation each student will choose a paper (this time related to their project choice) and present it to the class. However, this talk is supposed to be an in-depth description and analysis of the paper. This talk will be 15-20 minutes in length with 5 minutes for questions, a format similar to most conference talks. Because it is the responsibility of the presenter to teach the class about this system, he/she will be expected to know and understand all aspects of the system. Thus, it is important to be prepared. If you have questions regarding the content of a paper, you should arrange to meet with Justin well in advance of your talk date.
WARNING: It is acceptable for students to use information and content (e.g., images and graphics) found on the Internet but the original source must be properly attributed/cited. No credit will be given for presentations without proper citations.
-
Projects
The main component of this course will be the project. All projects will involve Big Data management and/or analysis using the systems discussed in this class. However, beyond that, the projects will vary greatly in both scope and topic. This will depend on several factors, including group size, group background, and topic. We will discuss this in more depth during class, though you are encouraged to begin thinking now about projects that interest you. As is the case with many seminar courses, you will get out of this course what you put into it, so taking the time to come up with a well-scoped project that lies within the context of this course and interests you will go a long way toward your enjoyment of the course.