Department of Computer Science
CS227 - Topics in Database Systems
Spring 2000
Web-based Information Systems
Syllabus
Where: CIT, 506
Instructor: Stan Zdonik (sbz@cs.brown.edu)
CIT 523 X7648
Prerequisite: CS127 or equivalent
The web has the potential to become the world's largest database. Data residing on the web today covers almost every topic imaginable; however, the dearth of overarching organizational principles makes it difficult to realize its true potential. Finding answers to complex questions is labor-intensive at best and often impossible. Even getting a simple but popular piece of information is sometimes extremely time-consuming. Said another way, while the web contains an enormous amount of data, no one would consider it a database system. The database community (among others) has taken an interest in fixing this problem, and many research groups are experimenting with ways to tame the unruly giant. Much of this work applies "data management" techniques, much as we did in database systems, to provide scalable solutions in the face of scarce resources.
The Course
This semester CS227 will focus on a new area that we call web-based information systems (WBIS). In a nutshell, we are interested in the problem of extracting useful information from the web as efficiently as possible. This involves linguistic issues (e.g., query language design) as well as systems issues (e.g., data management). We will approach the topic by first studying models of web performance. We will then look at recent work on web query languages, and conclude by speculating on how these two areas might be brought together.
The course this year will be a mixture of readings/presentations and projects. The readings are intended to introduce some current work in the area, and the projects are intended to get you involved in some current research questions. As a graduate seminar, the course is much more informal than a typical undergraduate lecture course. You should view it as cooperative learning. To this end, you will work in a small team (2-3 people) on most of the projects in this course. Collaboration is encouraged, but reports that you hand in should be written individually unless specifically stated otherwise.
You and your team will be responsible for one or more in-class presentations on the topic of the day. The material presented will be based on assigned readings (book and papers), but need not be limited to them. You will be responsible for presenting enough material that all members of the class understand the issues, and then for leading a discussion about related topics that are, in your opinion, still open. It must be emphasized that these presentations are not meant to be book reports. We are not looking for pure summarization. You should inject some of your own ideas, beliefs, and questions. In a research setting, good questions are as important as good answers.
Books
This year we are trying an experiment with course materials. In the past, the course was based entirely on recent papers from the literature. This year we will also use two recent books that (we hope) will provide valuable reference material. They are:
[MA] Capacity Planning for Web Performance: Metrics, Models, and Methods
by Daniel Menasce and Virgilio Almeida, Prentice-Hall, 1998.
[ABS] Data on the Web: From Relations to Semistructured Data and XML
by Serge Abiteboul, Peter Buneman, and Dan Suciu, Morgan Kaufmann, 2000.
Projects
You will also be responsible for a project that investigates some issue related to the general topic of web-based information systems. You can work individually or in a small group (2 to 3 students). Groups are encouraged because you are likely to learn more. While the exact topic is up to you, there are four broad categories that you should consider. They are:
Schedule
Here is a tentative class schedule. It is, of course, subject to change as we progress. Since this is a highly experimental exercise, we must be flexible. The exact references for each session's readings are listed in a separate handout.
Table 1: Class Schedule
Date | Topic | Readings | Presenters
Jan. 26 | Introduction to the Course | | sbz
Jan. 31 | Background/Review: Relational queries: languages and processing | KSS (CS127 text) 3.2; 12.1-12.3; 12.6; 18.1-18.3 | sbz
Feb. 2 | Performance Problems; Client/Server Systems | MA Chapters 1+2 | sbz
Feb. 7 | Performance of Client/Server Systems | MA Chapter 3 | sbz
Feb. 9 | Web Servers and Intranets | MA Chapter 4 | sbz
Feb. 14 | Capacity Planning in Client/Server Systems; Workload Characterization | MA Chapters 5+6 | Scott, Don
Feb. 16 | Benchmarks; System-Level Performance Models | MA Chapters 7+8 | Joe, Kongbin
Feb. 21 | | |
Feb. 23 | Component-Level Performance Models | MA Chapter 9 | Shi, Yun
Feb. 28 | Web Performance Modeling | MA Chapter 10 | Nesime, Chandler
March 1 | Workload Forecasting | MA Chapter 11 | Amit, Seung, Phillip
March 6 | No Class (Project Proposals Due) | |
March 8 | Measuring Performance | MA Chapter 12 | Daniel, Chandler, Mark
March 13 | Web data, XML, and new opportunities | ABS Chapters 1+2; Paper 1 | Dan, Aaron
March 15 | Queries on XML Data | ABS Chapter 4 | Hu, Ying, John
March 20 | XML-QL (Supplementary Reading) | ABS Chapter 5; Paper 2 | Olga, Gregory
March 22 | Semistructured data (Supplementary Reading) | Paper 3 | Don, Scott
March 27 | | |
March 29 | | |
April 3 | Interpretation | ABS Chapter 6 | Dave, Joe
April 5 | Typing Semistructured Data | ABS Chapter 7; Paper 5 | Dave, Shaoqing, Weijian, Seung
April 10 | Typing Semistructured Data | |
April 12 | Query Processing for Semistructured Data | ABS Chapter 8 | Daniel, Chandler, Seung, Amit, Phillip, Nesime, Olga
April 17 | Query Processing for Semistructured Data | Papers 6+8 | Same as April 12
April 19 | Query Processing for Semistructured Data | Papers 7+9 | Same as April 12
April 24 | LORE (Supplementary Reading) | ABS Chapter 9; Paper 4 | Dan, Aaron
April 26 | Storing Semistructured Data | Papers 10+11 | Greg, Yin, Seung
May 1 | No Class | |
May 3 | XML Views | Papers 12+13 | Kongbin, Mark, John
May 8 | NiagaraCQ: Continuous Queries | Papers 14+15 | Yun, Xiaolan, Gregory
May 10 | Project Presentations | |