Brown University

Department of Computer Science

CS227 - Topics in Database Systems

Spring 2000

Web-based Information Systems

Syllabus

When: Mondays and Wednesdays, 1:00 to 2:20 PM

Where: CIT 506

Instructor: Stan Zdonik (sbz@cs.brown.edu)

CIT 523 X7648

Prerequisite: CS127 or equivalent

Introduction

The web has the potential to become the world's largest database. Data residing on the web today covers almost every topic imaginable; however, the lack of overarching organizational principles makes it difficult to realize its true potential. Finding answers to complex questions is labor-intensive at best and often impossible. Even getting a simple but popular piece of information can be extremely time-consuming. Said another way, while the web contains an enormous amount of data, no one would consider it a database system. The database community (among others) has taken an interest in fixing this problem, and many research groups are experimenting with ways to tame the unruly giant. Much of this work applies "data management" techniques, much as we did in database systems, to provide scalable solutions in the face of scarce resources.

The Course

This semester, CS227 will focus on a new area that we call web-based information systems (WBIS). In a nutshell, we are interested in the problem of extracting useful information from the web as efficiently as possible. This involves linguistic issues (e.g., query language design) as well as systems issues (e.g., data management). We will approach the topic by first studying models of web performance. We will then look at recent work on web query languages and conclude by speculating on how these two areas might be brought together.

This year the course will be a mixture of readings/presentations and projects. The readings are intended to introduce current work in the area, and the projects are intended to get you involved in current research questions. Because this is a graduate seminar, its style is much more informal than that of a typical undergraduate lecture course. You should view it as cooperative learning. To this end, you will work in a small team (2-3 people) on most of the projects in this course. Collaboration is encouraged, but any report you hand in should be your own work unless specifically stated otherwise.

You and your team will be responsible for one or more in-class presentations on the topic of the day. The material presented will be based on the assigned readings (book and papers), but need not be limited to them. You will be responsible for presenting enough material that all members of the class understand the issues, and you should then lead a discussion of related topics that, in your opinion, remain open. It must be emphasized that these presentations are not meant to be book reports; we are not looking for pure summarization. You should inject some of your own ideas, beliefs, and questions. In a research setting, good questions are as important as good answers.

Books

This year we are trying an experiment with course materials. In the past, the course was based entirely on recent papers from the literature. This year we will also use two recent books that (we hope) will provide valuable reference material. They are:

[MA] Capacity Planning for Web Performance: Metrics, Models, and Methods

by Daniel Menasce and Virgilio Almeida, Prentice-Hall, 1998.

[ABS] Data on the Web: From Relations to Semistructured Data and XML

by Serge Abiteboul, Peter Buneman, and Dan Suciu, Morgan Kaufmann, 2000.

Neither of these books should be thought of as a textbook; they are starting points! Everyone should do the assigned readings from these books. Presenters are encouraged to read papers beyond the books in order to give presentations with more depth.

Projects

You will also be responsible for a project that investigates some issue related to the general topic of web-based information systems. You can work individually or in a small group (2 to 3 students). Groups are encouraged because you are likely to learn more. While the exact topic is up to you, there are four broad categories that you should consider. They are:

  1. Evaluation of a web-query system
  2. Simulation study of a web-based algorithm
  3. Interface to XML query engine
  4. Survey paper (probably best done solo)
We will have more to say about these categories in the near future. Please note that you must hand in a project proposal by March 8.

Schedule

Here is a tentative class schedule. It is, of course, subject to change as we progress; since this is a highly experimental exercise, we must be flexible. The exact references for each session's readings are listed in a separate handout.

Table 1: Class Schedule

Date      Class Topic                                  Reading                     Who
--------  -------------------------------------------  --------------------------  ------------------------------
Jan. 26   Introduction to the Course                                               sbz
Jan. 31   Background/Review: Relational queries:       KSS (CS127 text) 3.2;       sbz
          languages and processing                     12.1-12.3; 12.6; 18.1-18.3
Feb. 2    Performance Problems;                        MA Chapters 1+2             sbz
          Client/Server Systems
Feb. 7    Performance of Client/Server Systems         MA Chapter 3                sbz
Feb. 9    Web Servers and Intranets                    MA Chapter 4                sbz
Feb. 14   Capacity Planning in Client/Server Systems;  MA Chapters 5+6             Scott, Don
          Workload Characterization
Feb. 16   Benchmarks;                                  MA Chapters 7+8             Joe, Kongbin
          System-Level Performance Models
Feb. 21   Presidents' Day
Feb. 23   Component-Level Performance Models           MA Chapter 9                Shi, Yun
Feb. 28   Web Performance Modeling                     MA Chapter 10               Nesime, Chandler
March 1   Workload Forecasting                         MA Chapter 11               Amit, Seung, Phillip
March 6   No class; project proposals due
March 8   Measuring Performance                        MA Chapter 12               Daniel, Chandler, Mark
March 13  Web Data, XML, and New Opportunities         ABS Chapters 1+2; Paper 1   Dan, Aaron
March 15  Queries on XML Data                          ABS Chapter 4               Hu, Ying, John
March 20  XML-QL (Supplementary Reading)               ABS Chapter 5; Paper 2      Olga, Gregory
March 22  Semistructured Data (Supplementary Reading)  Paper 3                     Don, Scott
March 27  Spring Break
March 29  Spring Break
April 3   Interpretation                               ABS Chapter 6               Dave, Joe
April 5   Typing Semistructured Data                   ABS Chapter 7; Paper 5      Dave, Shaoqing, Weijian, Seung
April 10  Typing Semistructured Data
April 12  Query Processing for Semistructured Data     ABS Chapter 8               Daniel, Chandler, Seung, Amit,
                                                                                   Phillip, Nesime, Olga
April 17  Query Processing for Semistructured Data     Papers 6+8                  Same as April 12
April 19  Query Processing for Semistructured Data     Papers 7+9                  Same as April 12
April 24  LORE (Supplementary Reading)                 ABS Chapter 9; Paper 4      Dan, Aaron
April 26  Storing Semistructured Data                  Papers 10+11                Greg, Yin, Seung
May 1     No class
May 3     XML Views                                    Papers 12+13                Kongbin, Mark, John
May 8     NiagaraCQ: Continuous Queries                Papers 14+15                Yun, Xiaolan, Gregory
May 10    Project Presentations