Papers:
ETL Systems
- A Survey of Extract-Transform-Load Technology. Vassiliadis, Panos. IJDWM 5.3 (2009): 1-27.
- TPC-DI: The First Industry Benchmark for Data Integration. Poess, Meikel, et al. PVLDB 7(13), Aug 2014.
- The Pathologies of Big Data. Jacobs, Adam. Communications of the ACM 52.8 (2009): 36-44.
- Engineering Trade Study: Extract, Transform, Load Tools for Data Migration. Henry, Sebastien, et al. Systems and Information Engineering Design Symposium, IEEE, 2005.
Streaming
- Data Stream Management Issues: A Survey. Golab, Lukasz, and M. Tamer Ozsu. Technical Report, Apr. 2003. db. uwaterloo.
- S-Store: Streaming Meets Transaction Processing. Meehan, John, et al. PVLDB 8(13), Sept 2015.
Schema Mapping & Semantic Heterogeneity
- Schema Mapping as Query Discovery. Miller, Renée J., Laura M. Haas, and Mauricio A. Hernández. VLDB. Vol. 2000. 2000.
- Data Curation at Scale: The Data Tamer System. Stonebraker, Michael, et al. CIDR. 2013.
- Semantic Heterogeneity Resolution in Federated Databases by Metadata Implantation and Stepwise Evolution. Aslan, Goksel, and Dennis McLeod. The VLDB Journal: The International Journal on Very Large Data Bases 8.2 (1999): 120-132.
- Uncertain Schema Matching. Gal, Avigdor. Synthesis Lectures on Data Management. Morgan and Claypool. 2011.
Data Cleaning
- Data Cleaning: Problems and Current Approaches. Rahm, Erhard, and Hong Hai Do. IEEE Data Eng. Bull. 23.4 (2000): 3-13.
- Potter's Wheel: An Interactive Data Cleaning System. Raman, Vijayshankar, and Joseph M. Hellerstein. VLDB. Vol. 1. 2001.
- A Primitive Operator for Similarity Joins in Data Cleaning. Chaudhuri, Surajit, Venkatesh Ganti, and Raghav Kaushik. Data Engineering, 2006. ICDE'06. Proceedings of the 22nd International Conference on. IEEE, 2006.
- Data Cleaning: A Practical Perspective. Ganti, Venkatesh, and Anish Das Sarma. Synthesis Lectures on Data Management. Morgan and Claypool. September 2013.
De-Duplication
- Data Deduplication Techniques. He, Qinlu, Zhanhuai Li, and Xiao Zhang. Future Information Technology and Management Engineering (FITME), 2010 International Conference on. Vol. 1. IEEE, 2010.
- Demystifying Data Deduplication. Mandagere, Nagapramod, et al. Proceedings of the ACM/IFIP/USENIX Middleware'08 Conference Companion. ACM, 2008.
- A Survey of Indexing Techniques for Scalable Record Linkage and Deduplication. Christen, Peter. Knowledge and Data Engineering, IEEE Transactions on 24.9 (2012): 1537-1555.
- Data Curation at Scale: The Data Tamer System. Stonebraker, Michael, et al. CIDR. 2013.
- Engineering Crowdsourced Stream Processing Systems. Imran, Muhammad, et al. arXiv preprint arXiv:1310.5463 (2013).
- An Introduction to Duplicate Detection. Naumann, Felix, and Melanie Herschel. Synthesis Lectures on Data Management. Morgan and Claypool. 2010.
Data Loading incl. Bulk Load
- Optimizing Data Warehouse Loading Procedures for Enabling Useful-Time Data Warehousing. Santos, Ricardo Jorge, and Jorge Bernardino. Proceedings of the 2009 International Database Engineering & Applications Symposium. ACM, 2009.
- Optimized Data Loading for a Multi-Terabyte Sky Survey Repository. Cai, Y. Dora, Ruth Aydt, and Robert J. Brunner. Proceedings of the 2005 ACM/IEEE conference on Supercomputing. IEEE Computer Society, 2005.
- Transaction Reordering and Grouping for Continuous Data Loading. Luo, Gang, et al. Business Intelligence for the Real-Time Enterprises. Springer Berlin Heidelberg, 2007. 34-49.
- Improving the Query Performance of High-Dimensional Index Structures by Bulk Load Operations. Berchtold, Stefan, Christian Böhm, and Hans-Peter Kriegel. Advances in Database Technology—EDBT'98. Springer Berlin Heidelberg, 1998. 216-230.
NoDB
- NoDB in Action: Adaptive Query Processing on Raw Data. Alagiannis, Ioannis, et al. Proceedings of the VLDB Endowment 5.12 (2012): 1942-1945.
- NoDB: Efficient Query Execution on Raw Data Files. Alagiannis, Ioannis, et al. Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data. ACM, 2012.
- Adaptive Partitioning and Indexing for Raw Data Querying. Olma, Matthaios.
Federated Databases
- Federated Databases and Systems: Part I—A Tutorial on Their Data Sharing. Hsiao, David K. The VLDB Journal 1.1 (1992): 127-179.
- Protocols for Integrity Constraint Checking in Federated Databases. Grefen, Paul, and Jennifer Widom. Distributed and Parallel Databases 5.4 (1997): 327-355.
- Federated Database Systems for Managing Distributed, Heterogeneous, and Autonomous Databases. Sheth, Amit P., and James A. Larson. ACM Computing Surveys (CSUR) 22.3 (1990): 183-236.
- Asterix: Scalable Warehouse-Style Web Data Integration. Alsubaiee, Sattam, et al. Proceedings of the Ninth International Workshop on Information Integration on the Web. ACM, 2012.
Publish / Subscribe
- The Many Faces of Publish/Subscribe. Eugster, Patrick Th, et al. ACM Computing Surveys (CSUR) 35.2 (2003): 114-131.
- Design Considerations for High Fan-In Systems: The HiFi Approach. Franklin, Michael J., et al. CIDR. 2005.
- HiFi: A Unified Architecture for High Fan-in Systems. Cooper, Owen, et al. Proceedings of the Thirtieth international conference on Very large data bases-Volume 30. VLDB Endowment, 2004.
ECA Rules
- Developing Event-Condition-Action Rules in Real-Time Active Database. Qiao, Ying, et al. Proceedings of the 2007 ACM symposium on Applied computing. ACM, 2007.
- The Architecture of an Active Database Management System. McCarthy, Dennis, and Umeshwar Dayal. ACM Sigmod Record. Vol. 18. No. 2. ACM, 1989.