Scalable Distributed Data Structures
A New Class of Data Structures for Multicomputers
Witold Litwin
University Paris 9



Multicomputers Commodity computers, interconnected through the high- speed networks are becoming the basic hardware. Such configurations, called multicomputers, clusters, superservers... include dozens, often hundreds or even thousands of clients and servers [GW97]. Their cumulative resources are impressive: dozens of GBytes of distributed RAM, and TBytes of disks accessible for the GMips-highly parallel and distributed processing. They offer potentially unbeatable performance and price/performance ratio, opening up new perspectives for the applications [C94], [T95], [U94]. Major hardware and software makers present multicomputers as the next step of their business strategy. This was, among others, the subject of Microsoft Scalability Day organized in May 1997 for the US professionals [MS97]. The technology is also felt as the next step for the Internet that should evolve from a data providing utility today, into a computing utility [M96], [B97]. One also foresees very large multicomputers coming as new tools for the academia, e.g., a 10.000 node multicomputer for Stanford University in 5-10 years [U94]. Finally, the domain was recently declared as strategic to the US computer science supremacy in 21st century by the US Government, and become the object of nationwide research program PACI [CACM97]. Scalable Distributed Data Structures Multicomputers need new system software, fully taking advantage of the distributed RAM, and of parallel and processing on multiple CPUs, [C94], [G96], [M96]. One especially needs new data structures, allowing files to span over multiple servers, and to scale to as many sites as needed. Such files should reside for processing in the distributed RAM, providing then access performance inaccessible to disk files. They should be accessible to multiple autonomous clients, including the mobile ones. Finally, they should not require any centralized data access computation or directories, too avoid hot-spots.

One solution is a new class of data structures called Distributed Scalable Data Structures (SDDSs). First proposed in 1992-1993, [LNS93], SDDSs gave rise to important research effort, materialized by several algorithms, papers and implementations. It was shown that SDDS files are able to scale to thousands of sites, and terabytes in distributed RAM, with constant access performance, and search times under a millisecond. Multi-key searches requiring an hour or so in a traditional file, e.g., a k-d file, may succeed in less than a second in an SDDS file [LN95], [L96]. All these properties should be of prime importance for the applications, especially in the DBMS design arena, [ASS94]. They open new perspective for the VLDB design, for multimedia databases, for real-time and high-availability databases, for decision support systems, and for high performance computing in general.

Most documents, presentations etc. referred to below, define or present various types of SDDSs. Some present applications of SDDSs or discuss implementation issues, especially under Windows NT. Finally, some references point to the related work, especially major research projects on multicomputers, e.g., the NOW, and Castle project at UCB, [C94], and Legion at U. of Virginia [GW97].

The types of SDDSs known are:

  SDDSs for hashed files. They extend the more traditional dynamic hash data structures, especially the popular linear hashing, [L80], [SPW90], used, e. g., in the Netscape browser, and the dynamic hashing [L88], to the network multicomputers, and switched multicomputers, [D93], [KLR96], [LNS93], [LNS96], [VBWY94]. The basic algorithm for such SDDSs termed LH* becomes popular with the advanced computer literature [K98], [R98].

  SDDSs for ordered files. These extend the traditional ordered data structures, B-trees or binary trees, to the multicomputers, [LNS94], [KW94].

  SDDSs for multi-attribute (multi-key) files. These algorithms extend the traditional k-d data structures [S89] to the multicomputers, [LN95], [N95], [L96]. Performance of multi-attribute search may get improved by several orders of magnitude.

  high-availability SDDSs. These structures have no known counterpart among the traditional data structures. They are designed to transparently survive failures of server sites. They apply principles of mirroring, or of striping, or of grouping of records, revised to support the file scalability [LN95], [LN96], [LR97], [LMR98].

The implementation issues of the SDDSs are investigated in detail for Windows NT and a 100 Mb/s local net. In [S95], one proposes an experimental protocol for SDDS clients and servers. A software architectures for an SDDS prototype manager under Windows NT, built at Paris 9, is described in [B96], [P97], and [A98]. In [KLR96], one describes an implementation at the Parsytec switched multicomputer with 128 PowerPCs and 2 GB of RAM. Problems of concurrency control, transaction management, and query optimization issues for SDDSs are addressed in [ASS94] and [TZK96].   A high-availability variant of LH* implemented in Java is presented in [L98]. Finally, in [R98a], one discusses an application of another variant of LH* to telecommunication databases.
[A97] Aly, W., D. Organisation interne des SDDS (RP*). Mémoire de DEA. U. Dakar, 1998.
[ASS94] Amin, M., Schneider, D., and Singh, V., An Adaptive, Load Balancing Parallel Join Algorithm. 6th International Conference on Management of Data, Bangalore, India, December, 1994.
[B96] Bennour, F. Une Architecture pour un Gestionnaire des SDDS sous Windows NT. Mémoire de DEA. U. Paris 9, 1996.
[B97] Birnbaum, J. ACM97 Speakers Corner. Comm. of ACM, Jan. 1997.
[C94] Culler, D. NOW: Towards Everyday Supercomputing on a Network of Workstations. EECS Tech. Rep. UC Berkeley.
[CACM97] Comm. Of ACM. Special Issue on High-Performance Computing.(Oct. 1997).
[D93] Devine, R. Design and Implementation of DDH: Distributed Dynamic Hashing. Int. Conf. On Foundations of Data Organizations, FODO-93. Lecture Notes in Comp. Sc., Springer-Verlag (publ.), Oct. 1993.
[G96] Gray, J. Super-Servers: Commodity Computer Clusters Pose a Software Challemge. Microsoft, 1996.
[GW96] Grimshaw, A, Wulf, W. The Legion Vision of a Worldwide Virtual Computer. Comm. of ACM, Jan. 1997.
[KLR96] Karlsson, J. Litwin, W., Risch, T. LH*lh: A Scalable High Performance Data Structure for Switched Multicomputers. Intl. Conf. on Extending Database Technology, EDBT-96, Avignon, March 1996.
[KW94] Kroll, B., Widmayer, P. Distributing a Search Tree Among a Growing Number of Processors. ACM-SIGMOD Int. Conf. On Management of Data, 1994.
[K98] Knuth D. The Art of Computer Programming. 3rd Ed. Addison Wesley, 1998.
[L80] Linear Hashing : a new tool for file and tables addressing. Reprinted from VLDB-80 in READINGS IN DATABASES. 2-nd ed. Morgan Kaufmann Publishers, Inc., 1994. Stonebraker , M.(Ed.).
[L88] Larson, P. Dynamic Hash Tables, CACM, 31 (4), 1988.

[L98] Lindberg., R. A Java Implementation of a Highly Available Scalable and Distributed Data Structure LH*g. Master Th. LiTH-IDA-Ex-97/65. U. Linkoping, 1997, 62.

[LNS93] Litwin, W. Neimat, M-A., Schneider, D. LH* : Linear Hashing for Distributed Files. ACM-SIGMOD Intl. Conf. On Management of Data, 1993.
[LNS94] Litwin, W., Neimat, M-A., Schneider, D. RP* : A Family of Order-Preserving Scalable Distributed Data Structures. 20th Intl. Conf on Very Large Data Bases (VLDB), 1994.
[LN95] Litwin, W., Neimat. k-RP* : a Family of High Performance Multi-attribute Scalable Distributed Data Structure. IEEE Intl. Conf. on Par. & Distr. Systems, PDIS-96, (Dec. 1996).
[LN95] W. Litwin, M-A Neimat. LH*s : a high-availability and high-security Scalable Distributed Data Structure. IEEE Workshop on Research Issues in Data Engineering. IEEE Press, 1997.
[LN96] Litwin, W., Neimat M-A. High-Availability LH* Schemes with Mirroring. With M-A, Neimat. Intl. Conf. on Coope. Inf. Syst. COOPIS-96, Brussels, 1996.
[LNS96] Litwin, W., Neimat, M-A., Schneider, D. LH*: A Scalable Distributed Data Structure. ACM Transactions on Database Systems ACM-TODS, (Dec. 1996).
[LR97] Litwin, W. Risch, T. LH*g : a high-availability Scalable Distributed Data Structure through record grouping. U-Paris 9 Tech. Rep. (May, 1997). Submitted for publ.
[LMR98] Litwin, W. Menon, J., Risch, T. LH* Schemes with Scalable Availability. IBM Almaden Research Rep. (May, 1998).
[L96] Lomet, D. Replicated Indexes for Distributed Data to app. in IEEE Intl. Conf. on Par. & Distr. Systems, PDIS-96, (Dec. 1996).
[M96] Microsoft Windows NT Server Cluster Strategy: High Availability and Scalability with Industry- Standard Hardware. A White Paper from the Business Systems Division. Microsoft, 1996.
[MS97] Microsoft Scalability Day: Coverage.
[N95]Nardelli, E. Distributed Searching of Multi-dimensional Data: A Performance Evaluation Study. Journal of Par. & Distr. Computing 49, 11-134, 1998.
[P97] Pheng, P. Gestion d'une case de LH*. Mémoire de DEA. U. Paris 9, 1997.
[R98] Ramakrishnan, K. Database Management Systems. McGraw Hill, 1998.
[R98a] Ronstrom, M. Design and Modelling of a Parallel Data Server for Telecom Applications. Ph.D, Thesis U. Linkoping, 1998, 250.
[S89] Samet, H. The Design and Analysis of Spatial Data Structures. Addison-Wesley, 1989.
[SPW90] Severance, C., Pramanik, S. Wolberg, P. Distributed linear hashing and parallel projection in main memory databases. VLDB-90.
[S95] Souleiman, R. Communication Protocol for the Scalable Distributed Data Structures. Tech. Rep. Univ. Paris 9, 1995 (in French)
[T95] Tanenbaum, A., S. Distributed Operating Systems. Prentice Hall, 1995, 601.
[TZK96] Tung, S, Zha, H, Kefe, T. Concurrent Scalable Distributed Data Structures, Proceedings of the ISCA Intl. Conf. on Parallel and Distributed Computing Systems , pp. 131-136, Dijon, France, (Sept., 1996). Edited by K. Yetongnon and S. Harini
[VBWY94] Vingralek, R., Breitbart, Y., Weikum, G. Distributed File Organization with Scalable Cost/Performance. ACM-SIGMOD Int. Conf. On Management of Data, 1994.
[U94] Ullman, J. New Frontiers in Database System Research. Future Tendencies in Computer Science, Control, and Applied Mathematics. Lecture Notes in Computer Science 653, Springer-Verlag, 1994. A. Bensoussan, J. P. Verjus, ed. 87-101.