SEMINAR

Université Paris-Dauphine
Place du Maréchal de Lattre de Tassigny, Paris, 16th arrondissement
Metro: Porte Dauphine - RER Line C: Avenue Foch - Bus PC: Porte Dauphine

Wednesday, 14 November 2001, 3:30 pm

Room A 703, 7th floor, Building A (new wing of the university building)

 

Evaluation of Join Strategies for Distributed Mediation

by

Vanja Josifovski, Timour Katchanouov, Tore Risch

 

 

 

Three join algorithms are described and evaluated for an environment composed of distributed main-memory-based mediators and data sources. First, a streamed ship-out join is presented, in which bulks of tuples are shipped to a mediator close to a data source, followed by post-processing in the client mediator. The second join algorithm is an extended streamed semi-join that, in addition, incrementally builds a main-memory hash index in the client mediator. The third is a ship-in algorithm in which the data is materialized in the client mediator before the join is executed there. The first two algorithms are suitable for sources that require parameters to execute a query, such as web search engines and computational software. The last algorithm is used with sources that do not support parameterized queries. For the algorithms, we compare the execution times for obtaining the last and the first N tuples, and analyze the portion of the time spent in the different subsystems. We varied the speed of the communication network, the bulk size, the number of duplicates in the shipped data, and the size of the mediator's main memory. The study shows that the choice of join algorithm can lead to orders-of-magnitude differences in execution time across mediation environments.
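The second algorithm can be illustrated with a minimal sketch. The abstract gives no implementation details, so everything here is an assumption made for illustration: the names `bulks`, `streamed_semi_join`, and `call_source` are hypothetical, and the remote parameterized source is modeled as a local function that takes a list of join keys and returns a mapping from each key to its matching results. The sketch shows the two ideas named in the abstract: outer tuples are processed in bulks, and a main-memory hash index is built incrementally so that duplicate keys are sent to the source only once.

```python
from itertools import islice


def bulks(iterable, size):
    """Yield successive lists of at most `size` items (one shipment each)."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def streamed_semi_join(outer, call_source, key, bulk_size=2):
    """Join `outer` tuples with a parameterized source, bulk by bulk.

    `call_source(keys)` is a hypothetical stand-in for the remote call;
    it is assumed to return a dict mapping each key to a list of results.
    """
    index = {}  # incrementally built hash index: join key -> results
    for bulk in bulks(outer, bulk_size):
        # Ship only the distinct keys that are not yet in the index.
        missing = []
        for t in bulk:
            k = key(t)
            if k not in index and k not in missing:
                missing.append(k)
        if missing:
            answers = call_source(missing)
            for k in missing:
                # Record empty answers too, so the key is never re-shipped.
                index[k] = answers.get(k, [])
        # Post-processing in the client: probe the index and emit results.
        for t in bulk:
            for r in index[key(t)]:
                yield t + (r,)
```

Because the function is a generator, the first result tuples become available after the first bulk has been processed, which is the property that makes the time to the first N tuples differ from the time to the last tuple. A larger `bulk_size` means fewer calls to the source but a longer wait before the first result.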