Internship on Schema inference for massive JSON data sets


Brief description of subject : This internship project tackles the problem of efficiently inferring schemas from huge JSON collections. To this end we aim to use Spark, a recent system for general-purpose, large-scale data processing. Spark can run programs written in Scala, a programming language particularly well suited to the symbolic manipulations performed by schema inference algorithms. Spark also outperforms Hadoop MapReduce in many contexts, and we expect this to hold in our setting. Particular attention will be devoted to inferring multiple schemas at different levels of precision, letting the user interactively choose the preferred precision level while exploring the data sets.
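To give a flavour of the symbolic manipulations involved, here is a minimal sketch in plain Scala, not the project's actual algorithm. It assumes a toy JSON model (`Json`), a toy type language (`Ty`) and two hypothetical functions: `infer`, which types a single value, and `fuse`, which merges two types. All of these names are illustrative assumptions. `fuse` is intended to be associative and commutative, which is what a distributed reduction requires, and it fixes one single precision level: alternatives of the same kind (records, arrays) are always merged.

```scala
// Toy JSON model (hypothetical; a real project would reuse a parser's AST).
sealed trait Json
case object JNull extends Json
final case class JBool(value: Boolean) extends Json
final case class JNum(value: Double) extends Json
final case class JStr(value: String) extends Json
final case class JArr(items: List[Json]) extends Json
final case class JObj(fields: Map[String, Json]) extends Json

// Toy type language for inferred schemas.
sealed trait Ty
case object TNull extends Ty
case object TBool extends Ty
case object TNum extends Ty
case object TStr extends Ty
// elem = None means only empty arrays have been seen so far.
final case class TArr(elem: Option[Ty]) extends Ty
// Each field carries its type and a flag telling whether it is optional.
final case class TRec(fields: Map[String, (Ty, Boolean)]) extends Ty
final case class TUnion(alts: Set[Ty]) extends Ty

object SchemaInference {
  // Map step: type of a single JSON value.
  def infer(v: Json): Ty = v match {
    case JNull       => TNull
    case JBool(_)    => TBool
    case JNum(_)     => TNum
    case JStr(_)     => TStr
    case JArr(items) => TArr(items.map(infer).reduceOption(fuse))
    case JObj(fs)    => TRec(fs.map { case (k, x) => (k, (infer(x), false)) })
  }

  // Reduce step: merge two types into one that describes both.
  def fuse(a: Ty, b: Ty): Ty = {
    val all = alternatives(a) ++ alternatives(b)
    // Keep at most one record and one array alternative by merging same-kind types.
    val rec = all.collect { case r: TRec => r }.reduceOption(fuseRec)
    val arr = all.collect { case x: TArr => x }.reduceOption(fuseArr)
    val atoms: Set[Ty] = all.filter {
      case _: TRec | _: TArr => false
      case _                 => true
    }.toSet
    val merged: Set[Ty] = atoms ++ rec.toList ++ arr.toList
    if (merged.size == 1) merged.head else TUnion(merged)
  }

  private def alternatives(t: Ty): List[Ty] = t match {
    case TUnion(as) => as.toList
    case other      => List(other)
  }

  // A field missing on one side becomes optional in the merged record.
  private def fuseRec(a: TRec, b: TRec): TRec = {
    val keys = a.fields.keySet ++ b.fields.keySet
    TRec(keys.map { k =>
      val present  = List(a.fields.get(k), b.fields.get(k)).flatten
      val ty       = present.map(_._1).reduce(fuse)
      val optional = present.size < 2 || present.exists(_._2)
      (k, (ty, optional))
    }.toMap)
  }

  private def fuseArr(a: TArr, b: TArr): TArr =
    TArr((a.elem.toList ++ b.elem.toList).reduceOption(fuse))
}

object Demo {
  def main(args: Array[String]): Unit = {
    val docs = List(
      JObj(Map("id" -> JNum(1.0), "name" -> JStr("a"))),
      JObj(Map("id" -> JStr("x")))
    )
    // Local collection here; the same map/reduce shape applies to a distributed one.
    println(docs.map(SchemaInference.infer).reduce(SchemaInference.fuse))
  }
}
```

The demo runs on an ordinary `List`. On Spark, the same shape would be `rdd.map(infer).reduce(fuse)` over an RDD of parsed documents, since `RDD.reduce` expects a commutative and associative operator. Supporting several precision levels would mean changing when `fuse` merges alternatives rather than keeping them apart, which this sketch does not attempt.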

Link to details :

Duration : 5-6 months

Led by : Dario Colazzo

E-mail :

Web page :

Laboratory/Host Organisation : LAMSADE

Remarks : (Co-leader : Carlo Sartiani, Università della Basilicata (Italy))