The IASD Master's program begins with a semester of core courses covering the disciplines of AI and data science. At the end of the first semester, students must choose a set of advanced courses for the second semester from a wide selection of options. (Each student must choose at least 6 options.) The year ends with an internship in an academic or industrial research laboratory, concluding with a written thesis submitted in September.

Core courses

Advanced databases (non-classical DBMSs)
Number of hours
48
ECTS
6 credits
University responsible for the course
Université Paris-Dauphine, Université PSL
Instructor(s)
Dario Colazzo

Note: the description of this course is not available in French.

It is well known that in data processing for AI applications, a large part of the effort is devoted to data preparation, a process that strongly depends on techniques and skills for formulating complex queries and for tuning the systems that execute them, so as to keep query execution times reasonable. In a wide range of use cases, data preparation involves either structured or semi-structured datasets. In this context, the first goal of the course is to enable students to acquire strong skills for the efficient querying of relational databases. After an initial refresher on the standard query language and the design of complex queries, the course presents the optimization techniques that relational database management systems adopt to ensure efficient querying. Particular attention is given to storage and indexing techniques, to algorithms for generating efficient query execution plans, and to the main approaches to database tuning. How these techniques carry over to modern systems of the big data ecosystem is also discussed.
The second goal of the course is to present foundations and advanced techniques supporting systems for semi-structured data processing. Attention is first focused on the formal specification of query languages, the design of complex queries, and recent techniques for static analysis that help the user design correct queries, a task that is particularly difficult here because of the potentially high and unpredictable variability in the structure of the datasets. Both parts are supported by books and by scientific articles published in the main database conferences and journals. The acquired notions are consolidated in several lab sessions.
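As a small illustration of the indexing and query-plan topics mentioned above, the sketch below uses Python's built-in `sqlite3` module as a stand-in for a full relational DBMS. The table `orders`, its columns, and the index name `idx_orders_customer` are hypothetical and chosen only for this example; the exact wording of the plan text depends on the SQLite version.

```python
import sqlite3

def query_plan(conn, sql, params=()):
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]

# Hypothetical toy table, populated in memory.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(10_000)],
)

sql = "SELECT amount FROM orders WHERE customer_id = ?"

# Without an index on customer_id, the planner has to scan the table.
plan_before = query_plan(conn, sql, (42,))

# With an index, the planner can switch to an index search.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = query_plan(conn, sql, (42,))

print("before:", plan_before)
print("after: ", plan_after)
```

Comparing the two plans is a simple way to see the effect of an index on the access path chosen by the optimizer.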

Bibliography, recommended reading

Database Management Systems, Third Edition.
Raghu Ramakrishnan, Johannes Gehrke.
McGraw-Hill.
Relational DBMS Internals.
Antonio Albano, Dario Colazzo, Giorgio Ghelli, Enzo Orsini.
Book available in PDF.

Machine learning fundamentals
Number of hours
48
ECTS
6 credits
University responsible for the course
Université Paris-Dauphine, Université PSL
Instructor(s)
Yann Chevaleyre

Note: the description of this course is not available in French.

The aim of this course is to provide the students with the fundamental concepts and tools for developing and analyzing machine learning algorithms. The course will introduce the theoretical foundations of machine learning, review the most successful algorithms with their theoretical guarantees, and discuss their application to real-world problems. The covered topics are:

  • Introduction to the different paradigms of ML and applications
  • Bayes rule, MLE, MAP
  • The fully Bayesian setting
  • Computational learning theory
  • Empirical Risk Minimization
  • Universal consistency
  • ERM and ill-posed problems
  • Bias-variance tradeoff
  • Regularization
  • PAC model and sample complexity
  • MDL and sample compression bounds
  • VC-dimension and Sauer’s lemma
  • Rademacher complexity
  • Overfitting and regularization
  • SRM
  • Online learning
  • Model selection, cross validation
  • Supervised learning
  • k-NN, naive Bayes
  • Logistic regression and beyond
  • Perceptron
  • Kernelized perceptron and SVM
  • Kernel methods
  • Decision trees and Random Forests
  • Multiclass and ranking algorithms
  • Unsupervised learning
  • Dimensionality reduction: PCA, ICA, Kernel PCA, ISOMAP, LLE
  • Density estimation
  • EM
  • Mixtures of Gaussians
  • Spectral clustering
  • Ensemble methods: bagging, boosting, gradient boosting
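To make one of the listed algorithms concrete, here is a minimal sketch of the perceptron, written in plain Python. It assumes labels in {-1, +1} and linearly separable data; the toy data set and the function names are hypothetical and not taken from the course material.

```python
def predict(w, b, x):
    """Classify x with the linear rule sign(w . x + b)."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score > 0 else -1

def train_perceptron(data, max_epochs=100):
    """Perceptron updates: on each mistake, move w and b toward the example."""
    dim = len(data[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in data:
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:  # misclassified (or on the boundary)
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:  # a full pass without error: stop
            break
    return w, b

# Hypothetical, linearly separable toy data: (features, label).
toy_data = [
    ((2.0, 1.0), 1), ((1.0, 3.0), 1), ((3.0, 3.0), 1),
    ((-1.0, -2.0), -1), ((-2.0, -1.0), -1), ((-3.0, -2.0), -1),
]
w, b = train_perceptron(toy_data)
print(w, b)
```

On separable data the number of updates is bounded, which is why the loop can stop as soon as one full pass makes no mistake.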

Bibliography, recommended reading

  • Mohri, M., Rostamizadeh, A., & Talwalkar, A. (2012). Foundations of machine learning. MIT Press.
  • Shalev-Shwartz, S., & Ben-David, S. (2014). Understanding machine learning: From theory to algorithms. Cambridge University Press.
  • Vapnik, V. (2013). The nature of statistical learning theory. Springer Science & Business Media.
  • Bishop, C. (2006). Pattern recognition and machine learning. Springer.
  • Friedman, J., Hastie, T., & Tibshirani, R. (2001). The elements of statistical learning (Vol. 1, No. 10). New York, NY, USA: Springer Series in Statistics.
  • James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An introduction to statistical learning (Vol. 112). New York: Springer.

 

Optimization for machine learning
Number of hours
48
ECTS
6 credits
Course URL
https://bit.ly/3312oy5
University responsible for the course
ENS, Université PSL
Instructor(s)
Gabriel Peyré

Note: the description of this course is not available in French.

Optimization is at the heart of most recent advances in machine learning. This includes, of course, most basic methods (linear regression, SVM and kernel methods). It is also the key to the recent explosion of deep learning, which provides state-of-the-art approaches to supervised and unsupervised problems in imaging, vision and natural language processing.

This course will review the mathematical foundations and the underlying algorithmic methods, and showcase some modern applications of a broad range of optimization techniques. The course will be composed of both classical lectures and numerical sessions in Python. The first part covers the basic methods of smooth optimization (gradient descent) and convex optimization (optimality conditions, constrained optimization, duality). The second part will feature more advanced methods (non-smooth optimization, semidefinite programming, interior-point and proximal methods). The last part will cover large-scale methods (stochastic gradient descent), automatic differentiation (using modern Python frameworks) and their application to neural networks (shallow and deep nets).
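The simplest method named above, gradient descent on a smooth function, can be sketched in a few lines of plain Python. The quadratic objective, the step size, and the function names below are assumptions made for this illustration only.

```python
def gradient_descent(grad, x0, step=0.1, n_iters=200):
    """Iterate x <- x - step * grad(x) for a fixed number of iterations."""
    x = list(x0)
    for _ in range(n_iters):
        g = grad(x)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x

def grad_f(x):
    """Gradient of the toy objective f(x) = (x[0] - 1)^2 + 2 * (x[1] + 2)^2."""
    return [2.0 * (x[0] - 1.0), 4.0 * (x[1] + 2.0)]

# The minimizer of the toy objective is (1, -2).
x_star = gradient_descent(grad_f, [0.0, 0.0])
print(x_star)
```

For this quadratic, each coordinate of the error shrinks by a constant factor per iteration (0.8 and 0.6 with step 0.1), which is the linear convergence behaviour one expects for smooth, strongly convex objectives.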

Location:

Lectures will not be held at Université Paris-Dauphine, but at ENS, 29 rue d’Ulm, in the 5th arrondissement of Paris. More precisely, lectures will be:

  • in room U209 on Tuesdays (except for November 19, room U207)
  • in room Paul Langevin on Thursdays

Lecturers:

– Vincent Duval (INRIA)
– Robert Gower (Telecom Paris)
– Gabriel Peyré (CNRS and ENS)
– Clément Royer (Dauphine)
– Alessandro Rudi (INRIA)
– Irene Waldspurger (CNRS and Dauphine)

References:

Theory and algorithms:

Convex Optimization, Boyd and Vandenberghe
Introduction to matrix numerical analysis and optimization, Philippe Ciarlet
Proximal algorithms, N. Parikh and S. Boyd
Introduction to Nonlinear Optimization – Theory, Algorithms and Applications, Amir Beck

Numerics:

Python and Jupyter installation: use only Python 3 with the Anaconda distribution.
The Numerical Tours of Signal Processing, Gabriel Peyré
Scikit-learn tutorial #1 and Scikit-learn tutorial #2, Fabian Pedregosa, Jake VanderPlas
Reverse-mode automatic differentiation: a tutorial
Convolutional Neural Networks for Visual Recognition
Christopher Olah, Blog

Data science project
Number of hours
24
ECTS
4 credits
Course URL
https://www.lamsade.dauphine.fr/~bnegrevergne/ens/ProjetDataScience/
University responsible for the course
Université Paris-Dauphine, Université PSL
Instructor(s)
Benjamin Negrevergne

Note: the description of this course is not available in French.

The goal of this module is to provide students with hands-on experience of a novel data-science/AI challenge, using state-of-the-art tools and techniques discussed in other classes of this master. Students enrolled in this class will form groups and choose one topic from a list of proposed topics in the core areas of the master, such as supervised or unsupervised learning, recommendation, game AI, distributed or parallel data science, etc. The topics will generally consist of applying a well-established technique to a novel data-science challenge, or of applying recent research results to a classical data-science challenge. Either way, each topic will come with its own novel scientific challenge to address. At the end of the module, the students will give an oral presentation to demonstrate their methodology and their findings. Strong scientific rigor as well as very good engineering and communication skills will be necessary to complete this module successfully.

Deep learning
Number of hours
24
ECTS
4 credits
Course URL
https://www.lamsade.dauphine.fr/~cazenave/DeepLearningProject.html
University responsible for the course
Université Paris-Dauphine, Université PSL
Instructor(s)
Tristan Cazenave

Note: the description of this course is not available in French.

Introduction to Deep Learning. Deep learning makes it possible to train neural networks with many layers in order to address various difficult problems. Applications range from images to games. In this course we will present Stochastic Gradient Descent for deep neural networks using different architectures (convolutional, dense, recurrent, residual). We will use Keras/TensorFlow and/or PyTorch and apply them to games and optimization.

References:

Deep Learning avec TensorFlow – Mise en oeuvre et cas concrets, by Aurélien Géron, O’Reilly, November 22, 2017.
Keras Documentation: https://keras.io/
PyTorch Documentation: https://pytorch.org/

Knowledge representation, planning, and reasoning
Number of hours
24
ECTS
4 credits
University responsible for the course
Université Paris-Dauphine, Université PSL
Instructor(s)
Jérôme Lang, Gabriella Pigozzi

Note: the description of this course is not available in French.

The course introduces techniques for representing and reasoning over knowledge.

1. Reasoning about Belief, Knowledge, and Preferences

– plausible and nonmonotonic reasoning
– reasoning about belief and knowledge (single-and multiple-agent), belief change
– case-based reasoning, analogical reasoning
– preference languages, reasoning about preferences
– reasoning and decision under uncertainty, graphical models

2. Reasoning about Action and Planning

– reasoning about action, action languages for planning
– algorithms for classical planning and hierarchical planning
– planning under uncertainty
– multi-agent planning
– planning and search

Options

PSL Intensive Weeks (Data@PSL)
Number of hours
30
ECTS
3 credits
Course URL
https://data-psl.github.io/intensive-week/
University responsible for the course
Université Paris-Dauphine, Université PSL
Instructor(s)
Alexandre Allauzen
Advanced machine learning
Number of hours
24
ECTS
3 credits
University responsible for the course
Université Paris-Dauphine, Université PSL
Instructor(s)
Yann Chevaleyre

Note: the description of this course is not available in French.

This research-oriented module will focus on advanced machine learning algorithms.

Probabilistic and Bayesian Machine Learning
Bayesian linear regression
Gaussian Processes (i.e. kernelized Bayesian linear regression)
Approximate Bayesian Inference
Latent Dirichlet Allocation
Beyond the supervised/unsupervised learning problems
Semi-supervised learning
Density-gap based methods
Manifold based methods
Active learning
Selective sampling
Disagreement region coefficient
Sparse models
Advanced Deep learning Techniques
Generative Learning
Denoising auto-encoders
Variational Auto-Encoders
GANs
Learning representations
Word2vec, graph embeddings, …
Randomized algorithms
Random projections, LSH
Primal methods for kernel classifiers (random kitchen sinks)

Anonymization, privacy
Number of hours
24
ECTS
3 credits
Course URL
https://moodle.ens.psl.eu/course/view.php?id=1036
University responsible for the course
ENS, Université PSL
Instructor(s)
Pierre Senellart

Note: the description of this course is not available in French.

  • Basics of data privacy and anonymization
  • Measuring anonymity: k-anonymity, l-diversity, t-closeness
  • A probabilistic framework: differential privacy
  • Differentially private mechanisms
  • Local differential privacy
  • Differential privacy and federated learning
  • Processing data anonymously, homomorphic encryption
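As an illustration of the first anonymity measure in the list above, the sketch below checks k-anonymity: a table is k-anonymous with respect to a set of quasi-identifiers if every combination of quasi-identifier values that occurs in it is shared by at least k records. The records, the attribute names, and the function names are hypothetical.

```python
from collections import Counter

def anonymity_level(records, quasi_identifiers):
    """Size of the smallest group of records sharing the same quasi-identifier values."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values()) if groups else 0

def is_k_anonymous(records, quasi_identifiers, k):
    """True if every quasi-identifier combination occurs at least k times."""
    return anonymity_level(records, quasi_identifiers) >= k

# Hypothetical, already generalized table (age ranges, truncated zip codes).
records = [
    {"age": "30-39", "zip": "750**", "diagnosis": "flu"},
    {"age": "30-39", "zip": "750**", "diagnosis": "asthma"},
    {"age": "40-49", "zip": "751**", "diagnosis": "flu"},
    {"age": "40-49", "zip": "751**", "diagnosis": "diabetes"},
    {"age": "40-49", "zip": "751**", "diagnosis": "flu"},
]
print(anonymity_level(records, ["age", "zip"]))
```

Note that k-anonymity says nothing about the diversity of the sensitive attribute inside each group, which is what the other measures in the list address.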
Computational social choice
Number of hours
24
ECTS
3 credits
University responsible for the course
Université Paris-Dauphine, Université PSL
Instructor(s)
Jérôme Lang

Note: the description of this course is not available in French.

The aim of this course is to give an overview of the problems, techniques and applications of computational social choice, a multidisciplinary topic at the crossing point of computer science (especially artificial intelligence, operations research, theoretical computer science, multi-agent systems, computational logic, web science) and economics. The course consists of the analysis of problems arising from the aggregation of preferences of a group of agents from a computational perspective. On the one hand, it is concerned with the application of techniques developed in computer science, such as complexity analysis or algorithm design, to the study of social choice mechanisms, such as voting procedures or fair division algorithms. On the other hand, computational social choice is concerned with importing concepts from social choice theory into computing. For instance, social welfare orderings originally developed to analyse the quality of resource allocations in human society are equally well applicable to problems in multi-agent systems or network design. The course will focus on normative aspects, computational aspects, and real-world applications (including some case studies).

Program:

1. Introduction to social choice.
2. Computing hard voting rules and preference aggregation functions. Application to aggregating web page rankings.
3. Strategic issues: manipulation, control, game-theoretic analyses of voting. Short introduction to algorithmic mechanism design.
4. Preference aggregation on combinatorial domains.
5. Communication issues in voting: voting with incomplete preferences, elicitation protocols, communication complexity, low-communication mechanisms.
6. Fair division of indivisible goods.
7. Cake cutting algorithms.
8. Matching under preferences.
9. Coalition formation.
10. Specific applications and case studies (varying every year): rent division, kidney exchange, school assignment, group recommendation systems…
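To give a concrete feel for preference aggregation, the following sketch compares two simple voting rules, plurality and Borda, on the same profile of rankings. The ballots and function names are hypothetical; the two rules are standard easy-to-compute examples, not the hard rules the program refers to.

```python
def plurality_scores(ballots):
    """Each ballot gives one point to its top-ranked candidate."""
    scores = {}
    for ranking in ballots:
        for candidate in ranking:
            scores.setdefault(candidate, 0)
        scores[ranking[0]] += 1
    return scores

def borda_scores(ballots):
    """With m candidates, a candidate ranked in position i (from 0) gets m - 1 - i points."""
    scores = {}
    for ranking in ballots:
        m = len(ranking)
        for position, candidate in enumerate(ranking):
            scores[candidate] = scores.get(candidate, 0) + (m - 1 - position)
    return scores

def winners(scores):
    """All candidates with the maximum score, in sorted order."""
    best = max(scores.values())
    return sorted(c for c, s in scores.items() if s == best)

# Hypothetical profile: 7 voters ranking candidates a, b, c.
ballots = 3 * [["a", "b", "c"]] + 2 * [["b", "c", "a"]] + 2 * [["c", "b", "a"]]
print(winners(plurality_scores(ballots)), winners(borda_scores(ballots)))
```

On this profile the two rules disagree: plurality elects the candidate with the most first places, while Borda favours the candidate who is ranked reasonably high by everyone.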

Bibliography, recommended reading

Handbook of Computational Social Choice (F. Brandt, V. Conitzer, U. Endriss, J. Lang, A. Procaccia, eds.), Cambridge University Press, 2016.
Algorithmics of Matching Under Preferences (D. Manlove), World Scientific, 2013.

Data wrangling, data quality
Number of hours
24
ECTS
3 credits
Course URL
https://moodle.ens.psl.eu/course/view.php?id=1041
University responsible for the course
ENS, Université PSL
Instructor(s)
Leonid Libkin, Pierre Senellart

Note: the description of this course is not available in French.

  • Information extraction
  • Data cleaning, data deduplication
  • Data integration, view-based query answering
  • Provenance management
  • Data exchange
  • Probabilistic databases
  • Incomplete information in databases
  • Approximate query answering
Deep learning for image analysis
Number of hours
24
ECTS
3 credits
Course URL
http://cours.cmm.mines-paristech.fr/wiki/doku.php/deep/start
University responsible for the course
Mines ParisTech, Université PSL
Instructor(s)
Étienne Decencière

Note: the description of this course is not available in French.

Deep learning has achieved formidable results in the image analysis field in recent years, in many cases exceeding human performance. This success opens paths for new applications, entrepreneurship and research, while making the field very competitive. This course aims at providing the students with the theoretical and practical basis for understanding and using deep learning for image analysis applications.

Program

The course will be composed of lectures and practical sessions. Moreover, experts from industry will present practical applications of deep learning.

Lectures will include:

• Artificial neural networks, back-propagation algorithm
• Convolutional neural network
• Design and optimization of a neural architecture
• Successful architectures (AlexNet, VGG, GoogLeNet, ResNet)
• Analysis of neural network function
• Image classification and segmentation
• Auto-encoders and generative networks
• Current research trends and perspectives

During the practical sessions, the students will code in Python, using Keras and TensorFlow. They will be confronted with the practical problems linked to deep learning: architecture design; optimization schemes and hyper-parameter selection; analysis of results.

Prerequisites: linear algebra, basic probability and statistics.

Deep reinforcement learning and applications
Number of hours
24
ECTS
3 credits
University responsible for the course
Université Paris-Dauphine, Université PSL
Instructor(s)
Eric Benhamou
Ethics and artificial intelligence
Number of hours
24
ECTS
3 credits
Course URL
http://caor-mines-paristech.fr/fr/cours-eia/
University responsible for the course
Mines ParisTech, Université PSL
Instructor(s)
François Goulette

Note: the description of this course is not available in French.

Data science is becoming a key technology in many sciences, from hard sciences to humanities, as well as in industry, from transportation to journalism. However, this technology may also have negative consequences when important decisions are taken based on biases in the data or on poorly realized computations. As citizens, we want to get the advantages of data science, e.g., in medicine or law, but without abandoning the values of our society: the freedom of citizens, the right to a private life, the fairness of administrative processes, etc. The course will be the occasion, for future data scientists and for students in general, to question the benefits and risks of science. It will let them approach, from a pragmatic viewpoint, questions they may have to face some day, and issues such as the various facets of privacy, the fairness of automatic decisions, and the transparency and explainability of algorithmic processes.

Some references (more may be found there):

* Cathy O’Neil, Weapons of Math Destruction.
* https://dataresponsibly.github.io/
* The MOOC of Jagadish: https://www.edx.org/course/data-science-ethics

Foundations of reinforcement learning
Number of hours
24
ECTS
3 credits
University responsible for the course
ENS, Université PSL
Instructor(s)
Olivier Cappé

Note: the description of this course is not available in French.

Reinforcement Learning (RL) refers to situations where the learning algorithm operates in closed loop, simultaneously using past data to adjust its decisions and taking actions that will influence future observations. Algorithms based on RL concepts are now commonly used in programmatic marketing on the web, in robotics, and in computer game playing. All models for RL share a common concern: in order to attain one’s long-term optimality goals, it is necessary to strike a proper balance between exploration (discovery of yet uncertain behaviors) and exploitation (focusing on the actions that have produced the most relevant results so far). This introductory course will provide the main methodological building blocks of reinforcement learning. Some basic notions in probability theory are required to follow the course. The course will involve some work on simple implementations of the algorithms, assuming familiarity with a common scientific computing language.

Program

1. Multi-armed bandits, Markov Decision Processes and other models
2. Planning: finite and infinite horizon problems, the value function, Bellman equations, dynamic programming, value and policy iteration
3. Probabilistic and statistical tools for RL: Bayesian models, relative entropy and hypothesis testing, concentration inequalities, linear regression, the stochastic approximation algorithm
4. RL algorithms for multi-armed bandits: the explore vs. exploit compromise, bandit algorithms vs. A/B testing, UCB, Thompson sampling, contextual bandits
5. RL algorithms for Markov Decision Processes: off-policy and on-policy learning, Q-learning, SARSA, Monte Carlo tree search
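The planning item of the program (Bellman equations, value iteration) can be illustrated on a tiny Markov Decision Process. The two-state MDP below, its rewards, and the function names are hypothetical; transitions are encoded as lists of (probability, next state, reward) triples, which is one possible convention among many.

```python
def q_value(outcomes, values, gamma):
    """Expected return of an action: sum over outcomes of p * (r + gamma * V(s'))."""
    return sum(p * (r + gamma * values[s2]) for p, s2, r in outcomes)

def value_iteration(mdp, gamma=0.9, tol=1e-10):
    """Apply the Bellman optimality operator until the values stop changing."""
    values = {s: 0.0 for s in mdp}
    while True:
        new_values = {
            s: max(q_value(out, values, gamma) for out in actions.values())
            for s, actions in mdp.items()
        }
        delta = max(abs(new_values[s] - values[s]) for s in mdp)
        values = new_values
        if delta < tol:
            break
    # Greedy policy with respect to the computed values.
    policy = {
        s: max(actions, key=lambda a: q_value(actions[a], values, gamma))
        for s, actions in mdp.items()
    }
    return values, policy

# Hypothetical deterministic MDP: state -> action -> [(prob, next_state, reward)].
toy_mdp = {
    0: {"stay": [(1.0, 0, 0.0)], "go": [(1.0, 1, 1.0)]},
    1: {"stay": [(1.0, 1, 2.0)], "go": [(1.0, 0, 0.0)]},
}
values, policy = value_iteration(toy_mdp)
print(values, policy)
```

For this toy problem the fixed point can be checked by hand: staying in state 1 forever is worth 2 / (1 - 0.9) = 20, and moving there from state 0 is worth 1 + 0.9 × 20 = 19.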

Bibliography, recommended reading

M. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, 1994.
R. Sutton and A. Barto. Introduction to Reinforcement Learning. MIT Press, 1998.
C. Szepesvari. Algorithms for Reinforcement Learning. Morgan & Claypool Publishers, 2010.
J. Myles White. Bandit Algorithms for Website Optimization. O’Reilly, 2012.
T. Lattimore and C. Szepesvari. Bandit Algorithms. Cambridge University Press, 2019. http://downloads.tor-lattimore.com/banditbook/book.pdf

Graph mining
Number of hours
24
ECTS
3 credits
University responsible for the course
Université Paris-Dauphine, Université PSL
Instructor(s)
Daniela Grigori

Note: the description of this course is not available in French.

The objective of this course is to give students an overview of the field of graph analytics. Since graphs form a complex and expressive data type, we need methods for representing graphs in databases and for manipulating, querying, analyzing and mining them. Moreover, graph applications are very diverse and need specific algorithms.
The course presents new ways to model, store, retrieve, mine and analyze graph-structured data, along with some examples of applications.
Lab sessions are included, allowing students to practice graph analytics: modeling a problem as a graph database and performing analytical tasks over the graph in a scalable manner.

Program

1. Introduction to graph management and mining
2. Graph databases – Neo4j
3. Query language for graphs – Cypher
4. Graph processing frameworks (Pregel, ..., GraphX)
5. Graph applications: mining social-network graphs, mining logs, fraud detection, ...
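The vertex-centric model behind frameworks such as Pregel can be imitated on a single machine to see how it works. The sketch below computes connected components by propagating the smallest vertex label in synchronous supersteps; it is a plain-Python imitation under simplifying assumptions (undirected edges, comparable vertex ids, everything in memory), not the API of Pregel or GraphX.

```python
def connected_components(edges):
    """Label every vertex with the smallest vertex id in its connected component."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    label = {v: v for v in neighbors}  # initial state: own id
    active = set(neighbors)            # vertices that send messages this superstep

    while active:
        # Message phase: active vertices send their current label to neighbors.
        inbox = {}
        for v in active:
            for n in neighbors[v]:
                inbox.setdefault(n, []).append(label[v])
        # Compute phase: a vertex adopts the smallest label received, if smaller.
        active = set()
        for v, messages in inbox.items():
            smallest = min(messages)
            if smallest < label[v]:
                label[v] = smallest
                active.add(v)  # changed vertices stay active
    return label

# Hypothetical graph with two components: {1, 2, 3} and {4, 5}.
components = connected_components([(1, 2), (2, 3), (4, 5)])
print(components)
```

The computation halts when no vertex changes its label, which mirrors the "vote to halt" idea of the vertex-centric model: only vertices whose state changed keep sending messages.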

Bibliography, recommended reading

Ian Robinson, Jim Webber, Emil Eifrem, Graph Databases, Publisher: O’Reilly (June 4, 2013), ISBN-10: 1449356265
Eric Redmond, Jim R. Wilson, Seven Databases in Seven Weeks – A Guide to Modern Databases and the NoSQL Movement, Publisher: Pragmatic Bookshelf
Grzegorz Malewicz, Matthew H. Austern, Aart J.C Bik, James C. Dehnert, Ilan Horn, Naty Leiser, and Grzegorz Czajkowski. 2010. Pregel: a system for large-scale graph processing, SIGMOD ’10, ACM, New York, NY, USA, 135-146
Xin, Reynold & Crankshaw, Daniel & Dave, Ankur & Gonzalez, Joseph & J. Franklin, Michael & Stoica, Ion. (2014). GraphX: Unifying Data-Parallel and Graph-Parallel Analytics.
Michael S. Malak and Robin East, Spark GraphX in Action, Manning, June 2016

Incremental learning, game theory, and applications
Number of hours
24
ECTS
3 credits
University responsible for the course
Université Paris-Dauphine, Université PSL
Instructor(s)
Rida Laraki

Note: the description of this course is not available in French.

This course will focus on the behavior of learning algorithms when several agents are competing against one another: specifically, what happens when an agent that follows an online learning algorithm interacts with another agent doing the same? The natural language to frame such questions is that of game theory, and the course will begin with a short introduction to the topic, such as normal form games (in particular zero-sum, potential, and stable games), solution concepts (such as dominated/rationalizable strategies, Nash, correlated and coarse equilibrium notions, ESS), and some extensions (Blackwell approachability). Subsequently, we will examine the long-term behavior of a wide variety of online learning algorithms (fictitious play, regret-matching, multiplicative/exponential weights, mirror descent and its variants, etc.), and we will discuss applications to generative adversarial networks (GANs), traffic routing, prediction, and online auctions.

References:

[1] Nicolò Cesa-Bianchi and Gábor Lugosi, Prediction, learning, and games, Cambridge University Press, 2006.
[2] Drew Fudenberg and David K. Levine, The theory of learning in games, Economic learning and social evolution, vol. 2, MIT Press, Cambridge, MA, 1998.
[3] Sergiu Hart and Andreu Mas-Colell, Simple adaptive strategies: from regret matching to uncoupled dynamics, World Scientific Series in Economic Theory – Volume 4, World Scientific Publishing, 2013.
[4] Vianney Perchet, Approachability, regret and calibration: implications and equivalences, Journal of Dynamics and Games 1 (2014), no. 2, 181–254.
[5] Shai Shalev-Shwartz, Online learning and online convex optimization, Foundations and Trends in Machine Learning 4 (2011), no. 2, 107–194.
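The question raised in the course description, what happens when two online learners face each other, can be explored numerically. The sketch below lets two exponential-weights (Hedge) learners play matching pennies against each other, using expected payoffs so that the run is deterministic. The learning rate, the number of rounds, the starting strategies, and the function names are assumptions made for this illustration.

```python
import math

# Row player's payoff in matching pennies; the column player receives the negative.
PAYOFF = [[1.0, -1.0], [-1.0, 1.0]]

def hedge_update(strategy, payoffs, eta):
    """Exponential-weights update: reweight each action by exp(eta * payoff)."""
    weights = [p * math.exp(eta * u) for p, u in zip(strategy, payoffs)]
    total = sum(weights)
    return [w / total for w in weights]

def self_play(x0, y0, eta=0.05, rounds=5000):
    """Run both learners simultaneously and return their time-averaged strategies."""
    x, y = list(x0), list(y0)
    x_sum, y_sum = [0.0, 0.0], [0.0, 0.0]
    for _ in range(rounds):
        x_sum = [s + p for s, p in zip(x_sum, x)]
        y_sum = [s + p for s, p in zip(y_sum, y)]
        # Expected payoff of each pure action against the opponent's current mix.
        row_payoffs = [sum(PAYOFF[i][j] * y[j] for j in range(2)) for i in range(2)]
        col_payoffs = [-sum(PAYOFF[i][j] * x[i] for i in range(2)) for j in range(2)]
        x, y = hedge_update(x, row_payoffs, eta), hedge_update(y, col_payoffs, eta)
    return [s / rounds for s in x_sum], [s / rounds for s in y_sum]

x_avg, y_avg = self_play([0.9, 0.1], [0.3, 0.7])
print(x_avg, y_avg)
```

In zero-sum games, the time averages of no-regret learners approach an equilibrium (here the uniform strategy), even though the strategies played in individual rounds may keep cycling; this sketch reports only the averages.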

Knowledge graphs, description logics, reasoning on data
Number of hours
24
ECTS
3 credits
University responsible for the course
ENS, Université PSL
Instructor(s)
Camille Bourgaux, Michaël Thomazzo

Note: the description of this course is not available in French.

Introduction to Knowledge Graphs, Description Logics and Reasoning on Data. Knowledge graphs are a flexible tool to represent knowledge about the real world. After presenting some of the existing knowledge graphs (such as DBPedia, Wikidata or Yago), we focus on their interaction with semantics, which is formalized through the use of so-called ontologies. We then present some central logical formalisms used to express ontologies, such as Description Logics and Existential Rules. A large part of the course will be devoted to studying the associated reasoning tasks, with a particular focus on querying a knowledge graph through an ontology. Both theoretical aspects (such as the tradeoff between the expressivity of the ontology language and the complexity of the reasoning tasks) and practical ones (efficient algorithms) will be considered.

Program:

1. Knowledge Graphs (history and uses)
2. Ontology Languages (Description Logics, Existential Rules)
3. Reasoning Tasks (Consistency, classification, Ontological Query Answering)
4. Ontological Query Answering (Forward and backward chaining, Decidability and complexity, Algorithms, Advanced Topics)
References:

— The description logic handbook: theory, implementation, and applications. Baader et al., Cambridge University Press
— Foundations of Semantic Web Technologies, Hitzler et al., Chapman&Hall/CRC
— Web Data Management, Abiteboul et al., Cambridge University Press

Prerequisites:
— first-order logic;
— complexity (Turing machines, classical complexity classes) is a plus.
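Forward chaining, listed in the program under ontological query answering, can be sketched on a toy knowledge graph. The example below saturates a set of triples with two rules only (transitivity of `subClassOf` and propagation of `type` along `subClassOf`) and then answers an instance query. The vocabulary, the facts, and the function names are hypothetical, and the rules are plain Datalog-style rules without existential variables, so this covers only a small fragment of the formalisms the course mentions.

```python
def saturate(triples):
    """Forward chaining: apply the two rules until no new fact is derived."""
    facts = set(triples)
    changed = True
    while changed:
        derived = set()
        for (s, p, o) in facts:
            if p != "subClassOf":
                continue
            for (s2, p2, o2) in facts:
                # Rule 1: s subClassOf o, o subClassOf o2  =>  s subClassOf o2
                if p2 == "subClassOf" and s2 == o:
                    derived.add((s, "subClassOf", o2))
                # Rule 2: s2 type s, s subClassOf o  =>  s2 type o
                if p2 == "type" and o2 == s:
                    derived.add((s2, "type", o))
        changed = not derived <= facts
        facts |= derived
    return facts

def instances_of(facts, cls):
    """Answer the query 'x type cls' over a set of facts."""
    return {s for (s, p, o) in facts if p == "type" and o == cls}

# Hypothetical ontology and data.
kb = {
    ("Professor", "subClassOf", "Researcher"),
    ("Researcher", "subClassOf", "Person"),
    ("alice", "type", "Professor"),
}
saturated = saturate(kb)
print(instances_of(kb, "Person"), instances_of(saturated, "Person"))
```

Querying the raw data returns no instance of `Person`, while querying the saturated graph returns `alice`: this is the difference between plain query evaluation and query answering through an ontology.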

Machine learning on Big Data
Number of hours
24
ECTS
3 credits
University responsible for the course
Université Paris-Dauphine, Université PSL
Instructor(s)
Dario Colazzo

Note: the description of this course is not available in French.

Nowadays there is an ever-increasing demand for machine learning algorithms that scale over massive data sets.
In this context, this course focuses on the typical, fundamental aspects that need to be dealt with in the design of machine learning algorithms that can be executed in a distributed fashion, typically on Hadoop clusters, in order to deal with big data sets, taking into account scalability and robustness. The course will first focus on a set of mainstream, sequential machine learning algorithms, and then take into account the following crucial and complex aspects. The first is the redesign of algorithms by relying on programming paradigms for distribution and parallelism based on map-reduce (e.g., Spark, Flink, …). The second is the experimental analysis of the map-reduce based implementation of the designed algorithms, in order to test their scalability and precision. The third concerns the study and application of optimisation techniques to overcome lack of scalability and to improve the execution time of the designed algorithms.

The attention will be on machine learning techniques for dimension reduction, clustering and classification, whose underlying implementation techniques are transversal and find application in a wide range of other machine learning algorithms. For some of the studied algorithms, the course will present techniques for a from-scratch map-reduce implementation, while for other algorithms packages like Spark ML will be used and end-to-end pipelines will be designed. In both cases algorithms will be analysed and optimised on real-life data sets, by relying on a local Hadoop cluster, as well as on a cluster on the Amazon Web Services cloud.
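To illustrate what redesigning a sequential algorithm around map-reduce can look like, the sketch below expresses one iteration of k-means as a map phase (assign each point to its nearest centroid) and a reduce phase (sum the points per cluster, then average). It runs on a single machine with plain Python and `functools.reduce`; the data, the function names, and the structure are assumptions for illustration and do not use the Spark or Flink APIs.

```python
from functools import reduce

def nearest(point, centroids):
    """Index of the centroid closest to the point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda k: sum((p - c) ** 2 for p, c in zip(point, centroids[k])),
    )

def map_phase(points, centroids):
    """Emit one (cluster id, (point, 1)) pair per input point."""
    return [(nearest(p, centroids), (p, 1)) for p in points]

def reduce_phase(pairs):
    """Sum vectors and counts per cluster id, then divide to get the means."""
    def combine(acc, pair):
        key, (vec, n) = pair
        if key in acc:
            total, count = acc[key]
            acc[key] = (tuple(a + b for a, b in zip(total, vec)), count + n)
        else:
            acc[key] = (tuple(vec), n)
        return acc
    totals = reduce(combine, pairs, {})
    return {k: tuple(x / n for x in total) for k, (total, n) in totals.items()}

def kmeans_step(points, centroids):
    """One k-means iteration; a centroid with no assigned point is left unchanged."""
    means = reduce_phase(map_phase(points, centroids))
    return [means.get(k, centroids[k]) for k in range(len(centroids))]

# Hypothetical 2D data with two obvious clusters.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (12.0, 10.0)]
new_centroids = kmeans_step(points, [(1.0, 1.0), (9.0, 9.0)])
print(new_centroids)
```

Because the per-cluster sums and counts are associative and commutative, the reduce phase can be split across workers and combined in any order, which is the property a distributed implementation relies on.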

References:

– Mining of Massive Datasets
http://www.mmds.org

– High Performance Spark – Best Practices for Scaling and Optimizing Apache Spark
Holden Karau, Rachel Warren
O’Reilly

Monte-Carlo search and games
Number of hours
24
ECTS
3 credits
Course URL
https://www.lamsade.dauphine.fr/~cazenave/MonteCarloSearch.html
University responsible for the course
Université Paris-Dauphine, Université PSL
Instructor(s)
Tristan Cazenave

Monte-Carlo search has revolutionized game programming. It combines well with deep learning to create systems that play better than the best human players at games such as Go, Chess, Hex, or Shogi. It also makes it possible to tackle hard optimization problems. In this course we will cover various Monte-Carlo search algorithms such as UCT, GRAVE, nested Monte-Carlo search, and playout policy learning. We will also see how to combine Monte-Carlo search with deep learning. The course will be assessed through a project on a game or a hard optimization problem.
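UCT selects moves inside the search tree with the UCB1 bandit formula, which adds an exploration bonus to the average playout result of each move. The sketch below shows only that selection step; the statistics format (visits and total reward per move), the exploration constant, and the function name are assumptions for this illustration, and a full UCT implementation would also need expansion, playouts, and backpropagation.

```python
import math

def ucb1_select(stats, c=math.sqrt(2)):
    """Pick the move maximizing mean reward + c * sqrt(ln(total visits) / visits).

    stats maps each move to a (visits, total_reward) pair.
    Unvisited moves are tried first.
    """
    total_visits = sum(visits for visits, _ in stats.values())
    best_move, best_score = None, -math.inf
    for move, (visits, total_reward) in stats.items():
        if visits == 0:
            return move
        score = total_reward / visits + c * math.sqrt(math.log(total_visits) / visits)
        if score > best_score:
            best_move, best_score = move, score
    return best_move

# Hypothetical statistics at one node of the search tree.
print(ucb1_select({"a": (1000, 600.0), "b": (2, 1.0)}))
```

In the printed example, move `b` has a lower average than `a` but has been tried only twice, so its exploration bonus dominates and it is selected.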

Bibliography, recommended reading

Intelligence Artificielle Une Approche Ludique, Tristan Cazenave, Editions Ellipses, 2011.

Natural language processing
Number of hours
24
ECTS
3 credits
University responsible for the course
Université Paris-Dauphine, Université PSL
Instructor(s)
Alexandre Allauzen

Note: the description of this course is not available in French.

Natural language processing (NLP) is today present in so many applications because people communicate almost everything in language: posts on social media, web search, advertisement, emails and SMS, customer service exchanges, language translation, etc. While NLP heavily relies on machine learning approaches and the use of large corpora, the peculiarities and diversity of language data call for dedicated models to efficiently process linguistic information and the underlying computational properties of natural languages. Moreover, NLP is a fast-evolving domain, in which cutting-edge research can nowadays be introduced in large-scale applications within a couple of years. The course focuses on modern and statistical approaches to NLP: using large corpora, statistical models for acquisition, disambiguation, parsing, understanding and translation. An important part will be dedicated to deep-learning models for NLP.

– Introduction to NLP, the main tasks, issues and peculiarities
– Sequence tagging: models and applications
– Computational Semantics
– Syntax and Parsing
– Deep Learning for NLP: introduction and basics
– Deep Learning for NLP: advanced architectures
– Deep Learning for NLP: Machine translation, a case study

Bibliography, recommended reading:
– Costa-jussà, M. R., Allauzen, A., Barrault, L., Cho, K., & Schwenk, H. (2017). Introduction to the special issue on deep learning approaches for machine translation. Computer Speech & Language, 46, 367-373.
– Dan Jurafsky and James H. Martin. Speech and Language Processing (3rd ed. draft): https://web.stanford.edu/~jurafsky/slp3/
– Yoav Goldberg. A Primer on Neural Network Models for Natural Language Processing: http://u.cs.biu.ac.il/~yogo/nnlp.pdf
– Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning: http://www.deeplearningbook.org/

Point clouds and 3D modeling
Number of hours
24
ECTS
3 credits
Course URL
http://caor-mines-paristech.fr/fr/cours-npm3d/
University responsible for the course
Mines ParisTech, Université PSL
Instructor(s)
François Goulette

This course gives an overview of the concepts and techniques for acquiring, processing, and visualizing 3D point clouds, and of their mathematical and algorithmic foundations. In particular, the course covers the following topics:

3D perception systems
Processing and operators
Registration
Point cloud segmentation
Curve and surface reconstruction
Primitive-based modeling
Rendering of point clouds and meshes

Some sessions are complemented by a lab session.

Internship

IASD research internship
Number of hours
ECTS
12 credits
University responsible for the course
Université Paris-Dauphine, Université PSL