DataScience Lab
News / info #
- The first scheduled lecture was cancelled, so the course starts on October 1st.
- Reminder: you must change teammates for each assignment. No exceptions.
(Tentative) planning for the year #
Note: A1 = assignment 1, Ax = assignment x.
Date | Description |
---|---|
September 21 | - NO CLASS - |
October 1 | Class intro + Intro A1 |
October 7 | - NO CLASS - |
October 15 | Preliminary presentations A1 |
October 21 | Deadline A1 23h59 |
October 22 | Final presentations A1 + Intro A2 |
October 29 | Alexandre's presentation on PR + group session |
November 5 | Preliminary presentations A2 |
November 12 | - NO CLASS - |
November 17 | Deadline A2 23h59 |
November 18 | Final presentations A2 + Intro A3 |
November 20 | Lucas' presentation + group session |
November 27 | - NO CLASS - |
December 2 | Deadline A3 23h59 |
December 3 | Preliminary + final presentations A3 |
Assignment 1 #
Links
- Slides assignment 1: here
- GitHub Classroom link: here. If you can't find your name, come see me.
- Testing datasets are available here.
- Testing platform: here.
Refs
- Recommender Systems Survey Latent Vector (link)
- Recommender Systems: The Textbook by Charu C. Aggarwal (read the section about MF)
- Generalized Principal Component Analysis by René Vidal, Yi Ma, S. Shankar Sastry (link)
- Deep Matrix Factorization by Xue et al. (link)
- Laplacian Eigenmaps and Spectral Techniques for Embedding and Clustering (link)
- Learning to Match via Inverse Optimal Transport (link)
- Neural Graph Collaborative Filtering (link)
HowTo #
Group sessions
How it is supposed to work:
- Students describe their plan/idea/readings/experiments and ask questions;
- Professors answer questions when they can.
How it is not supposed to work:
- Professors explain to students how to conduct their project.
Class presentations
- For each assignment, each group is expected to give exactly one presentation (either a preliminary presentation or a final presentation).
- The presentations must be uploaded on the Git repository at the start of the class (no email).
- The presentations must be in PDF format and named `slides.pdf`.
- Order of presentations will be randomly determined at the start of the class.
Preliminary presentations
- 6 minutes (~ 6-8 slides)
- Briefly & clearly state the problem you are working on
- Present and compare approaches you are considering
- Describe what you have implemented (briefly)
- Discuss possible experiments and evaluation metrics
- Present preliminary results if you have any
Final presentations
- 6 minutes (~ 6-8 slides)
- State the problem you studied
- Compare approaches
- Describe what you implemented
- Discuss metrics
- Show and discuss experimental results
Reports
- 1 front page with student names, team name, and optional project title
- 5 extra pages max (refs not included, figures included)
- PDF file named `report.pdf` on the Git repository by the deadline (no email)
- Include: implemented items + file paths, experiments with conclusions, lessons learned
- Exclude: long theory descriptions; extensive code listings (brief pseudocode is ok)
FAQ #
Can I develop approach X (a method not discussed in class)?
You are encouraged to study & implement something not discussed in class, as long as it addresses the target problem. Comparing a known approach with a novel one is typically valuable.
Is it mandatory to use the dataset or metric specified by the professors?
It is preferable to run at least one experiment with the specified dataset and metric, so that results are comparable, but feel free to explore other datasets/metrics to better understand your method’s behavior.
Do I have to work with a virtual environment?
It is not mandatory for your own work, but it is good practice, and the testing platform runs your code in one.
You should therefore at least provide a `requirements.txt` file listing the required packages.
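One way to produce such a file is to pin the versions currently installed in your environment. Below is a minimal sketch using only the standard library (Python 3.8+). The helper name `pinned_requirements` and the package names in the usage part are placeholders, not a list required by the course.

```python
from importlib import metadata


def pinned_requirements(package_names):
    """Build requirements.txt lines, pinning each package to its installed version."""
    lines = []
    for name in package_names:
        try:
            lines.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            # Not installed in this environment: keep the name unpinned.
            lines.append(name)
    return lines


if __name__ == "__main__":
    # Placeholder package names; replace them with what your code imports.
    for line in pinned_requirements(["numpy", "pandas"]):
        print(line)
```

Redirecting the printed lines to `requirements.txt` gives a file that `pip install -r requirements.txt` can consume. Running `pip freeze > requirements.txt` inside the virtual environment is a common alternative, although it lists every installed package rather than only the ones you name.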
I don't have enough computing power.
Consider cloud notebooks (e.g., Colab). Access to Mesonet, which provides more resources, is coming soon.