Université Paris 2
Stockfish is an open-source chess engine, currently considered the reference in terms of playing strength. In this presentation we will revisit a few points that set it apart from other chess programs:
- a distributed computing farm for collaborative testing
- a statistical method for validating improvements
- the synchronization-free parallel alpha-beta algorithm
- recent improvements to the evaluation function and to pruning.
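The alpha-beta algorithm mentioned above is the classical baseline that Stockfish parallelizes. As a point of reference only (not Stockfish's actual implementation), a minimal sequential fail-hard sketch might look like the following, where `evaluate` and `children` are hypothetical problem-specific callbacks:

```python
def alphabeta(node, depth, alpha, beta, maximizing, evaluate, children):
    """Minimal alpha-beta search. 'evaluate' scores a leaf position;
    'children' lists successor positions. Both are placeholders."""
    kids = children(node)
    if depth == 0 or not kids:
        return evaluate(node)
    if maximizing:
        best = float("-inf")
        for child in kids:
            best = max(best, alphabeta(child, depth - 1, alpha, beta,
                                       False, evaluate, children))
            alpha = max(alpha, best)
            if alpha >= beta:  # beta cutoff: opponent avoids this line
                break
        return best
    best = float("inf")
    for child in kids:
        best = min(best, alphabeta(child, depth - 1, alpha, beta,
                                   True, evaluate, children))
        beta = min(beta, best)
        if alpha >= beta:  # alpha cutoff
            break
    return best
```

Stockfish's contribution discussed in the talk is running many such searches in parallel without synchronizing on the tree, which this sequential sketch deliberately omits.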
Miquel Oliu Barton
Ceremade, Université Paris-Dauphine, PSL
University of New South Wales
We prove that Short Generalized Hex is W[1]-complete parameterized by the number of moves. This solves an open problem from Downey and Fellows' influential 1999 list of open problems. Previously, the problem was considered a natural candidate for AW[*]-completeness. Our main tool is a new fragment of first-order logic in which universally quantified variables do not occur in inequalities. We show that model-checking on arbitrary relational structures for a formula in this fragment is W[1]-complete when parameterized by formula size.
We then consider Positional Games, a general framework in which the game is represented as a hypergraph, two players alternately pick vertices, and the winner is determined by a player picking all the vertices of some hyperedge. Positional Games are traditionally labelled Maker-Breaker, Maker-Maker, or Avoider-Enforcer according to the exact details of the winning condition. Our results show that these classes are complete for W[1], AW[*], and co-W[1] respectively, thereby providing a rough picture of the complexity of Positional Games parameterized by the number of moves. However, some games whose board and winning configurations are highly structured are fixed-parameter tractable (FPT). In particular, Short k-Connect, which generalizes Tic-tac-toe and Gomoku, as well as Short Hex, are FPT.
This talk is based on joint work with Édouard Bonnet and Florian Jamain appearing in the journal "Theoretical Computer Science", as well as ongoing work with Serge Gaspers, Antonin Lambilliotte, and Stefan Rümmele.
Abdallah Saffidine is an ARC DECRA Fellow and a Research Associate at the University of New South Wales, Sydney, Australia. He works in the Artificial Intelligence and Algorithms groups. He arrived at UNSW in 2013 as a postdoc with Prof. Michael Thielscher and obtained his PhD from the Université Paris-Dauphine, France, under the supervision of Prof. Tristan Cazenave. Abdallah has a wide range of interests, from games, planning, and other areas of decision-making to logic, complexity, and other areas of theory.
University of New South Wales
Abstract: It was demonstrated in 2009 that the TreeStrap algorithm could learn Chess to Master level from random initial weights. However, the learning rate had to be carefully tuned and learning sometimes became unstable. This talk will give a historical overview of game tree learning, and then discuss modifications to the TreeStrap algorithm including staged learning, incremental evaluation and scaling of learning rate by depth, which make the algorithm more stable and allow it to be scaled up to a multi-player Chess variant board game called Duchess.
Bio: Alan Blair completed a PhD in Mathematics at MIT and worked at Brandeis, Queensland and Melbourne before taking up his current position at UNSW. In addition to TreeStrap, he is known for his pioneering work in the comparison of evolution with gradient descent for Backgammon, Tron and Simulated Hockey, as well as his early attempts to train parallel convolutional neural networks for the game of Go. His non-game interests include robot navigation, language processing, hierarchical evolutionary computation and evolutionary deep learning.
PSL - Université Paris-Dauphine - LAMSADE
Monte Carlo Tree Search (MCTS) is the state of the art algorithm for General Game Playing (GGP). We propose to learn a playout policy online so as to improve MCTS for GGP. We also propose to learn a policy that uses not only the moves but also the features of the moves. We test the resulting algorithms, named Playout Policy Adaptation (PPA) and Playout Policy Adaptation with move Features (PPAF), on Atarigo, Breakthrough, Misere Breakthrough, Domineering, Misere Domineering, Knightthrough, Misere Knightthrough and Nogo. The experiments compare PPA and PPAF to Upper Confidence bounds applied to Trees (UCT) and to the closely related Move-Average Sampling Technique (MAST) algorithm.
Monte Carlo Tree Search (MCTS) is a general search algorithm that has improved the state of the art for multiple games and optimization problems. A record-breaking algorithm for puzzles and optimization is Nested Rollout Policy Adaptation (NRPA). It learns a playout policy online that dynamically adapts the playouts to the problem at hand. We propose to enhance NRPA using more selectivity in the playouts. The idea is applied to three different problems: Bus regulation, SameGame and Weak Schur numbers. We improve on standard NRPA for all three problems.
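The NRPA scheme (Rosin, 2011) that this work builds on can be sketched compactly. The move coding below is a toy simplification (a real implementation keys the policy on state-move codes), and `legal_moves` and `score` are hypothetical problem callbacks:

```python
import math
import random

def playout(policy, legal_moves, max_len):
    """Sample a sequence with a softmax (Gibbs) policy over move codes."""
    seq = []
    for _ in range(max_len):
        moves = legal_moves(seq)
        if not moves:
            break
        weights = [math.exp(policy.get(m, 0.0)) for m in moves]
        seq.append(random.choices(moves, weights=weights)[0])
    return seq

def adapt(policy, seq, legal_moves, alpha=1.0):
    """Shift the policy toward the best sequence found so far."""
    pol = dict(policy)
    for i, move in enumerate(seq):
        moves = legal_moves(seq[:i])
        z = sum(math.exp(policy.get(m, 0.0)) for m in moves)
        for m in moves:
            pol[m] = pol.get(m, 0.0) - alpha * math.exp(policy.get(m, 0.0)) / z
        pol[move] = pol.get(move, 0.0) + alpha
    return pol

def nrpa(level, policy, legal_moves, score, iters=10, max_len=20):
    """Nested recursion: each level runs the level below and adapts
    the policy toward the best sequence seen."""
    if level == 0:
        seq = playout(policy, legal_moves, max_len)
        return score(seq), seq
    best_score, best_seq = float("-inf"), []
    for _ in range(iters):
        s, seq = nrpa(level - 1, policy, legal_moves, score, iters, max_len)
        if s >= best_score:
            best_score, best_seq = s, seq
        policy = adapt(policy, best_seq, legal_moves)
    return best_score, best_seq
```

The enhancement described in the abstract restricts which moves the playout may sample (more selectivity), which would modify `playout` only.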
In this paper we experiment with a Deep Convolutional Neural Network for the game of Go. We show that even though it leads to strong play, it has weaknesses at tactical search. We propose to combine tactical search with Deep Learning to improve Golois, the resulting Go program. A related work is AlphaGo, which combines tactical search with Deep Learning, giving the network the results of ladders as input. We propose to extend this further to other kinds of tactical search, such as life-and-death search.
University of Bremen
In this talk I review recent advances of randomized AI search in solving industrially relevant optimization problems. The method I focus on is a sampling-based solution mechanism called Monte-Carlo Tree Search (MCTS), which establishes a trade-off between exploitation and exploration.
This method, originating in game playing research, is a general heuristic search technique, for which often less problem-specific knowledge has to be added than in comparable approaches.
The production and logistics problems I consider include vehicle routing, container packing and robot motion planning tasks, which are usually considered to belong to Operations Research, often drawing on decades of research. Despite the long history of the problems addressed, NRPA, a recent variant of MCTS with nestedness and policy adaptation, computes promising results, sometimes even of a quality that is difficult to match with other global/local optimization strategies.
The breadth of the three different application domains illustrates that a strong search method is entering the OR arena, which might eventually lead to a paradigm shift.
J.-B. Hoock and O. Teytaud
TAO (Inria), LRI, UMR 8623 (CNRS - Univ. Paris-Sud), bât. 490, Univ. Paris-Sud, 91405 Orsay, France, firstname.lastname@example.org
Genetic Programming (GP) is the automatic building of programs for solving a given task. The main difficulties are (i) the cost of evaluating a mutation and (ii) the huge size of the set of possible mutations. In many cases, a further difficulty comes from the noisy evaluation: the fitness function, which evaluates the quality of a program, is stochastic. Evaluating a program takes a (supposedly constant) running time and yields a noisy reward: we are looking for
argmin_{x in possible programs} E[fitness(x)].
When mining a huge set of mutations in an uncertain framework (noisy optimization), there are two main issues:
- load balancing: which mutations of the program are to be tested now?
- statistical validation: which mutations of the program should be validated?
The statistical validation is not trivial: even if each mutation is tested rigorously, e.g. with confidence 95% (i.e. a 5% probability of error), erroneous validations will occur very frequently when millions of mutations are tested (with just 100 mutations, each of them useless, the probability of an erroneous validation is already 99.4% = 1 - 0.95^100). This is known as the Multiple Simultaneous Hypothesis Testing (MSHT) effect.
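The multiple-testing arithmetic above can be checked directly:

```python
def family_wise_error(n, confidence=0.95):
    """Probability of at least one false validation among n independent
    tests, each individually correct with probability 'confidence'."""
    return 1 - confidence ** n

print(round(family_wise_error(100), 3))  # ≈ 0.994
```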
Bandits have been used for a while, even in GP, for the load-balancing problem, but not yet, to the best of our knowledge, for both issues simultaneously. We here investigate the use of racing algorithms, a particular form of bandit algorithms covering simultaneously the load balancing and the statistical validation. Our algorithm, termed RBGP (Racing-Based GP), depends on parameters delta and c, and ensures that the following two properties hold simultaneously with probability 1 - delta. Monotonic improvement:
E[fitness(x_n)] > E[fitness(x_{n+1})];
and (efficiency) if there exists a possible mutation of x with improvement at least c, then x_{n+1} contains a mutation with positive effect that was not in x_n. We applied our results to the GP of the bias module of MCTS, a well-known problem that is difficult even for humans; whereas humans often make mistakes due to the multiplication of trials (the MSHT effect above), RBGP provided improvements comparable to those of human developers and, more importantly, was never convinced into committing a mistake to the CVS repository of the project.
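As a toy illustration of the racing idea (this is a generic Hoeffding race for maximizing a noisy reward, not the paper's RBGP), candidates are sampled in rounds and discarded once their confidence interval lies entirely below another candidate's:

```python
import math
import random

def hoeffding_radius(n, delta):
    """Hoeffding confidence radius for a mean of n samples in [0, 1]."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))

def race(candidates, sample, delta=0.05, budget=10000):
    """Sample every surviving candidate each round; eliminate those whose
    upper confidence bound falls below the best lower bound.
    'sample(c)' returns a noisy reward in [0, 1] for candidate c."""
    stats = {c: [0.0, 0] for c in candidates}  # running sum, count
    alive = set(candidates)
    for _ in range(budget):
        if len(alive) == 1:
            break
        for c in alive:
            stats[c][0] += sample(c)
            stats[c][1] += 1
        bounds = {}
        for c in alive:
            total, n = stats[c]
            r = hoeffding_radius(n, delta)
            bounds[c] = (total / n - r, total / n + r)
        best_lower = max(lo for lo, hi in bounds.values())
        alive = {c for c in alive if bounds[c][1] >= best_lower}
    return alive
```

With two Bernoulli candidates of success probability 0.9 and 0.1, the weaker one is eliminated after a few dozen samples.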
LIASD, Université Paris 8
We present the state of our work on the decomposition of game descriptions in the Game Description Language (GDL). In the domain of General Game Playing, the exploration of games described in GDL can be simplified significantly by decomposing the problem into sub-problems that can be analysed separately. We have devised a technique that makes it possible to decompose game descriptions with any number of players while handling the problem of simultaneous actions. This approach identifies perfectly separable sub-games, but also decomposes games made of two sub-games in series, or games with compound actions, while avoiding, unlike previous methods, any reliance on syntactic elements that can be eliminated by a simple rewriting of the GDL rules. We present the results obtained on a panel of 40 games. We identify new and difficult questions raised by the decomposition problem and sketch directions to answer them.
TAO, INRIA Saclay-CNRS-LRI, Université Paris-Sud
We show the impact of the choice of the initial random seed on the performance of MCTS. The approach is tested on Domineering, Go, Breakthrough, and Atari-Go. We then show how the approach can be extended to seeds specific to a given position.
In the normal convention of combinatorial game theory, all games are invertible, whereas in the misère convention only the empty game is invertible. For this reason, several restricted universes, in which more games are invertible, were considered earlier for their study. We here study combinatorial games in misère convention, in particular universes where no player would like to pass their turn. In these universes, we prove that one extra condition makes all games invertible. We then focus our attention on a specific quotient and show that the set of universes sharing that quotient is closed under sum.
Centre de recherche des écoles de Saint-Cyr Coëtquidan
This presentation will propose a series of criteria whose objective is to enable the comparison of planning used as an action-decision component for non-player characters in video games. These criteria were designed from industrial data and argue for a "Behavioral Analytics", just as there is a "Game Analytics". After a brief historical presentation and a review of the collection and analysis of the gathered data, comparative performance results will be presented, drawing a parallel with the analysis of player behaviour and describing what one can hope to gain by systematizing these criteria.
LAMSADE, Université Paris-Dauphine
Generalized Rapid Action Value Estimation
Monte Carlo Tree Search (MCTS) is the state of the art algorithm for many games, including the game of Go and General Game Playing (GGP). The standard algorithm for MCTS is Upper Confidence bounds applied to Trees (UCT). For games such as Go, a big improvement over UCT is the Rapid Action Value Estimation (RAVE) heuristic. We propose to generalize the RAVE heuristic so as to have more accurate estimates near the leaves. We test the resulting algorithm, named GRAVE, on Atarigo, Knightthrough, Domineering and Go.
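The RAVE blending that GRAVE generalizes can be sketched as follows. The formula is the standard one from Gelly and Silver's RAVE; GRAVE's twist (not shown) is to take the AMAF statistics from the closest ancestor with enough playouts, and the parameter names here are illustrative:

```python
def rave_value(q_mean, n, amaf_mean, amaf_n, bias=1e-5):
    """Blend a move's Monte-Carlo mean with its AMAF (all-moves-as-first)
    mean; the AMAF weight beta fades as the move's own visit count grows."""
    beta = amaf_n / (n + amaf_n + bias * n * amaf_n)
    return (1 - beta) * q_mean + beta * amaf_mean
```

With no direct visits the value is pure AMAF; with many visits it converges to the move's own Monte-Carlo mean, which is exactly why estimates near the leaves (few visits) benefit from borrowing statistics, as GRAVE does from an ancestor.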
Playout Policy Adaptation for Games
Monte Carlo Tree Search (MCTS) is the state of the art algorithm for General Game Playing (GGP). We propose to learn a playout policy online so as to improve MCTS for GGP. We test the resulting algorithm named Playout Policy Adaptation (PPA) on Atarigo, Breakthrough, Misere Breakthrough, Domineering, Misere Domineering, Go, Knightthrough, Misere Knightthrough, Nogo and Misere Nogo. For most of these games, PPA is better than UCT with a uniform random playout policy, with the notable exceptions of Go and Nogo.
LAMSADE, Université Paris-Dauphine
The context of this work is the use of HTN (Hierarchical Task Network) planning to generate, in real time, action plans suited to the combat manoeuvres of small infantry units (about thirty agents) in a simulation setting (of the video-game type). Since the agents are organized in several levels of abstraction (corresponding to the military chain of command), the question is: can the planner's performance be improved by exploiting this hierarchy? After an introduction to HTN planning, a semantics based on state-space abstraction is proposed in order to define abstract high-level operators. Finally, an algorithm exploiting these abstract operators is presented, together with the first results obtained on two planning domains.
LIPADE, Université Paris-Descartes
In this talk, we present a new method to compute explicit solutions to the weak Schur problem, and consequently obtain lower bounds on the weak Schur numbers WS(K) for K <= 8. Our method is recursive, based on the principle that good solutions to the (K+1)-column problem are extensions of good solutions to the K-column problem. We make several observations on the regularities of good solutions: (1) the last column is filled with nearly consecutive numbers; (2) the beginning of a column may contain holes, but holes in the middle of the sequence hinder the solution. We test an abstract recursive function with refinements based on these observations. We define and implement an abstract simulation usable by Monte-Carlo search. Using such tools, we recovered the existing lower bounds for K <= 6, and we discovered new bounds: WS(7) >= 1736 and WS(8) >= 5105.
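The constraint underlying the weak Schur problem can be stated as a small checker. The witness below, showing WS(2) >= 8, is a classical textbook example and is not taken from the talk:

```python
def is_weakly_sum_free(partition):
    """Check that no block contains distinct x and y whose sum x + y is
    also in the block (the defining constraint of weak Schur numbers)."""
    for block in partition:
        s = set(block)
        for x in s:
            for y in s:
                if x != y and x + y in s:
                    return False
    return True

# Partition of {1, ..., 8} into 2 weakly sum-free blocks: WS(2) >= 8.
witness = [[1, 2, 4, 8], [3, 5, 6, 7]]
```

The recursive method of the talk searches for such partitions covering {1, ..., n} for the largest possible n.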
School of Information Science JAIST (Japan Advanced Institute of Science and Technology)
The level of programs for the game of Go has been rising continuously over the last 10 years, now reaching strong amateur level. First, we present a review of current Go programs and competitions. Then, we describe the methods used in top Go programs, with an emphasis on the role of knowledge extraction and how it is combined with Monte-Carlo Tree Search. Knowledge is considered here as all the information that can be acquired about the game, either offline, with machine learning on professional game records, or online, from the Monte-Carlo simulations. With examples from our Go program Nomitan, we show how knowledge extraction is a key factor in the strength of programs. Finally, we detail some of the problems that still need to be solved in order to reach professional strength.
Joint work with Kokolo Ikeda (JAIST, Associate Professor)
The Travelling Salesman Problem (TSP) consists in finding, among n cities, a tour of minimal total length that visits each city exactly once and returns to the starting point. It is an NP-hard problem, extensively studied in combinatorial optimization. Here, we are interested in the multiobjective version of the problem, in which several potentially conflicting criteria are taken into account. In this context, it is customary to determine the set of non-dominated points (the Pareto front). As soon as the size of the instance and/or the number of criteria grows, exact methods become impractical. We propose a metaheuristic combining three optimization tools in order to approximate the Pareto front of various multiobjective TSP instances. The first is the Lin-Kernighan algorithm, which allows us to solve single-objective TSP instances efficiently. The second is the Metropolis rule, used in standard single-objective simulated annealing to steer the search efficiently. The third is Pareto epsilon-dominance, used to avoid an explosion of the number of efficient solutions stored during the search.
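The Metropolis rule mentioned as the second ingredient can be sketched as follows (for a minimisation objective; the temperature handling is illustrative, not the talk's exact schedule):

```python
import math
import random

def metropolis_accept(delta, temperature, rng=random.random):
    """Metropolis rule: always accept an improving move (delta <= 0 when
    minimising); accept a worsening move with probability exp(-delta/T)."""
    if delta <= 0:
        return True
    return rng() < math.exp(-delta / temperature)
```

At high temperature almost every move is accepted (broad exploration); as the temperature drops, only near-improving moves pass, which is how simulated annealing steers the search.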
Université Paris 8
In this presentation, we will give a report on the 2014 General Game Playing Competition, surveying the competitors and the games, and we will describe LeJoueur in more detail. General game players are computer programs capable of playing a large variety of games given only a description of the rules. Every year, a competition is organized to compare the relative strength of general game players from all over the world. This year, the competition focused on decomposable games. LeJoueur, a general game player developed at Université Paris 8, reached the finals of this year's competition. It relies on Propositional Nets to rapidly interpret the game rules and uses a parallelization of UCT as a decision method. Propositional Nets, which were used by only one player in previous competitions, have been broadly adopted this year, producing a general improvement in player strength.
University of New South Wales
General game players can drastically reduce the cost of search if they are able to solve smaller subproblems individually and synthesise the resulting solutions. To provide a systematic solution to this (de-)composition problem, we start off with generalising the standard decomposition problem in planning by allowing the composition of individual solutions to be further constrained by domain-dependent requirements of the global planning problem. We solve this generalised problem based on a systematic analysis of composition operators for transition systems, and we demonstrate how this solution can be further generalised to general game playing.
joint work with Timothy Cerexhe, David Rajaratnam, and Michael Thielscher
Université Lyon 1
Monte Carlo Tree Search (MCTS) is now one of the reference techniques of game AI. Since its introduction for the game of Go, MCTS has been successfully applied to other games (Hex, GGP) and has opened the way to a whole family of new methods (Multiple-MCTS, Nested-MCTS). Like other so-called Monte Carlo methods, MCTS evaluates the overall winning chances of a game situation from thousands of randomly generated game endings. As the simulations progress, the program builds a representation of the current situation and steers its search towards the most promising moves. However, when facing critical (or "hot") situations, the program struggles to correctly identify the important moves and scatters its search. In this presentation, I will show how local evaluations make it possible to abstract the exploration of the search space and offer, in the long run, the possibility of improving the learning process by refocusing the search around intermediate objectives.
Monte Carlo Tree Search (MCTS) has improved the performance of game-playing engines in domains such as Go, Hex, and general game playing. MCTS has been shown to outperform classic alpha-beta search in games where good heuristic evaluations are difficult to obtain. In recent years, combining ideas from traditional minimax search with MCTS has been shown to be advantageous in some domains, such as Lines of Action, Amazons, and Breakthrough. In this paper, we propose a new way to use heuristic evaluations to guide the MCTS search by storing the two sources of information, estimated win rates and heuristic evaluations, separately. Rather than using the heuristic evaluations to replace the playouts, our technique backs them up implicitly during its MCTS simulations. These learned evaluation values are then used to guide future simulations. Compared to current techniques, we show that using implicit minimax backups leads to stronger play performance in Breakthrough, Lines of Action, and Kalah.
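The two core quantities of implicit minimax backups can be sketched as follows. The α-weighted mixture follows the description above; the helper names and the default weight are illustrative, not the paper's tuned values:

```python
def implicit_minimax_score(win_rate, minimax_estimate, alpha=0.4):
    """Selection value mixing the Monte-Carlo win rate with the
    implicitly backed-up heuristic minimax value."""
    return (1 - alpha) * win_rate + alpha * minimax_estimate

def backup_minimax(children_values, maximizing=True):
    """Implicit minimax backup: a node's heuristic value is the max
    (or min, for the opponent) of its children's backed-up values."""
    return max(children_values) if maximizing else min(children_values)
```

The point of keeping the two sources separate is that the minimax estimate sharpens quickly in tactical positions while the win rate remains the asymptotically reliable signal.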
Singapore Management University
With the rapid growth of transportation demand in urban cities, a common challenge for city planners is to provide efficient and effective connection service to travelers using the public transportation system (from public transport hubs to final destinations). This is commonly known as the last-mile problem and is critical in promoting the utilization of the public transportation system. In this paper, we address the last-mile problem by considering a dynamic and demand-responsive mechanism for arranging ride sharing on a non-dedicated commercial fleet (such as taxis or passenger vans). Our approach has the benefits of being dynamic, flexible, and of low setup cost. A critical issue in such a ride-sharing service is how riders should be grouped and serviced, and how fares should be split. To facilitate our designed approach, we propose two auction designs which are used to solicit each individual rider's willing payment rate and compensation rate (for extra travel, if any). We demonstrate that these two auctions are budget-balanced, individually rational, and incentive compatible. A series of experimental studies based on both synthetic and real-world datasets are designed to demonstrate the pros and cons of our two proposed auction mechanisms in various settings.
Nikhef and Tilburg University
MCTS has proven to be a good candidate for finding Horner schemes (a list that defines in what order variables should be pulled outside brackets). These schemes are useful to simplify expressions, which in turn speeds up numerical integration significantly. For the effectiveness of a Horner scheme, it is important that the entire scheme is optimized and not just the beginning. However, the well-known UCT child selection has the disadvantage that it does little to optimize deep in the tree (and thus little at the end of the scheme), and that the final tree updates are wasted on exploration. Inspired by simulated annealing, we introduce a dynamic exploration-exploitation parameter T that decreases with the iteration number. Thus the first iterations are used for exploration, whereas the focus gradually shifts to exploitation. The role of T is similar to that of the temperature in simulated annealing. Using this modified UCT, which we call SA-UCT, we obtain better results for our domain.
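A sketch of the SA-UCT selection score under the decay described above. The linear schedule matches the description in the abstract, but the starting value T0 is an illustrative assumption:

```python
import math

def sa_uct_score(child_value, child_visits, parent_visits,
                 iteration, max_iterations, T0=2.0):
    """SA-UCT selection: the exploration coefficient T decreases linearly
    with the iteration number, so early iterations explore and later
    ones exploit (T plays the role of a temperature)."""
    T = T0 * (max_iterations - iteration) / max_iterations
    return child_value + T * math.sqrt(math.log(parent_visits) / child_visits)
```

At the final iteration T reaches zero and selection becomes purely greedy, so the last tree updates are no longer spent on exploration.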
The speakers are the following:
LIPADE, Université Paris Descartes, FRANCE
This paper presents Monte-Carlo Fork Search (MCFS), a new algorithm that solves Cooperative Path-Finding (CPF) problems with simultaneous moves. The background is Monte-Carlo Tree Search (MCTS) and Nested Monte-Carlo Search (NMCS). Concerning CPF, MCFS avoids the curse of the very high branching factor. Compared to MCTS, the key idea of MCFS is to build a tree balanced over the whole game tree: after a simulation, MCFS stores the whole sequence of actions in the tree, which enables it to fork new sequences at any depth of the built tree. This idea fits CPF problems, in which the branching factor is too large for MCTS or A* approaches and in which congestion may arise at any distance from the start state. With sufficient time and memory, Nested MCFS (NMCFS) solves congestion problems from the literature, finding better solutions than the state of the art, and solves N-puzzles without hole near-optimally. The algorithm is anytime and complete. The scalability of the approach is shown for grid sizes up to 200 × 200 and up to 400 agents.
Abdallah Saffidine et Tristan Cazenave
LAMSADE, Université Paris-Dauphine
Product Propagation (PP) is an algorithm to back up probabilistic evaluations in abstract two-player games. It was shown that PP could solve Go problems as efficiently as Proof Number Search (PNS). In this paper, we exhibit a few domains where, for generic non-optimized versions, PP performs better than previously known algorithms for solving games. The compared approaches include alpha-beta search, PNS, and Monte Carlo Tree Search. We also extend PP to address its memory consumption and to improve its solving time.
LAMSADE, Université Paris-Dauphine
Action planning in video games has been making its way since 2005, when it was first used in the first-person shooter F.E.A.R. After an introduction to action planning, the various problems arising when deploying it in a video game, as well as the solutions brought to them, will be examined through the two main types of planning systems that have been used, namely GOAP (Goal-Oriented Action Planning) and HTN (Hierarchical Task Network) planning. Finally, the combination of HTN planning with planning by abstraction hierarchies (ABSTRIPS) will be proposed with a view to improving the resolution of these problems.