Université Paris 2
Stockfish is an open-source chess engine, currently considered the reference in terms of playing strength. In this presentation we will revisit a few points that set it apart from other chess programs:
- a distributed computing farm for collaborative testing
- a statistical method for validating improvements
- the synchronization-free parallel alpha-beta algorithm
- recent improvements to the evaluation function and to pruning.
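The alpha-beta algorithm mentioned above is the classical baseline that Stockfish parallelizes. As a point of reference only (not Stockfish's actual implementation), a minimal sequential fail-hard sketch might look like the following, where `evaluate` and `children` are hypothetical problem-specific callbacks:

```python
def alphabeta(node, depth, alpha, beta, maximizing, evaluate, children):
    """Minimal alpha-beta search. 'evaluate' scores a leaf position;
    'children' lists successor positions. Both are placeholders."""
    kids = children(node)
    if depth == 0 or not kids:
        return evaluate(node)
    if maximizing:
        best = float("-inf")
        for child in kids:
            best = max(best, alphabeta(child, depth - 1, alpha, beta,
                                       False, evaluate, children))
            alpha = max(alpha, best)
            if alpha >= beta:  # beta cutoff: opponent avoids this line
                break
        return best
    best = float("inf")
    for child in kids:
        best = min(best, alphabeta(child, depth - 1, alpha, beta,
                                   True, evaluate, children))
        beta = min(beta, best)
        if alpha >= beta:  # alpha cutoff
            break
    return best
```

Stockfish's contribution discussed in the talk is running many such searches in parallel without synchronizing on the tree, which this sequential sketch deliberately omits.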
Miquel Oliu Barton
Ceremade, Université Paris-Dauphine, PSL
University of New South Wales
We prove that Short Generalized Hex is W[1]-complete parameterized by the number of moves. This solves an open problem from Downey and Fellows' influential 1999 list of open problems. Previously, the problem was considered a natural candidate for AW[*]-completeness. Our main tool is a new fragment of first-order logic in which universally quantified variables do not occur in inequalities. We show that model-checking on arbitrary relational structures for a formula in this fragment is W[1]-complete when parameterized by formula size.
We then consider Positional Games, a general framework in which the game is represented as a hypergraph, two players alternately pick vertices, and the winner is determined by a player picking all the vertices of some hyperedge. Positional Games are traditionally labelled Maker-Breaker, Maker-Maker, or Avoider-Enforcer according to the exact details of the winning condition. Our results show that these classes are complete for W[1], AW[*], and co-W[1] respectively, thereby providing a rough picture of the complexity of Positional Games parameterized by the number of moves. However, some games whose board and winning configurations are highly structured are fixed-parameter tractable (FPT). In particular, Short k-Connect, which generalizes Tic-tac-toe and Gomoku, as well as Short Hex, are FPT.
This talk is based on joint work with Édouard Bonnet and Florian Jamain appearing in the journal "Theoretical Computer Science", as well as ongoing work with Serge Gaspers, Antonin Lambilliotte, and Stefan Rümmele.
Abdallah Saffidine is an ARC DECRA Fellow and a Research Associate at the University of New South Wales, Sydney, Australia. He works in the Artificial Intelligence and Algorithms groups. He arrived at UNSW in 2013 as a postdoc with Prof. Michael Thielscher and obtained his PhD from the Université Paris-Dauphine, France, under the supervision of Prof. Tristan Cazenave. Abdallah has a wide range of interests, from games, planning, and other areas of decision-making to logic, complexity, and other areas of theory.
University of New South Wales
Abstract: It was demonstrated in 2009 that the TreeStrap algorithm could learn Chess to Master level from random initial weights. However, the learning rate had to be carefully tuned and learning sometimes became unstable. This talk will give a historical overview of game tree learning, and then discuss modifications to the TreeStrap algorithm including staged learning, incremental evaluation and scaling of learning rate by depth, which make the algorithm more stable and allow it to be scaled up to a multi-player Chess variant board game called Duchess.
Bio: Alan Blair completed a PhD in Mathematics at MIT and worked at Brandeis, Queensland and Melbourne before taking up his current position at UNSW. In addition to TreeStrap, he is known for his pioneering work in the comparison of evolution with gradient descent for Backgammon, Tron and Simulated Hockey, as well as his early attempts to train parallel convolutional neural networks for the game of Go. His non-game interests include robot navigation, language processing, hierarchical evolutionary computation and evolutionary deep learning.
PSL - Université Paris-Dauphine - LAMSADE
Monte Carlo Tree Search (MCTS) is the state of the art algorithm for General Game Playing (GGP). We propose to learn a playout policy online so as to improve MCTS for GGP. We also propose to learn a policy that uses not only the moves but also the features of the moves. We test the resulting algorithms, named Playout Policy Adaptation (PPA) and Playout Policy Adaptation with move Features (PPAF), on Atarigo, Breakthrough, Misere Breakthrough, Domineering, Misere Domineering, Knightthrough, Misere Knightthrough and Nogo. The experiments compare PPA and PPAF to Upper Confidence bounds applied to Trees (UCT) and to the closely related Move-Average Sampling Technique (MAST) algorithm.
Monte Carlo Tree Search (MCTS) is a general search algorithm that has improved the state of the art for multiple games and optimization problems. A record-breaking algorithm for puzzles and optimization is Nested Rollout Policy Adaptation (NRPA). It learns a playout policy online that dynamically adapts the playouts to the problem at hand. We propose to enhance NRPA using more selectivity in the playouts. The idea is applied to three different problems: Bus regulation, SameGame and Weak Schur numbers. We improve on standard NRPA for all three problems.
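The NRPA scheme (Rosin, 2011) that this work builds on can be sketched compactly. The move coding below is a toy simplification (a real implementation keys the policy on state-move codes), and `legal_moves` and `score` are hypothetical problem callbacks:

```python
import math
import random

def playout(policy, legal_moves, max_len):
    """Sample a sequence with a softmax (Gibbs) policy over move codes."""
    seq = []
    for _ in range(max_len):
        moves = legal_moves(seq)
        if not moves:
            break
        weights = [math.exp(policy.get(m, 0.0)) for m in moves]
        seq.append(random.choices(moves, weights=weights)[0])
    return seq

def adapt(policy, seq, legal_moves, alpha=1.0):
    """Shift the policy toward the best sequence found so far."""
    pol = dict(policy)
    for i, move in enumerate(seq):
        moves = legal_moves(seq[:i])
        z = sum(math.exp(policy.get(m, 0.0)) for m in moves)
        for m in moves:
            pol[m] = pol.get(m, 0.0) - alpha * math.exp(policy.get(m, 0.0)) / z
        pol[move] = pol.get(move, 0.0) + alpha
    return pol

def nrpa(level, policy, legal_moves, score, iters=10, max_len=20):
    """Nested recursion: each level runs the level below and adapts
    the policy toward the best sequence seen."""
    if level == 0:
        seq = playout(policy, legal_moves, max_len)
        return score(seq), seq
    best_score, best_seq = float("-inf"), []
    for _ in range(iters):
        s, seq = nrpa(level - 1, policy, legal_moves, score, iters, max_len)
        if s >= best_score:
            best_score, best_seq = s, seq
        policy = adapt(policy, best_seq, legal_moves)
    return best_score, best_seq
```

The enhancement described in the abstract restricts which moves the playout may sample (more selectivity), which would modify `playout` only.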
In this paper we experiment with a Deep Convolutional Neural Network for the game of Go. We show that even though it leads to strong play, it has weaknesses at tactical search. We propose to combine tactical search with Deep Learning to improve Golois, the resulting Go program. A related work is AlphaGo, which combines tactical search with Deep Learning, giving the network the results of ladders as input. We propose to extend this further to other kinds of tactical search, such as life-and-death search.
University of Bremen
In this talk I review recent advances of randomized AI search in solving industrially relevant optimization problems. The method I focus on is a sampling-based solution mechanism called Monte-Carlo Tree Search (MCTS), which establishes a trade-off between exploitation and exploration.
This method, originating in game playing research, is a general heuristic search technique, for which often less problem-specific knowledge has to be added than in comparable approaches.
The production and logistics problems I consider include vehicle routing, container packing and robot motion planning tasks, which are usually considered to belong to Operations Research, often drawing on decades of research. Despite the long history of the problems addressed, NRPA, a recent variant of MCTS with nestedness and policy adaptation, computes promising results, sometimes even of a quality that is difficult to match with other global/local optimization strategies.
The breadth of the three different application domains illustrates that a strong search method is entering the OR arena, which might eventually lead to a paradigm shift.
J.-B. Hoock and O. Teytaud
TAO (Inria), LRI, UMR 8623 (CNRS - Univ. Paris-Sud), bât. 490, Univ. Paris-Sud, 91405 Orsay, France, firstname.lastname@example.org
Genetic Programming (GP) is the automatic building of programs for solving a given task. The main difficulties are (i) the cost of evaluating a mutation and (ii) the huge size of the set of possible mutations. In many cases, a further difficulty comes from the noisy evaluation: the fitness function, which evaluates the quality of a program, is stochastic. Evaluating a program takes a (supposedly constant) running time and yields a noisy reward: we are looking for
argmin_{x in possible programs} E[fitness(x)].
When mining a huge set of mutations in an uncertain framework (noisy optimization), there are two main issues:
- load balancing: which mutations of the program are to be tested now?
- statistical validation: which mutations of the program should be validated?
The statistical validation is not trivial: even if each mutation is tested rigorously, e.g. with confidence 95% (i.e. a 5% probability of error), erroneous validations will occur very frequently when millions of mutations are tested (with just 100 mutations, each of them useless, the probability of an erroneous validation is already 99.4% = 1 - 0.95^100). This is known as the Multiple Simultaneous Hypothesis Testing (MSHT) effect.
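The multiple-testing arithmetic above can be checked directly:

```python
def family_wise_error(n, confidence=0.95):
    """Probability of at least one false validation among n independent
    tests, each individually correct with probability 'confidence'."""
    return 1 - confidence ** n

print(round(family_wise_error(100), 3))  # ≈ 0.994
```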
Bandits have been used for a while, even in GP, for the load-balancing problem, but not yet, to the best of our knowledge, for both issues simultaneously. We here investigate the use of racing algorithms, a particular form of bandit algorithms covering simultaneously the load balancing and the statistical validation. Our algorithm, termed RBGP (Racing-Based GP), depends on parameters delta and c, and ensures that the following two properties hold simultaneously with probability 1 - delta. Monotonic improvement:
E[fitness(x_n)] > E[fitness(x_{n+1})];
and (efficiency) if there exists a possible mutation of x with improvement at least c, then x_{n+1} contains a mutation with positive effect that was not in x_n. We applied our results to the GP of the bias module of MCTS, a well-known problem that is difficult even for humans; whereas humans often make mistakes due to the multiplication of trials (the MSHT effect above), RBGP provided improvements comparable to those of human developers and, more importantly, was never convinced into committing a mistake to the CVS repository of the project.
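As a toy illustration of the racing idea (this is a generic Hoeffding race for maximizing a noisy reward, not the paper's RBGP), candidates are sampled in rounds and discarded once their confidence interval lies entirely below another candidate's:

```python
import math
import random

def hoeffding_radius(n, delta):
    """Hoeffding confidence radius for a mean of n samples in [0, 1]."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))

def race(candidates, sample, delta=0.05, budget=10000):
    """Sample every surviving candidate each round; eliminate those whose
    upper confidence bound falls below the best lower bound.
    'sample(c)' returns a noisy reward in [0, 1] for candidate c."""
    stats = {c: [0.0, 0] for c in candidates}  # running sum, count
    alive = set(candidates)
    for _ in range(budget):
        if len(alive) == 1:
            break
        for c in alive:
            stats[c][0] += sample(c)
            stats[c][1] += 1
        bounds = {}
        for c in alive:
            total, n = stats[c]
            r = hoeffding_radius(n, delta)
            bounds[c] = (total / n - r, total / n + r)
        best_lower = max(lo for lo, hi in bounds.values())
        alive = {c for c in alive if bounds[c][1] >= best_lower}
    return alive
```

With two Bernoulli candidates of success probability 0.9 and 0.1, the weaker one is eliminated after a few dozen samples.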
LIASD, Université Paris 8
We present the state of our work on the decomposition of game descriptions in the Game Description Language (GDL). In the domain of General Game Playing, the exploration of games described in GDL can be simplified significantly by decomposing the problem into sub-problems that can be analysed separately. We have devised a technique that makes it possible to decompose game descriptions with any number of players while handling the problem of simultaneous actions. This approach identifies perfectly separable sub-games, but also decomposes games made of two sub-games in series, or games with compound actions, while avoiding, unlike previous methods, any reliance on syntactic elements that can be eliminated by a simple rewriting of the GDL rules. We present the results obtained on a panel of 40 games. We identify new and difficult questions raised by the decomposition problem and sketch directions to answer them.
TAO, INRIA Saclay-CNRS-LRI, Université Paris-Sud
We show the impact of the choice of the initial random seed on the performance of MCTS. The approach is tested on Domineering, Go, Breakthrough, and Atari-Go. We then show how the approach can be extended to seeds specific to a given position.
In the normal convention of combinatorial game theory, all games are invertible, whereas in the misère convention only the empty game is invertible. For this reason, several restricted universes, in which more games are invertible, were considered earlier for their study. We here study combinatorial games in misère convention, in particular universes where no player would like to pass their turn. In these universes, we prove that one extra condition makes all games invertible. We then focus our attention on a specific quotient and show that the set of universes sharing that quotient is closed under sum.
Centre de recherche des écoles de Saint-Cyr Coëtquidan
This presentation will propose a series of criteria whose objective is to enable the comparison of planning used as an action-decision component for non-player characters in video games. These criteria were designed from industrial data and argue for a "Behavioral Analytics", just as there is a "Game Analytics". After a brief historical presentation and a review of the collection and analysis of the gathered data, comparative performance results will be presented, drawing a parallel with the analysis of player behaviour and describing what one can hope to gain by systematizing these criteria.
LAMSADE, Université Paris-Dauphine
Generalized Rapid Action Value Estimation
Monte Carlo Tree Search (MCTS) is the state of the art algorithm for many games, including the game of Go and General Game Playing (GGP). The standard algorithm for MCTS is Upper Confidence bounds applied to Trees (UCT). For games such as Go, a big improvement over UCT is the Rapid Action Value Estimation (RAVE) heuristic. We propose to generalize the RAVE heuristic so as to have more accurate estimates near the leaves. We test the resulting algorithm, named GRAVE, on Atarigo, Knightthrough, Domineering and Go.
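The RAVE blending that GRAVE generalizes can be sketched as follows. The formula is the standard one from Gelly and Silver's RAVE; GRAVE's twist (not shown) is to take the AMAF statistics from the closest ancestor with enough playouts, and the parameter names here are illustrative:

```python
def rave_value(q_mean, n, amaf_mean, amaf_n, bias=1e-5):
    """Blend a move's Monte-Carlo mean with its AMAF (all-moves-as-first)
    mean; the AMAF weight beta fades as the move's own visit count grows."""
    beta = amaf_n / (n + amaf_n + bias * n * amaf_n)
    return (1 - beta) * q_mean + beta * amaf_mean
```

With no direct visits the value is pure AMAF; with many visits it converges to the move's own Monte-Carlo mean, which is exactly why estimates near the leaves (few visits) benefit from borrowing statistics, as GRAVE does from an ancestor.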
Playout Policy Adaptation for Games
Monte Carlo Tree Search (MCTS) is the state of the art algorithm for General Game Playing (GGP). We propose to learn a playout policy online so as to improve MCTS for GGP. We test the resulting algorithm named Playout Policy Adaptation (PPA) on Atarigo, Breakthrough, Misere Breakthrough, Domineering, Misere Domineering, Go, Knightthrough, Misere Knightthrough, Nogo and Misere Nogo. For most of these games, PPA is better than UCT with a uniform random playout policy, with the notable exceptions of Go and Nogo.
LAMSADE, Université Paris-Dauphine
The context of this work is the use of HTN (Hierarchical Task Network) planning to generate, in real time, action plans suited to the combat manoeuvres of small infantry units (about thirty agents) in a simulation setting (of the video-game type). Since the agents are organized in several levels of abstraction (corresponding to the military chain of command), the question is: can the planner's performance be improved by exploiting this hierarchy? After an introduction to HTN planning, a semantics based on state-space abstraction is proposed in order to define abstract high-level operators. Finally, an algorithm exploiting these abstract operators is presented, together with the first results obtained on two planning domains.
LIPADE, Université Paris-Descartes
In this talk, we present a new method to compute explicit solutions to the weak Schur problem, and consequently obtain lower bounds on the weak Schur numbers WS(K) for K <= 8. Our method is recursive, based on the principle that good solutions to the (K+1)-column problem are extensions of good solutions to the K-column problem. We make several observations on the regularities of good solutions: (1) the last column is filled with nearly consecutive numbers; (2) the beginning of a column may contain holes, but holes in the middle of the sequence hinder the solution. We test an abstract recursive function with refinements based on these observations. We define and implement an abstract simulation usable by Monte-Carlo search. Using such tools, we recovered the existing lower bounds for K <= 6, and we discovered new bounds: WS(7) >= 1736 and WS(8) >= 5105.
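The constraint underlying the weak Schur problem can be stated as a small checker. The witness below, showing WS(2) >= 8, is a classical textbook example and is not taken from the talk:

```python
def is_weakly_sum_free(partition):
    """Check that no block contains distinct x and y whose sum x + y is
    also in the block (the defining constraint of weak Schur numbers)."""
    for block in partition:
        s = set(block)
        for x in s:
            for y in s:
                if x != y and x + y in s:
                    return False
    return True

# Partition of {1, ..., 8} into 2 weakly sum-free blocks: WS(2) >= 8.
witness = [[1, 2, 4, 8], [3, 5, 6, 7]]
```

The recursive method of the talk searches for such partitions covering {1, ..., n} for the largest possible n.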
School of Information Science JAIST (Japan Advanced Institute of Science and Technology)
The level of programs for the game of Go has been rising continuously over the last 10 years, now reaching strong amateur level. First, we present a review of current Go programs and competitions. Then, we describe the methods used in top Go programs, with an emphasis on the role of knowledge extraction and how it is combined with Monte-Carlo Tree Search. Knowledge is considered here as all the information that can be acquired about the game, either offline, with machine learning on professional game records, or online, from the Monte-Carlo simulations. With examples from our Go program Nomitan, we show how knowledge extraction is a key factor in the strength of programs. Finally, we detail some of the problems that still need to be solved in order to reach professional strength.
Joint work with Kokolo Ikeda (JAIST, Associate Professor)
The Travelling Salesman Problem (TSP) consists in finding, among n cities, a tour of minimal total length that visits each city exactly once and returns to the starting point. It is an NP-hard problem, extensively studied in combinatorial optimization. Here, we are interested in the multiobjective version of the problem, in which several potentially conflicting criteria are taken into account. In this context, it is customary to determine the set of non-dominated points (the Pareto front). As soon as the size of the instance and/or the number of criteria grows, exact methods become impractical. We propose a metaheuristic combining three optimization tools in order to approximate the Pareto front of various multiobjective TSP instances. The first is the Lin-Kernighan algorithm, which allows us to solve single-objective TSP instances efficiently. The second is the Metropolis rule, used in standard single-objective simulated annealing to steer the search efficiently. The third is Pareto epsilon-dominance, used to avoid an explosion of the number of efficient solutions stored during the search.
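The Metropolis rule mentioned as the second ingredient can be sketched as follows (for a minimisation objective; the temperature handling is illustrative, not the talk's exact schedule):

```python
import math
import random

def metropolis_accept(delta, temperature, rng=random.random):
    """Metropolis rule: always accept an improving move (delta <= 0 when
    minimising); accept a worsening move with probability exp(-delta/T)."""
    if delta <= 0:
        return True
    return rng() < math.exp(-delta / temperature)
```

At high temperature almost every move is accepted (broad exploration); as the temperature drops, only near-improving moves pass, which is how simulated annealing steers the search.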
Université Paris 8
In this presentation, we will give a report on the 2014 General Game Playing Competition, surveying the competitors and the games, and we will describe LeJoueur in more detail. General game players are computer programs capable of playing a large variety of games given only a description of the rules. Every year, a competition is organized to compare the relative strength of general game players from all over the world. This year, the competition focused on decomposable games. LeJoueur, a general game player developed at Université Paris 8, reached the finals of this year's competition. It relies on Propositional Nets to rapidly interpret the game rules and uses a parallelization of UCT as a decision method. Propositional Nets, which were used by only one player in previous competitions, have been broadly adopted this year, producing a general improvement in player strength.
University of New South Wales
General game players can drastically reduce the cost of search if they are able to solve smaller subproblems individually and synthesise the resulting solutions. To provide a systematic solution to this (de-)composition problem, we start off with generalising the standard decomposition problem in planning by allowing the composition of individual solutions to be further constrained by domain-dependent requirements of the global planning problem. We solve this generalised problem based on a systematic analysis of composition operators for transition systems, and we demonstrate how this solution can be further generalised to general game playing.
joint work with Timothy Cerexhe, David Rajaratnam, and Michael Thielscher
Université Lyon 1
Monte Carlo Tree Search (MCTS) is now one of the reference techniques of game AI. Since its introduction for the game of Go, MCTS has been successfully applied to other games (Hex, GGP) and has opened the way to a whole family of new methods (Multiple-MCTS, Nested-MCTS). Like other so-called Monte Carlo methods, MCTS evaluates the overall winning chances of a game situation from thousands of randomly generated game endings. As the simulations progress, the program builds a representation of the current situation and steers its search towards the most promising moves. However, when facing critical (or "hot") situations, the program struggles to correctly identify the important moves and scatters its search. In this presentation, I will show how local evaluations make it possible to abstract the exploration of the search space and offer, in the long run, the possibility of improving the learning process by refocusing the search around intermediate objectives.
Monte Carlo Tree Search (MCTS) has improved the performance of game-playing engines in domains such as Go, Hex, and general game playing. MCTS has been shown to outperform classic alpha-beta search in games where good heuristic evaluations are difficult to obtain. In recent years, combining ideas from traditional minimax search with MCTS has been shown to be advantageous in some domains, such as Lines of Action, Amazons, and Breakthrough. In this paper, we propose a new way to use heuristic evaluations to guide the MCTS search by storing the two sources of information, estimated win rates and heuristic evaluations, separately. Rather than using the heuristic evaluations to replace the playouts, our technique backs them up implicitly during its MCTS simulations. These learned evaluation values are then used to guide future simulations. Compared to current techniques, we show that using implicit minimax backups leads to stronger play performance in Breakthrough, Lines of Action, and Kalah.
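The two core quantities of implicit minimax backups can be sketched as follows. The α-weighted mixture follows the description above; the helper names and the default weight are illustrative, not the paper's tuned values:

```python
def implicit_minimax_score(win_rate, minimax_estimate, alpha=0.4):
    """Selection value mixing the Monte-Carlo win rate with the
    implicitly backed-up heuristic minimax value."""
    return (1 - alpha) * win_rate + alpha * minimax_estimate

def backup_minimax(children_values, maximizing=True):
    """Implicit minimax backup: a node's heuristic value is the max
    (or min, for the opponent) of its children's backed-up values."""
    return max(children_values) if maximizing else min(children_values)
```

The point of keeping the two sources separate is that the minimax estimate sharpens quickly in tactical positions while the win rate remains the asymptotically reliable signal.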
Singapore Management University
With the rapid growth of transportation demand in urban cities, a common challenge for city planners is to provide efficient and effective connection service to travelers using the public transportation system (from public transport hubs to final destinations). This is commonly known as the last-mile problem and is critical in promoting the utilization of the public transportation system. In this paper, we address the last-mile problem by considering a dynamic and demand-responsive mechanism for arranging ride sharing on a non-dedicated commercial fleet (such as taxis or passenger vans). Our approach has the benefits of being dynamic, flexible, and of low setup cost. A critical issue in such a ride-sharing service is how riders should be grouped and serviced, and how fares should be split. To facilitate our designed approach, we propose two auction designs which are used to solicit each individual rider's willing payment rate and compensation rate (for extra travel, if any). We demonstrate that these two auctions are budget-balanced, individually rational, and incentive compatible. A series of experimental studies based on both synthetic and real-world datasets are designed to demonstrate the pros and cons of our two proposed auction mechanisms in various settings.
Nikhef and Tilburg University
MCTS has proven to be a good candidate for finding Horner schemes (a list that defines in what order variables should be pulled outside brackets). These schemes are useful to simplify expressions, which in turn speeds up numerical integration significantly. For the effectiveness of a Horner scheme, it is important that the entire scheme is optimized and not just the beginning. However, the well-known UCT child selection has the disadvantage that it does little to optimize deep in the tree (and thus little at the end of the scheme), and that the final tree updates are wasted on exploration. Inspired by simulated annealing, we introduce a dynamic exploration-exploitation parameter T that decreases with the iteration number. Thus the first iterations are used for exploration, whereas the focus gradually shifts to exploitation. The role of T is similar to that of the temperature in simulated annealing. Using this modified UCT, which we call SA-UCT, we obtain better results for our domain.
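A sketch of the SA-UCT selection score under the decay described above. The linear schedule matches the description in the abstract, but the starting value T0 is an illustrative assumption:

```python
import math

def sa_uct_score(child_value, child_visits, parent_visits,
                 iteration, max_iterations, T0=2.0):
    """SA-UCT selection: the exploration coefficient T decreases linearly
    with the iteration number, so early iterations explore and later
    ones exploit (T plays the role of a temperature)."""
    T = T0 * (max_iterations - iteration) / max_iterations
    return child_value + T * math.sqrt(math.log(parent_visits) / child_visits)
```

At the final iteration T reaches zero and selection becomes purely greedy, so the last tree updates are no longer spent on exploration.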
The speakers are the following:
LIPADE, Université Paris Descartes, FRANCE
This paper presents Monte-Carlo Fork Search (MCFS), a new algorithm that solves Cooperative Path-Finding (CPF) problems with simultaneous moves. The background is Monte-Carlo Tree Search (MCTS) and Nested Monte-Carlo Search (NMCS). Concerning CPF, MCFS avoids the curse of the very high branching factor. Compared to MCTS, the key idea of MCFS is to build a tree balanced over the whole game tree: after a simulation, MCFS stores the whole sequence of actions in the tree, which enables it to fork new sequences at any depth of the built tree. This idea fits CPF problems, in which the branching factor is too large for MCTS or A* approaches and in which congestion may arise at any distance from the start state. With sufficient time and memory, Nested MCFS (NMCFS) solves congestion problems from the literature, finding better solutions than the state of the art, and solves N-puzzles without hole near-optimally. The algorithm is anytime and complete. The scalability of the approach is shown for grid sizes up to 200 × 200 and up to 400 agents.
Abdallah Saffidine et Tristan Cazenave
LAMSADE, Université Paris-Dauphine
Product Propagation (PP) is an algorithm to back up probabilistic evaluations in abstract two-player games. It was shown that PP could solve Go problems as efficiently as Proof Number Search (PNS). In this paper, we exhibit a few domains where, for generic non-optimized versions, PP performs better than previously known algorithms for solving games. The compared approaches include alpha-beta search, PNS, and Monte Carlo Tree Search. We also extend PP to address its memory consumption and to improve its solving time.
LAMSADE, Université Paris-Dauphine
Action planning in video games has been making its way since 2005, when it was first used in the first-person shooter F.E.A.R. After an introduction to action planning, the various problems arising when deploying it in a video game, as well as the solutions brought to them, will be examined through the two main types of planning systems that have been used, namely GOAP (Goal-Oriented Action Planning) and HTN (Hierarchical Task Network) planning. Finally, the combination of HTN planning with planning by abstraction hierarchies (ABSTRIPS) will be proposed with a view to improving the resolution of these problems.