Any adult of a certain age knows the video game Pac-Man! It is a classic “pursuit-evasion” game, where the player maneuvers Pac-Man around a maze to gobble up as many dots as possible before being captured by a ghost. Such pursuit-evasion problems are found in the real world as well, such as applications to missile guidance systems. Unfortunately, learning to optimally coordinate pursuer behaviors so as to minimize time to capture of the evader is challenging because the action space for the pursuers can be quite large and some noisy information exists. Consequently, previous approaches have relied primarily on heuristics. However, we have developed a variant of Thompson Sampling for pursuit-evasion that performs favorably relative to competitors in terms of time-to-capture.

Thompson sampling is a online decision algorithm that balances the **exploration** and **exploitation** of the decision making process based on the Bayesian approach. At each time point in a pursuit-evasion problem, we have the following steps:

** (i)** computing the posterior distribution over the space of possible evader strategies and the posterior distribution of the evader’s location;

**(ii)** sampling a strategy from the posterior distribution; and

**(iii**) using the model-based method in reinforcement learning to estimate the optimal pursuer strategy.

The proposed algorithm performs favorably relative to competitors in terms of time-to-capture in a suite of simulation experiments including pursuit-evasion over a grid and the coordination of ghost behavior in the class arcade game Pac-Man.

In the Pac-Man game implemented in JavaScript, the ghosts (pursuers) could follow one of several pursuit strategies. We devised three strategies for Pac-Man. The first is a pure random walk. The second is defined such that when the distance between Pac-Man and one of the ghosts is less than some value d, the Pac-Man will move in the opposite direction of the closest ghost; otherwise Pac-Man will move uniformly at random. The third is the same as the second except that when the distance between Pac-Man and one of the ghosts is at least d, Pac-Man moves towards the closest Pac-Dot with a probability p and moves randomly otherwise. We compare our method with a search strategy in which the ghosts know Pac-Man’s exact location and select their actions to minimize the distance to Pac-Man. The results show that our method performs slightly better than the benchmark strategy. In the future, we are deriving theoretical results for our algorithm and instead of pursuit-evasion problems, our algorithm can be extended to general multi-agent games and more applications. We expect to see the results soon!

Zhen is a PhD candidate working with Laber Labs. His research interests include optimal treatment regime, machine learning, and Bayesian Inference . We thought this posting was a great excuse to get to know a little more about him, so we we asked:

**Q: Explain what a p-value is, from the perspective of a deranged professor who’s clearly had enough, can’t take it any more, and is having a career-ending meltdown in front of their stoic, uncomprehending students.**

**A:**We want to evaluate the free throw percentage of Jack, who loves playing basketball. We are interested in whether his free throw percentage is over 50%. Thus, we can set the null hypothesis that Jack’s free throw percentage is not greater than 50%. For simplicity, we want him to try two free throws. We consider two cases as follows: (i) Jack successfully puts the two in or (ii) Jack misses both free throws. The p-value is the probability Jack can get the same or better result than his current result when his free throw percentage is 50%. For the case (i), the same or better result than he has made two shots is only he makes the two shots. If his free throw percentage is 50%, then the probability he makes the two shots is 50%x50%=25%. So the p-value in case (i) is 25%. For the case (ii), every shooting result is the same or better than he has missed two shots. So, the probability he gets the same or better result is 100%. So the p-value in case (ii) is 100%. The p-value is used to help us decide if we accept our null hypothesis (i.e. Jack’s free throw percentage is not greater than 50% here). We should always remember a rule: the smaller the p-value is, we are less likely to accept the null hypothesis. Thus in the case (i), as the p-value is 25% (relatively small), we may not like to accept the null hypothesis and we think Jack’s free throw percentage is over 50%. In the case (ii), as the p-value is 100% (very big), we should accept the null hypothesis and we think Jack’s free throw percentage is not greater than 50%. We can see that the use of p-value matches to our intuition to judge.

That’s all. I’m ready to be fired.