- Marek Grzes (University of Waterloo)
- Jesse Hoey (University of Waterloo)
- Scott Sanner (NICTA and the ANU)

- Fighting Wildfire Domain. Zhenyu Yu, Tongji University, China
- Academic Advising Domain. Libby Knouse (Ferland) and Judy Goldsmith, University of Kentucky, USA

- 2014 Probabilistic / Continuous track
- 2014 Deterministic track
- 2014 Learning track
- Past Planning Competitions

- Final results presentation

**Winners of Boolean MDP Track**

**1st place** -- PROST 2014 (Keller, Geisser)

description

source code

**2nd place** -- G-Pack (Kolobov)

description

source code

**Winners of Boolean POMDP Track**

**1st place** -- POMDPX_NUS (Ye, Wu, Zhang, Hsu, and Lee)

description

source code

competition domains in the POMDPX format

**2nd place** -- KAIST_AIPR_LAB (Han, Nam, Hong, Lee, and Kim)

description

- Archive of log files and tabulated results with README description
- Final competition domains/instances: tar.gz or zip
- Evaluation code with README instructions

- Boolean MDP Track:
- G-Pack: Andrew Kolobov (Microsoft Research)

source code

Gourmand is an LRTDP-based planner described in Kolobov, Mausam, and Weld, "LRTDP vs. UCT for Online Probabilistic Planning", AAAI'12. Its IPPC-2014 version fixes a few bugs in the original Gourmand code available at http://www.cs.washington.edu/ai/planning/gourmand.html, simplifies convergence detection in the LRTDP implementation, and adds some simple action pruning.

- LRTDP: Leliane Nunes de Barros, Ricardo Hermann, Felipe W. Trevizan, and Karina Valdivia Delgado (University of Sao Paulo, Brazil)
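Both Gourmand and the LRTDP entry above build on Labeled RTDP (Bonet and Geffner, ICAPS'03). As a rough illustration, here is a minimal, self-contained sketch of the core trial-and-labeling loop; the toy chain MDP and all accessor names are illustrative stand-ins, not taken from either planner's code.

```python
import random

# Toy goal-oriented MDP, used only to make the sketch runnable:
# states 0..3, one action that advances with prob 0.8 or stays put.
GOAL = 3
def actions(s):     return ["move"]
def cost(s, a):     return 1.0
def outcomes(s, a): return [(0.8, min(s + 1, GOAL)), (0.2, s)]
def goal(s):        return s == GOAL
def h(s):           return float(GOAL - s)   # admissible heuristic

V, solved = {}, set()
def value(s):     return V.setdefault(s, h(s))
def qvalue(s, a): return cost(s, a) + sum(p * value(t) for p, t in outcomes(s, a))
def greedy(s):    return min(actions(s), key=lambda a: qvalue(s, a))
def update(s):    V[s] = qvalue(s, greedy(s))
def residual(s):  return abs(value(s) - qvalue(s, greedy(s)))

def check_solved(s, eps):
    """Label s solved if every state reachable via the greedy policy converged."""
    ok, open_, closed = True, [s], []
    while open_:
        u = open_.pop()
        closed.append(u)
        if residual(u) > eps:
            ok = False
            continue
        for _, t in outcomes(u, greedy(u)):
            if not goal(t) and t not in solved and t not in closed and t not in open_:
                open_.append(t)
    if ok:
        solved.update(closed)          # label the converged envelope as solved
    else:
        for u in reversed(closed):     # otherwise back values up bottom-up
            update(u)
    return ok

def lrtdp(s0, eps=1e-4):
    while s0 not in solved:
        s, visited = s0, []
        while not goal(s) and s not in solved:   # one greedy trial
            visited.append(s)
            update(s)
            r, acc = random.random(), 0.0
            for p, t in outcomes(s, greedy(s)):  # sample a successor
                acc += p
                if r <= acc:
                    s = t
                    break
        while visited and check_solved(visited.pop(), eps):
            pass
    return V

print(lrtdp(0))   # converges to expected costs-to-go: {0: 3.75, 1: 2.5, 2: 1.25, 3: 0.0}
```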

- PROST 2014: Thomas Keller and Florian Geisser (University of Freiburg)

source code

PROST2014 is implemented in the PROST framework, which is available at http://prost.informatik.uni-freiburg.de. It is based on the UCT* algorithm as described in Keller and Helmert, "Trial-based Heuristic Tree Search for Finite Horizon MDPs", ICAPS '13, and uses the search enhancements unreasonable-action pruning, reward lock detection, and the IDS-based heuristic (Keller and Eyerich, "PROST: Probabilistic Planning Based on UCT", ICAPS '12).

The only significant additional enhancement is the timeout management system: if UCT* is able to solve the root node before the total timeout is reached, PROST2014 simulates several runs internally and determines the best 30 consecutive simulations. It then executes runs until the achieved result matches (or outperforms) the simulated result. Similarly, if time is left after 30 runs for a reason other than solving the problem (which is possible due to reward lock states and states with only a single reasonable action), PROST2014 estimates the total number of runs it will be able to perform within the total time limit. It uses that number to determine a good moment to stop by treating the problem as an instance of the related secretary problem: it executes runs until approximately 37% of the estimated total number of runs have been performed, and then stops the next time it matches the best result so far (sketched below).
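The 37% figure is the classical 1/e stopping rule for the secretary problem. A minimal sketch of that rule, assuming a hypothetical stream of per-run rewards and a run-count estimate (none of these names come from PROST's code):

```python
import math

def secretary_stop(results, estimated_total):
    """Observe ~37% (1/e) of the estimated runs without stopping, then stop
    at the first run that matches or beats the best result seen so far."""
    observe = max(1, round(estimated_total / math.e))
    best_seen = float("-inf")
    for i, r in enumerate(results):
        if i < observe:
            best_seen = max(best_seen, r)   # observation phase: never stop
        elif r >= best_seen:
            return i                        # commit to this run
    return len(results) - 1                 # time ran out: take the last run

print(secretary_stop([3.1, 4.7, 2.2, 4.9, 4.8, 5.0, 4.9], estimated_total=10))
# -> 5, the first run after the observation phase that matches the best seen
```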
- PROST 2011: Thomas Keller and Patrick Eyerich (University of Freiburg)

PROST2011 is the UCT-based planner that participated in IPPC 2011. It is described in Keller and Eyerich, "PROST: Probabilistic Planning Based on UCT", ICAPS '12. The most important difference from the IPPC 2011 version is that the timeout management of PROST2014 is used (this was changed because the original timeout management code unfortunately no longer exists). Since UCT cannot label nodes as solved, only the second part (the one based on the secretary problem) is relevant.

- PPUDD v1 and PPUDD v2: Florent Teichteil-Konigsbuch and Nicolas Drougard (Onera - The French Aerospace Lab)
The PPUDD and ATPPUDD planners are based on the simple idea that approximating an MDP with a qualitative model of uncertainty such as possibilities, which still preserves the ranking of how likely events are to occur, can reduce computational complexity without sacrificing too much solution quality. This intuition is reinforced by the observation that the decision diagrams used in factored MDPs often blow up due to the prohibitive number of nodes generated by probabilistic operations. Starting from a SPUDD representation of the input problem, PPUDD translates the transition and reward decision diagrams to possibilistic ADDs, then symbolically solves the possibilistic version of Bellman's equation (sketched below). Contrary to SPUDD, PPUDD incrementally increases the planning horizon while maintaining a BDD mask representing the states reachable from the initial state, in order to restrict the current value function to the reachable states only.

While PPUDD is an offline algorithm, ATPPUDD is an anytime version that learns the computation times of Bellman backups and dispatches the computational effort accordingly over the remaining planning horizon, much like GOURMAND does in the probabilistic setting. The bottleneck of these possibilistic algorithms lies in the translation from probabilities to possibilities: the naive automated translation currently used for the competition may lead to poor policies on benchmarks with complex reward structures. Another issue is the size of the input SPUDD domain, whose ADD instantiation before optimization does not even fit into memory for many difficult benchmarks.
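For intuition, here is one standard form of the possibilistic Bellman recursion mentioned above, under the optimistic criterion of possibilistic MDPs (after Sabbadin); this is a hedged sketch, as the exact criterion PPUDD optimizes may differ.

```latex
% \pi(s' \mid s,a) \in [0,1] is a possibility degree and \mu(s) a qualitative
% terminal preference:
\[
  V_T(s) = \mu(s), \qquad
  V_t(s) \;=\; \max_{a \in A} \, \max_{s' \in S}
      \min\!\bigl( \pi(s' \mid s, a),\; V_{t+1}(s') \bigr).
\]
% Because only max/min operations appear (no sums or products over branches),
% the value ADDs take values in a small ordered scale and tend to stay far
% more compact than their probabilistic counterparts.
```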
- Boolean POMDP Track:
- KAIST_AIPR_LAB: Sang-Gyu Han, Bowon Nam, Kanghoon Lee, and Kee-Eung Kim (KAIST)

Our planner integrates two algorithms: symbolic heuristic search value iteration (Symbolic HSVI) [1] and partially observable Monte-Carlo planning (POMCP) [2].

Symbolic HSVI is an extension of heuristic search value iteration (HSVI) [3] that aims to handle factored POMDPs. Our Symbolic HSVI is built on the same foundations as the Symbolic HSVI from IPPC 2011, but we re-implemented it in C++ rather than Java. To handle factored structure, such as factored transition and observation models, we use the CUDD library [4], an open-source library for algebraic decision diagrams (ADDs). Our Symbolic HSVI is designed to handle finite-horizon POMDP models.

POMCP is an online POMDP planner based on Monte-Carlo simulations. Because it uses Monte-Carlo simulations, POMCP can handle larger domains for both tree search and (particle-filter-based) belief state updates.

For this competition, we use an ensemble strategy in which the Symbolic HSVI planner first tries to solve the problem; if Symbolic HSVI does not seem able to complete the computation in time, we switch to the POMCP planner, which is then executed (see the sketch below).
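The fallback logic can be pictured roughly as follows; `SymbolicHSVI` and `POMCP` here are minimal hypothetical stubs standing in for the team's actual C++ planners, and only the control flow mirrors the description above.

```python
import time

class SymbolicHSVI:
    """Stub offline solver: pretend the upper/lower bound gap shrinks per trial."""
    def __init__(self, pomdp): self.gap = 1.0
    def iterate(self):         self.gap *= 0.9
    def converged(self):       return self.gap < 1e-3
    def policy(self):          return "HSVI policy"

class POMCP:
    """Stub online planner used as the fallback."""
    def __init__(self, pomdp): self.pomdp = pomdp
    def __repr__(self):        return "POMCP online planner"

def plan(pomdp, budget_seconds):
    """Try the offline solver first; fall back to online planning on timeout."""
    start = time.monotonic()
    hsvi = SymbolicHSVI(pomdp)
    while not hsvi.converged():
        if time.monotonic() - start > budget_seconds:
            return POMCP(pomdp)   # Symbolic HSVI ran out of time
        hsvi.iterate()            # one heuristic-search trial
    return hsvi.policy()

print(plan(pomdp=None, budget_seconds=1.0))
```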

References

[1] Sim, Hyeong Seop, Kee-Eung Kim, et al. "Symbolic Heuristic Search Value Iteration for Factored POMDPs." AAAI, 2008.

[2] Silver, David, and Joel Veness. "Monte-Carlo Planning in Large POMDPs." NIPS, 2010.

[3] Smith, Trey, and Reid Simmons. "Heuristic Search Value Iteration for POMDPs." Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence (UAI), 2004.

[4] CUDD: http://vlsi.colorado.edu/~fabio/CUDD/

- POMDPX_NUS: Nan Ye, Kegui Wu, Meng Zhang, David Hsu, and Wee Sun Lee (N.U.S.)

source code

competition domains in the POMDPX format

DESPOT [1] and POMCP [2] are two state-of-the-art anytime algorithms for large-scale online POMDP planning. DESPOT performs heuristic search on a sparse belief tree, the DESPOT tree, constructed from a set of randomly sampled scenarios, and makes use of regularization to avoid overfitting to the sampled scenarios. POMCP is a Monte-Carlo algorithm that uses a UCB-based optimistic action selection strategy (shown below) to search a belief tree.
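For reference, the UCB-based rule mentioned above is the standard UCB1 selection used by POMCP (Silver and Veness, 2010):

```latex
% At belief node b, simulate the action maximizing the value estimate plus
% an exploration bonus:
\[
  a^{*} \;=\; \operatorname*{arg\,max}_{a}
      \Bigl[\, Q(b,a) \;+\; c \sqrt{\frac{\log N(b)}{N(b,a)}} \,\Bigr]
\]
% N(b): visit count of node b;  N(b,a): times action a was tried in b;
% c: exploration constant.
```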

Our competition submission leverages both DESPOT and POMCP. POMCP's optimistic action selection strategy may work well and lead to fast convergence to an optimal action, but it may also be overly greedy and result in extremely poor worst-case performance. DESPOT performs heuristic search on a randomly sampled sparse belief tree, achieving strong performance in practice while guaranteeing a much better worst-case performance bound. Our submission runs DESPOT and POMCP each for a few rounds, and completes the remaining rounds by running the one with better performance (see the sketch below).
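A hedged sketch of that meta-strategy: run each planner for a few trial rounds, then commit to the better performer. `run_round` is a hypothetical callback returning the total reward of one evaluation round, and the demo rewards are made up, not competition data.

```python
import random
import statistics

def pick_and_finish(planners, run_round, trial_rounds, total_rounds):
    """Trial each planner, then spend the remaining rounds on the best one."""
    trials = {p: [run_round(p) for _ in range(trial_rounds)] for p in planners}
    best = max(planners, key=lambda p: statistics.mean(trials[p]))
    remaining = total_rounds - trial_rounds * len(planners)
    return best, [run_round(best) for _ in range(remaining)]

# Toy demo with invented reward distributions:
mean_reward = {"DESPOT": 10.0, "POMCP": 8.0}
best, rest = pick_and_finish(
    ["DESPOT", "POMCP"],
    lambda p: random.gauss(mean_reward[p], 1.0),
    trial_rounds=3,
    total_rounds=30,
)
print(best, len(rest))   # usually "DESPOT 24"
```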

[1] Adhiraj Somani, Nan Ye, David Hsu, and Wee Sun Lee. "DESPOT: Online POMDP Planning with Regularization." NIPS, 2013.

[2] D. Silver and J. Veness. "Monte-Carlo Planning in Large POMDPs." NIPS, 2010.

- Identical to IPPC 2011 with the following changes:

- We change from a per-trial time limit to a per-instance time limit.
- We disallow manual parameter tuning once the competition has begun.
- To ensure reproducibility of results under competition
constraints, we require a source code release of all competing planners
to the organizers shortly after the competition (a wider release is
optional, but highly encouraged).

- Four domains will be reused from IPPC 2011, and four new ones will be introduced for the final competition

- We will use the Amazon Elastic Compute Cloud (EC2)
- You will run your planner on your own personal Linux or Windows node (your choice) hosted on EC2. You need to create an Amazon EC2 account using the following **Amazon EC2 instructions:** pdf. You can launch a micro instance for free, where you can test how to connect to the instance, install software, etc. You will receive instructions later on how to start a large instance that will be used for the actual competition.

- Competitor interest email to organizers: now until Feb 15, 2014
- Warm-up competition: Mar 2014
- Final competition: June 2014