Category Archives: 2015

Real-world robots often face a stochastic and partially observable environment. POMDP planning offers a principled approach to handle such uncertainties. However, POMDPs are also well-known for their computational complexity that grows exponentially with the problem scale and the planning horizon, namely, the “curse of dimension” and the “curse of history”. We aim to solve large-scale POMDP planning efficiently and scale-up to complex real-world tasks. The key to our solution includes: 1) one can leverage massive parallelization and powerful hardware to mitigate the “curse of dimension”, and 2) one can integrate POMDP planning with learning to overcome the “curse of history”.

A perfect example of large-scale real-world tasks is Crowd-Driving: driving among an unregulated crowd of heterogeneous traffic agents. In crowd-driving, the robot vehicle must contend with a large-scale and highly interactive environment. Such complex environments require sophisticated long-term planning to achieve human-level performance. However, efficient planning is often prohibited due to real-time constraints. Shown below is a 3-mins talk introducing the crowd-driving problem and discussing how the problem can be modeled and solved using a combination of planning and learning. This video also summarizes the core technical approaches to be introduced in this post:


Panpan Cai, Yuanfu Luo, David Hsu, and Wee Sun Lee. HyP-DESPOT: A hybrid parallel algorithm for online planning under uncertainty. In Proc. Robotics: Science & Systems, 2018. [PDF]

Panpan Cai, Yuanfu Luo, David Hsu, and Wee Sun Lee.  HyP-DESPOT: A Hybrid Parallel Algorithm for Online Planning under Uncertainty. The International Journal of Robotics Research, 2020. [PDF][Code]

Hybrid Parallel DESPOT (HyP-DESPOT) is a massively parallel belief tree search algorithm that leverages both CPU and GPU parallelization in order to achieve real-time planning performance for complex tasks with large state, action, and observation spaces. In multi-core CPUs, HyP-DESPOT performs parallel DESPOT tree search by simultaneously traversing multiple independent paths; In the GPU, HyP-DESPOT performs parallel Monte Carlo simulations at the leaf nodes of the search tree. HyP-DESPOT provably converges in finite time under moderate conditions and guarantees near-optimality of the solution. In practice, HyP-DESPOT speeds up online planning by up to a factor of several hundred in several challenging robotic tasks.


Panpan Cai, Yuanfu Luo, Aseem Saxena, David Hsu, and Wee Sun Lee. LeTS-Drive: Driving in a Crowd by Learning from Tree Search. In Proc. Robotics: Science & Systems, 2019. [PDF]

LeTS-Drive is a crowd-driving algorithm that integrates online POMDP planning with deep learning. The core idea is to constrain belief tree search to short-term futures and use learning for long-term futures. It consists of two phases. In the offline phase, we learn a policy and the corresponding value function by imitating the belief tree search expert. In the online phase, LeTS-Drive uses the learned policy and value function to inform and guide the online belief tree search. LeTS-Drive leverages the robustness of planning and the runtime efficiency of learning to enhance the performance of both. By integrating planning and learning, LeTS-Drive outperforms either planning or imitation learning alone and develops sophisticated crowd-driving skills.


Panpan Cai, David Hsu, Think Locally, Learn Globally. to be uploaded to Arxiv.

LeTS-Drive-Auto advances LeTS-Drive by integrating planning and learning in a close-loop. Similar to LeTS-Drive, it learns a policy network and a value network as representations of global priors. Differently, LeTS-Drive-Auto builds mutual communication between the planner and the learner by 1) guiding the online planner using learned global priors; 2) learning the global priors from data sent back from the guided planner. LeTS-Drive-Auto is a new reinforcement learning algorithm that the planner serves as the policy improvement operator and evolves together with the learner. By integrating planning and learning in a close-loop, LeTS-Drive-Auto achieves superior driving performance in crowded urban environments in simulation, outperforming either planning or learning alone, and largely exceeds the capability of open-loop integration (LeTS-Drive).

We aim to model, predict, and simulate the motion of unregulated crowds of traffic participants. We take a geometric approach and model traffic agent motion as a constrained optimization problem in the velocity space of agents. The objective of the problem is to maximize the cooperative navigation efficiency for all agents, while the constraints are induced by collision avoidance, vehicle kinematics, and the road context. We have developed two motion models for two typical scenarios, including crowds of pedestrians in open spaces and heterogenous traffic crowds on urban roads. Making use of these models, a planning system can perform more accurate long-term predictions and thus enhance the planning performance. We have further released an open-source simulator that uses these models to simulate crowd-driving in real-world urban maps. We envision the simulator to facilitate training, testing, and development of crowd-driving algorithms.


Yuanfu Luo, Panpan Cai, Aniket Bera, David Hsu, Wee Sun Lee, and Dinesh Manocha. PORCA: Modeling and planning for autonomous driving among many pedestrians. IEEE Robotics and Automation Letters, 2018. [PDF]

This paper presents a planning system for autonomous driving among many pedestrians. A key ingredient of our approach is PORCA, a pedestrian motion prediction model that accounts for both a pedestrian’s global navigation intention and local interactions with the vehicle and other pedestrians. Since the autonomous vehicle does not know the pedestrian’s intention a priori, it requires a planning algorithm to reason about the uncertainty on pedestrian intentions. Our planning system combines a POMDP algorithm with the pedestrian motion model and runs in near real-time. Experiments show that it enables a robot vehicle to drive safely, efficiently, and smoothly among a dense crowd of pedestrians.


Yuanfu Luo, Panpan Cai, David Hsu, and Wee Sun Lee. GAMMA: A general agent motion prediction model for autonomous driving. Arxiv, 2019. [PDF][Code]

Urban environments usually contain mixed traffic of heterogeneous agents such as pedestrians, bicycles, cars, buses, etc.. In such environments, traffic motion prediction becomes extremely challenging because of the diverse dynamics and geometry of traffic agents, complex road conditions, and intensive interactions among the agents. In this paper, we proposed GAMMA, a general agent motion prediction model for autonomous driving, that can predict the motion of heterogeneous traffic agents with different kinematics, geometry, human agents’ inner states, etc.. GAMMA formalizes motion prediction as geometric optimization in the velocity space, and integrates physical constraints and human inner states into this unified framework. Our results show that GAMMA outperforms both geometric and learning-based approaches significantly on diverse real-world datasets.


Cunjun Yu*, Xiao Ma*, Jiawei Ren, Haiyu Zhao, Shuai Yi. Spatio-Temporal Grpah Transformer Networks for Pedestrian Trajectory Prediction. in Proc. European Conf. on Computer Vision, 2020. [PDF][Code][Project]

Understanding crowd motion dynamics is critical to real-world applications, e.g., surveillance systems and autonomous driving. This is challenging because it requires effectively modeling the socially aware crowd spatial interaction and complex temporal dependencies. We believe attention is the most important factor for trajectory prediction. In this paper, we present STAR, a Spatio-Temporal grAph tRansformer framework, which tackles trajectory prediction by only attention mechanisms. STAR models intra-graph crowd interaction by TGConv, a novel Transformer-based graph convolution mechanism. The inter-graph temporal dependencies are modeled by separate temporal Transformers. STAR captures complex spatio-temporal interactions by interleaving between spatial and temporal Transformers. To calibrate the temporal prediction for the long-lasting effect of disappeared pedestrians, we introduce a read-writable external memory module, consistently being updated by the temporal Transformer. We show that with only attention mechanism, STAR achieves state-of-the-art performance on 5 commonly used real-world pedestrian prediction datasets.


Panpan Cai*, Yiyuan Lee*, Yuanfu Luo, David Hsu. SUMMIT: A Simulator for Urban Driving in Massive Mixed Traffic. in Proc. IEEE Int. Conf. on Robotics & Automation, 2020. [PDF][Code]

This paper presents SUMMIT, a high-fidelity simulator that facilitates the development and testing of crowd-driving algorithms. By leveraging the open-source OpenStreetMap map database and the GAMMA motion model developed in our earlier work, SUMMIT simulates dense, unregulated urban traffic for heterogeneous agents at any worldwide location that OpenStreetMap supports. SUMMIT is built as an extension of CARLA and inherits from it the physics and visual realism for autonomous driving simulation. SUMMIT supports a wide range of applications, including perception, vehicle control and planning, and end-to-end learning. We provide real-world benchmark scenes to show that SUMMIT generates complex, realistic traffic behaviors in challenging crowd-driving settings. The simulator also comes with a context-aware POMDP planner as a driving expert and a reference to future crowd-driving algorithms.


We aim at robust reinforcement learning (RL) for tasks with complex partial observations. While existing RL algorithms have achieved great success in simulated environments, such as Atari games, Go, and even Dota, generalizing them to realistic environments with complex partial observations remains challenging. The key to our approach is to learn from the partial observations a robust and compact latent state representation. Specifically, we handle partial observability by combining the particle filter algorithm with recurrent neural networks. We tackle complex observations through efficient discriminative model learning, which focuses on learning observations required for action selection rather the entire high-dimensional observation space.

Particle Filter Recurrent Neural Networks

X. Ma, P. Karkus, D. Hsu, and W.S. Lee. Particle filter recurrent neural networks. In Proc. AAAI Conf. on Artificial Intelligence, 2020. [PDF][Code]

Recurrent neural networks (RNNs) have been extraordinarily successful for prediction with sequential data. To tackle highly variable and noisy real-world data, we introduce Particle Filter Recurrent Neural Networks (PF-RNNs), a new RNN family that explicitly models uncertainty in its internal structure: while an RNN relies on a long, deterministic latent state vector, a PF-RNN maintains a latent state distribution, approximated as a set of particles. For effective learning, we provide a fully differentiable particle filter algorithm that updates the PF-RNN latent state distribution according to the Bayes rule. Experiments demonstrate that the proposed PF-RNNs outperform the corresponding standard gated RNNs on a synthetic robot localization dataset and 10 real-world sequence prediction datasets for text classification, stock price prediction, etc.

Discriminative Particle Filter Reinforcement Learning

X. Ma, P. Karkus, D. Hsu, W.S. Lee, and N. Ye. Discriminative particle filter reinforcement learning for complex partial observations. In Proc. Int. Conf. on Learning Representations, 2020[PDF][Project][Code]

Real-world decision making often requires reasoning in a partially observable environment using information obtained from complex visual observations — major challenges for deep reinforcement learning. In this paper, we introduce the Discriminative Particle Filter Reinforcement Learning (DPFRL), a reinforcement learning method that encodes a particle filter structure with learned discriminative transition and observation models in a neural network. The particle filter structure allows for reasoning with partial observations, and discriminative parameterization allows modeling only the information in the complex observations that are relevant for decision making. In experiments, we show that in most cases DPFRL outperforms state-of-the-art POMDP RL models in Flickering Atari Games, an existing POMDP RL benchmark, as well as in Natural Flickering Atari Games, a new, more challenging POMDP RL benchmark that we introduce. We also show that DPFRL performs well when applied to a visual navigation domain with real-world data.

Contrastive Variational Reinforcement Learning

X. Ma, S. Chen, D. Hsu and W.S. Lee. Contrastive variational reinforcement learning for complex observations. In Proc. 4th Conf. on Robot Learning, 2020. [PDF][Project][Code]

Deep reinforcement learning (DRL) has achieved significant success in various robot tasks: manipulation, navigation, etc. However, complex visual observations in natural environments remains a major challenge. This paper presents Contrastive Variational Reinforcement Learning (CVRL), a model-based method that tackles complex visual observations in DRL. CVRL learns a contrastive variational model by maximizing the mutual information between latent states and observations discriminatively, through contrastive learning. It avoids modeling the complex observation space unnecessarily, as the commonly used generative observation model often does, and is significantly more robust. CVRL achieves comparable performance with state-of-the-art model-based DRL methods on standard Mujoco tasks. It significantly outperforms them on Natural Mujoco tasks and a robot box-pushing task with complex observations, e.g., dynamic shadows.

Natural Mujoco Tasks


Object manipulation in unstructured environments like homes and offices requires decision making under uncertainty. We want to investigate how we can do general, fast and robust object manipulation under uncertainty in a principled manner.

Learning To Grasp Under Uncertainty Using POMDPs

N. P. Garg, D. Hsu, and W. S. Lee. Learning To Grasp Under Uncertainty Using POMDPs. In Proc. IEEE Int. Conf. on Robotics & Automation 2019.

Robust object grasping under uncertainty is an essential capability of service robots. Many existing approaches rely on far-field sensors, such as cameras, to compute a grasp pose and perform open-loop grasp after placing gripper under the pose. This often fails as a result of sensing or environment uncertainty. This paper presents a principled, general, and efficient approach to adaptive grasping, using both tactile and visual sensing as feedback. We first model adaptive grasping as a partially observable Markov decision process (POMDP), which handles uncertainty naturally. We solve the POMDP for sampled objects from a set, in order to generate data for learning. Finally, we train a grasp policy, represented as a deep recurrent neural network (RNN), in simulation through imitation learning. By combining model-based POMDP planning and imitation learning, the proposed approach achieves robustness under uncertainty, generalization over many objects, and fast execution. In particular, we show that modeling only a small sample of objects enables us to learn a robust strategy to grasp previously unseen objects of varying shapes and recover from failure over multiple steps. Experiments on the G3DB object dataset in simulation and a smaller object set with a real robot indicate promising results.

Push-Net: Deep Planar Pushing for Objects with Unknown Physical Properties

J.K. Li, D. Hsu, and W.S. Lee. Push-Net: Deep planar pushing for objects with unknown physical properties. In Proc. Robotics: Science & Systems, 2018.

We introduce Push-Net, a deep recurrent neural network model, which enables a robot to push objects of unknown physical properties for re-positioning and re-orientation, using only visual camera images as input. The unknown physical properties is a major challenge for pushing. Push-Net overcomes the challenge by tracking a history of push interactions with an LSTM module and training an auxiliary objective function that estimates an object’s center of mass. We trained Push-Net entirely in simulation and tested it extensively on many different objects in both simulation and on two real robots, a Fetch arm and a Kinova MICO arm. Experiments suggest that Push-Net is robust and efficient. It achieved over 97% success rate in simulation on average and succeeded in all real robot experiments with a small number of pushes.

The code for Push-Net is available in github.


[Thesis 2019] Act to See and See to Act: A Robotic System for Object Retrieval in Clutter — Li Juekun

Object retrieval in clutter is an extraordinary challenge for robots. The challenges come from the incomplete knowledge of the environment. A robot has imperfect sensing due to occlusion among objects. At the same time, it must physically interact with objects of unknown physical properties.

We hypothesize that humans adopt the strategy of Act to See and See to Act to retrieve objects in clutter. We may rearrange (Act) objects to better understand (See) the scene, which in turn guides us to select better actions (Act) towards achieving the goal. This thesis adopts the same strategy that enables a robotic system to robustly and efficiently retrieve objects in clutter under uncertainties in sensing due to occlusion and uncertainties in control due to unknown objects’ physical properties, such as center of mass.

To alleviate uncertainties in sensing, we formulate the problem of object search in clutter as a Partially Observable Markov Decision Process (POMDP) with large state, action and observation spaces. With insights in spatial constraints of the problem, we improve the state-of-the-art POMDP solver, DEterminized Sparse Partially Observable Tree (DESPOT), to solve the POMDP efficiently. Through experiments in simulation, we show that the proposed planner was able to select actions to remove occlusion and reveal the target object efficiently. We further conclude that POMDP planning is effective for problems which require multi-step lookahead search.

To handle uncertainties in control, we devise Push-Net, a deep recurrent neural
network, which enables a robot to push an object from one configuration to another
robustly and efficiently. Capturing history of push interactions enables Push-Net
to push novel objects robustly. We perform experiments in simulation and on a
real robot, and show that embedding physical understanding (center of mass) about
objects in Push-Net helps select more effective push actions.

Finally, we improve and integrate both the POMDP planner and the Push-Net
into a real robotic system. We evaluate the system on a set of challenging scenarios.
The results demonstrate that the proposed system is able to retrieve the target object
robustly and efficiently in clutter. The success of the system is attributed to 1)
the ability to handle perceptual uncertainty due to occlusion; 2) the ability to push
objects of unknown physical properties in clutter; 3) the ability to perform multi-step
lookahead planning for efficient object search in complex environment.


We aim to combine the strengths of modal-based algorithmic reasoning and model-free deep learning for robust robot decision-making under uncertainty. Key to our approach is a unified neural network policy representation, encoding both a learned system model and an algorithm that solves the model. The network is fully differentiable and can be trained end-to-end, circumventing the difficulties of direct model learning in a partially observable setting. In contrast with conventional deep neural networks, our network representation imposes the model and algorithmic priors on the neural network architecture for improved generalization of the learned policy.


P. Karkus, D. Hsu, and W. S. Lee. QMDP-Net: Deep Learning for Planning Under Partial Observability. In Advances in Neural Information Processing Systems, NeurIPS, 2017. [code]

QMDP-net employs algorithm priors on a neural network for planning under partial observability. The network encodes a learned POMDP model together with QMDP, a simple, approximate POMDP planner, thus embedding the solution structure of planning in a network learning architecture. We train a QMDP-net on different tasks so that it can generalize to new ones in the parameterized task set, and “transfer” to other similar tasks beyond the set. Interestingly, while QMDP-net encodes the QMDP algorithm, it sometimes outperforms the QMDP algorithm in the experiments, as a result of end-to-end learning.

Navigation Network (Nav-Net)

P. Karkus, D. Hsu, and W. S. Lee. Integrating Algorithmic Planning and Deep Learning for Partially Observable Navigation. In MLPC Workshop, International Conference on Robotics and Automation, 2018.

This work extends the QMDP-net, and encodes all components of a larger robotic system in a single neural network: state estimation, planning, and control. We apply the idea to a challenging partially observable navigation task: a robot must navigate to a goal in a previously unseen 3-D environment without knowing its initial location, and instead relying on a 2-D floor map and visual observations from an onboard camera.

Particle Filter Network (PF-Net)

P. Karkus, D. Hsu, and W. S. Lee. Particle Filter Networks with Application to Visual Localization. In Conference on Robot Learning, CoRL, 2018. [code]

In this work, we encode the Particle Filter algorithm in a differentiable neural network. PF-net enables end-to-end model learning, which trains the model in the context of a specific algorithm, resulting in improved performance, compared with conventional model-learning methods. We apply PF-net to visual robot localization. The robot must localize in rich 3-D environments, using only a schematic 2-D floor map. PF-net learns effective models that generalize to new, unseen environments. It can also incorporate semantic labels on the floor map.

Read more on PF-nets combined with state representation learning for sequence prediction and RL here; and map respresentation learning here.








Differentiable Algorithm Network (DAN)

P. Karkus, X. Ma, D. Hsu, L. P. Kaelbling, W. S. Lee, and T. Lozano-Pérez. Differentiable Algorithm Networks for Composable Robot Learning. In Robotics: Science and Systems, RSS, 2019. Nominated for Best Student Paper and Best Systems Paper awards.

DANs compose modules of differentiable robot algorithms and associated models into a single neural network that is trained end-to-end from data. From a model-free policy learning perspective the algorithms in DAN act as structured prior. From a model-based RL perspective, instead of training models to match the underlying system dynamics, DAN trains models end-to-end to optimize the overall task objective by backpropagating gradients through the algorithms. The benefit of task-oriented learning is that models and algorithms can adapt and compensate for each others’ imperfections. We illustrate the DAN methodology using differentiable modules for visual perception, state filtering, planning, and local control in the context of a partially observably visual navigation task in 3-D environments.


How can a delivery robot navigate reliably to a destination in a new office building, with minimal prior information? To tackle this challenge, this paper introduces a two-level hierarchical method, which integrates model-free deep learning and model-based path planning. At the low level, a neural-network motion controller, called the intention-net, is trained end-to-end to provide robust local navigation. Intention-net maps images from a single monocular camera and given “intentions” directly to robot control. At the high level, a path planner uses a crude map, e.g., a 2-D floor plan, to compute a path from the robot’s current location to the goal. The planned path provides intentions to the intention-net. Preliminary experiments suggest that the learned motion controller is robust against perceptual uncertainty and by integrating with a path planner, it generalizes effectively to new environments and goals.


W. Gao, D. Hsu, W. Lee, S. Shen, and K. Subramanian. Intention-Net: Integrating planning and deep learning for goal-directed autonomous navigation. In S. Levine and V. V. and K. Goldberg, editors, Conference on Robot Learning, volume 78 of Proc. Machine Learning Research, pages 185–194. 2017.

We study the problem of visual navigation in new environments. Humans can easily navigate in an arbitrary environment with crude floor plan and perfect local collision-free navigation. We try to mimic the human navigation in a principled way by integrating high-level planning in the crude global map, i.e. the floor plan and low-level neural-network motion controller. We design the “intention” as the interface to communicate with high-level path planning and local-level neural-network motion controller.

Two intentions:

We mainly design two kinds of intention parsers from high-level path planning.

Discretized local move(DLM) intention: We assume that in most cases discretized navigation instructions would be enough for navigation. For example, turning left at the next junction would be enough for human drivers to drive in the real world. We mimic the same procedure by introducing four discretized intentions.

Local path and environment(LPE) intention: The DLM intention is too ad-hoc and relies on pre-defined parameters. To alleviate the issue, we design a map-based intention encoded all the navigation information from the high-level planner.


Our goal is to develop a formal computational framework of trust, supported by experimental evidence and predictive models of human behaviors, to enable automated decision-making for fluid collaboration between humans and robots.


M. Chen, S. Nikolaidis, H. Soh, D. Hsu, and S. Srinivasa. Planning with trust for human-robot collaborationIn Proc. ACM/IEEE Int. Conf. on Human-Robot Interaction, 2018.

Trust is essential for human-robot collaboration and user adoption of autonomous systems, such as robot assistants. This paper introduces a computational model which integrates trust into robot decision-making. Specifically, we learn from data a partially observable Markov decision process (POMDP) with human trust as a latent variable. The trust-POMDP model provides a principled approach for the robot to (i) infer the trust of a human teammate through interaction, (ii) reason about the effect of its own actions on human behaviors, and (iii) choose actions that maximize team performance over the long term. We validated the model through
human subject experiments on a table-clearing task in simulation (201 participants) and with a real robot (20 participants). The results show that the trust-POMDP improves human-robot team performance in this task. They further suggest that maximizing trust in itself may not improve team performance.

The human language provides a powerful natural interface for humans to communicate with robots. We aim to develop a robot system that follows natural language instructions to interact with the physical world.

Interactive Visual Grounding

M. Shridhar and D. Hsu. Interactive visual grounding of referring expressions for human-robot interaction. In Proc. Robotics: Science & Systems, 2018.

INGRESS is a robot system that follows human natural language instructions to pick and place everyday objects. The core issue here is the grounding of referring expressions: infer objects and their relationships from input images and language expressions. INGRESS allows for unconstrained object categories and unconstrained language expressions. Further, it asks questions to disambiguate referring expressions interactively. To achieve these, we take the approach of grounding by generation and propose a two-stage neural-network model for grounding. The first stage uses a neural network to generate visual descriptions of objects, compares them with the input language expression, and identifies a set of candidate objects. The second stage uses another neural network to examine all pairwise relations between the candidates and infers the most likely referred object. The same neural networks are used for both grounding and question generation for disambiguation.

Compared with handheld cameras widely used today, a camera mounted on a flying drone affords the user much greater freedom in finding the point of view for a perfect photo shot.  In the future, many people may take along compact flying cameras and use their touchscreen mobile devices as viewfinders to take photos. Our goal is to explore the user interaction design and system implementation issues  for a flying camera, which  leverages the autonomous flying capability of a drone-mounted camera for a great photo-taking experience.


Z. Lan, M. Shridhar, D. Hsu, and S. Zhao. XPose: Reinventing user interaction with flying cameras. In Proc. Robotics: Science & Systems, 2017.

XPose is a new touch-based interactive system for photo taking, designed to take advantage of the autonomous flying capability of a drone-mounted camera. It enables the user to interact with photos directly and focus on taking photos instead of piloting the drone. XPose introduces a two-stage eXplore-and-comPose approach to photo taking in static scenes. In the first stage, the user explores the “photo space” through predefined interaction modes: Orbit, Pano, and Zigzag. Under each mode, the camera visits many points of view (POVs) and takes exploratory photos through autonomous drone flying. In the second stage, the user restores a selected POV with the help of a gallery preview and uses direct manipulation gestures to refine the POV and compose a final photo.

Perspective-2-Point (P2P)

Z. Lan, D. Hsu, and G. Lee. Solving the perspective-2-point problem for flying-camera photo composition. In Proc. IEEE Conf. on Computer Vision & Pattern Recognition, 2018.

This work focuses on a common situation in photo-taking, i.e., the underlying viewpoint search problem for composing a photo with two objects of interest. We model it as a Perspective-2-Point (P2P) problem, which is under-constrained to determine the six degrees-of-freedom camera pose uniquely. By incorporating the user’s composition requirements and minimizing the camera’s flying distance, we form a constrained nonlinear optimization problem and solve it in closed form.

The overall goal of this project is to integrate planning and learning to enable a robot vehicle to drive autonomously among many pedestrians and vehicles.


Y.F. Luo, P.P. Cai, A. Bera, D. Hsu, W.S. Lee, and D. Manocha. PORCA: Modeling and planning for autonomous driving among many pedestrians. In IEEE Robotics & Automation Letters. 2018.

This projects investigates a planning system for autonomous driving among many pedestrians. A key ingredient of our approach is a motion prediction model for pedestrians and vehicles. It accounts for both a pedestrian’s global navigation intention and local interactions with the vehicle and other pedestrians. Unfortunately, the autonomous vehicle does not know the pedestrians’ intentions a priori and requires a planning algorithm that hedges against the uncertainty in pedestrian intentions. Our planning system combines a POMDP algorithm with the pedestrian motion model and runs in real time. Experiments show that it enables a robot scooter to drive safely, efficiently, and smoothly in a crowd with a density of nearly one person per square meter.


Importance Sampling for Online Planning Under Uncertainty


The partially observable Markov decision process (POMDP) provides a principled general framework for robot planning under uncertainty. Leveraging the idea of Monte Carlo sampling, recent POMDP planning algorithms have scaled up to various challenging robotic tasks, including, e.g., real-time online planning for autonomous vehicles. To further improve online planning performance, this paper presents IS-DESPOT, which introduces importance sampling to DESPOT, a state-of-the-art sampling-based POMDP algorithm for planning under uncertainty. Importance sampling improves DESPOT’s performance when there are critical, but rare events, which are difficult to sample. We prove that IS-DESPOT retains the theoretical guarantee of DESPOT. We demonstrate empirically that importance sampling significantly improves the performance of online POMDP planning for suitable tasks. We also present a general method for learning the importance sampling distribution.



  • Y.F. Luo, H.Y. Bai, D. Hsu, and W.S. Lee. Importance sampling for online planning under uncertainty. In Algorithmic Foundations of Robotics XII – Proc. Int. Workshop on the Algorithmic Foundations of Robotics (WAFR). 2016.
    BibTeX PDF


The motion of physical objects is subject to kinematic and dynamic constraints. For example, a car cannot move sidewise; a bouncing ball must obey the laws of physics. These constraints compound the difficulty of motion planning. We have extended our sampling techniques for path planning to handle objects whose motion is described by a control system, which is a set of differential equations that captures diverse types of kinematic and dynamic constraints. In particular, our implemented algorithm is fast enough to deal with moving obstacles in “real-time”.

Experiments in Simulated Environments

We tested our algorithm on two different systems. The first one consists of two wheeled mobile robots that maintains a direct line of sight as well as a minimum and a maximum distance between them (see the movie below). The second system is a cylindrical robot propelled by eight air-thrusters. It operates on a “frictionless” granite table with moving obstacles and is subject to maximum acceleration bounds due to the limited actuating forces of air-thrusters. Despite the apparent differences in these two systems, our algorithm deals with them in a unified framework.


Quicktime movie showing R2-D2 robots
executing motion computed by the motion planner (2.5 MB).
Download the Quicktime viewer

Experiments on Real Robots

We have also experiemented with a hardware implementation of our algorithm on the second system mentioned above at the Stanford University’s Aerospace Robotics Laboratory. The robot floats “frictionlessly” on the granite table using air-bearing. The roughly circular objects on the granite table are moving obstacles. An overhead vision system detects the motion of obstacles. In reponse, the robot computes a collision-free trajectory to the goal state on the fly.

The photos show the motion of the robot in an experiment, in which the robot attempts to move to the goal state at the front of the table.

Initial state Manuever to avoid the incoming obstacle

Moving towards the goal

Wait for the obstacle to pass Goal state


  • D. Hsu, R. Kindel, J.C. Latombe, and S. Rock. Randomized kinodynamic motion planning with moving obstaclesInt. J. Robotics Research, 21(3):233–255, 2002.

  • D. Hsu, R. Kindel, J.C. Latombe, and S. Rock. Control-based randomized motion planning for dynamic environments. In B.R. Donald and others, editors, Algorithmic and Computational Robotics: New Directions—Proc. Workshop on the Algorithmic Foundations of Robotics (WAFR), pp. 247–264, A. K. Peters, Wellesley, MA, 2000.

  • R. Kindel, D. Hsu, J.C. Latombe, and S. Rock. Kinodynamic motion planning amidst moving obstacles. In Proc. IEEE Int. Conf. on Robotics & Automation, pp. 537–543, 2000.



Computing distance between objects is an important problem in many applications, for example, virtual environment simulation, computer animation, and robot motion planning. Distance information can be used to detect collision, as collision has undesirable consequences in most systems. It can also be used to guide sampling in randomized motion planning algorithms. This project investigates efficient algorithms for collision detection and distance computation. The particular emphasis is on collision detection among moving objects.

H-Walk: Distance Computation for Moving Convex Bodies

The Hierarchical Walk, or H-Walk algorithm, maintains the distance between two moving convex bodies by exploiting both coherence of motion and hierarchical representation. For convex polygons, we have proven that H-Walk improves on the classic Lin-Canny and the Dobkin-Kirkpatrick algorithms. We have also implemented H-Walk for moving convex polyhedra in three dimensions. Experimental results indicate that unlike previous incremental distance computation algorithms, H-Walk scales well with respect to coherence.

Asteriod avoidance

Quicktime movie showing H-Walk in working (0.9 MB).
Download the Quicktime viewer

In our experiments, we used a pair of identical objects. One of them is fixed at the origin, and the other orbits around it and rotates about some randomly chosen axis. The plots below show the performance of H-Walk, as well as that of V-Clip, a very efficient implementation of the Lin-Canny algorithm. In the graphs below, the horizontal axis is some estimate of coherence. The vertical axis is the number of steps that it takes each algorithm to find the closest pair of features. H-Walk can also automatically adjust a parameter in the algorithm in order to respond to changes in the level of coherence and thus maintain the best performance. In the experiments, we kept the parameter fixed in order to see its effect on the algorithm. In the graphs, “h-walk n” stands for H-Walk with the parameter set to n. See the paper for details and more experimental results.


  • L.J. Guibas, D. Hsu, and L. Zhang. A hierarchical method for real-time distance computation among moving convex bodiesComputatonal Geometry: Theory and Applications, 15(1-3):51–68, 2000.
  • L.J. Guibas, D. Hsu, and L. Zhang. H-Walk: hierarchical distance computation for moving convex bodies. In Proc. ACM Symp. on Computational Geometry, pp. 265–273, 1999.



We thank Brian Mirtich for providing us an implementation of the V-Clip alogorithm, which greatly eased our experiments.

PXR Pocket
The binding pocket of hPXR.


The human pregnane X receptor (hPXR) is a nuclear receptor that binds to various ligands, regulating the breakdown of drugs in the human body. To study drug-drug interactions, we are investigating a method that predicts potential ligand binding conformations in the binding pocket of hPXR.

Binding Conformations from the Geometry of Hydrogen Bonds

The binding pocket of hPXR contains eight polar residues which are key in binding ligands. These polar atoms can potentially share hydrogens with the polar atoms of ligands, forming hydrogen bonds.

A stick for the preferred bond direction from a donor polar atom. Two sticks for the preferred bond directions from an sp2 hybridized acceptor polar atom. A wedge for the preferred bond directions from an sp3 hybridized acceptor polar atom.

Hydrogen bonds have a preferred geometry that depends on the chemical neighborhood of the polar atoms. We model the observed preferred geometry for hydrogen bonding by sticks and wedges. A hydrogen bond is made if two polar atoms are close, and a donor stick is aligned with an acceptor stick or wedge. These geometric constraints are formulated in an optimization problem, whose solutions are ligand conformations that establish at least two hydrogen bonds between a ligand and the binding pocket. We then check the candidate conformations for steric hindrance using a hierarchical collision-detection algorithm and retain the valid conformations in the binding pocket.


  • R.-P. Berretty, D. Hsu, L. Kettner, A. Mascarenhas, M.R. Redinbo, and J. Snoeyink. Ligand binding to the pregnane X receptor by geometric matching of hydrogen bonding. In L. Florea and others, editors, Currents in Computational Molecular Biology, pp. 22–23, 2002. The book contains extended poster abstr acts from RECOMB 2002.



This work has been partially funded by an NSF-ITR grant.


Many interesting properties of molecular motion are best characterized statistically by considering an ensemble of motion pathways rather than an individual one. Classic simulation techniques, such as the Monte Carlo method and molecular dynamics, generate individual pathways one at a time and are easily “trapped” in the local minima of the energy landscape. They are computationally inefficient if applied in a brute-force fashion to deal with many pathways. We introduce Stochastic Roadmap Simulation (SRS), a randomized technique for sampling molecular motion and exploring the kinetics of such motion by examining multiple pathways simultaneously.

Stochastic Roadmap Simulation

SRS compactly encode many motion pathways in a directed graph, called a probabilistic conformational roadmap. A node in the graph is a point sampled at random from the conformation space of a molecule. Every path in the roadmap is a potential motion pathway for the molecule. A roadmap thus contains many pathways, with associated probabilities indicating the likelihood that a molecule may follow these pathways. Using tools from the Markov chain theory, we can efficiently obtain kinetic information on the motion of molecules from the roadmap.

We tested SRS by first using it to compute the probability of folding, an important order parameter that measures the “kinetic distance” of a conformation to the native state of a protein. Our computational studies demonstrate that, compared with the Monte Carlo method, SRS obtains more accurate results and reduces the running time by several orders of magnitudes. Problems that required 100 days of computation with the Monte Carlo method were solved in an hour with SRS. Furthermore we proved that, in the limit, SRS converges to the same distribution as the Monte Carlo method.

We then used the probability of folding to estimate the transition state ensemble (TSE) and predicted the rates and the phi-values for protein folding. Of the 16 proteins we studied, comparison with experimental data shows that our method estimates the TSE much more accurately than an existing method based on dynamic programming. This improvement leads to better folding-rate predictions. We also computed the mean first passage time of the unfolded states and show that the computed values correlate with experimentally determined folding rates. The results on phi-value predictions are mixed, possibly due to the simple energy model used in the tests. The comparison with experimental data further validates the SRS method and indicates its potential as a general tool for studying protein folding kinetics.

Predicted folding rates versus the experimentally measured folding rates. The 16 proteins ranged from 56 to 128 amino acids in length.

3CHY sequence of secondary structure formation. Warmer colors indicate higher degree of native contacts attained. The colored bar on the left indicates secondary structures, red for helices and green for strands.


  • T.H. Chiang, M.S. Apaydin, D.L. Brutlag, D. Hsu and J.C. Latombe. Predicting Experimental Quantities in Protein Folding Kinetics using Stochastic Roadmap Simulation. In Proc. ACM Int. Conf. on Computational Biology (RECOMB), pp. 410–424, 2006.
  • M.S. Apaydin, D.L. Brutlag, C. Guestrin, D. Hsu, J.C. Latombe, and C. Varma. Stochastic roadmap simulation: an efficient representation and algorithm for analyzing molecular motionJ. Computational Biology, 10(3-4):247–281, 2003.
  • M.S. Apaydin, D.L. Brutlag, C. Guestrin, D. Hsu, and J.C. Latombe. Stochastic roadmap simulation: An efficient representation and algorithm for analyzing molecular motion. In Proc. ACM Int. Conf. on Computational Biology (RECOMB), pp. 12–21, 2002.



Protein conformational changes play a critical role in vital biological functions. Due to noise in data, deteremining salient conformational changes accurately and efficiently is a challenging problem. We have developed an efficient algorithm for analyzing conformational changes of a protein, given its structures in two different conformations. A key element of the algorithm is a statistical test that determines the similarity of two protein structures in the presence of noise. Using data from the Protein Data Bank and the Macromolecular Movements Database, we tested the algorithm on proteins that exhibit a range of different conformational changes. The results show that our algorithm can reliably detect salient conformational changes, including well-known examples such as hinge and shear.

Protein Conformation Flexibility Analysis

pFlexAna analyzes two protein structures in differing conformations and accurately determines residues involved in conformational change. It’s output is a per residue flexibility measure, which also indicates the degree of flexibility in each residue of the protein. The lower this metric, the more flexible the residue.

A salient feature of our method is a principled analysis of statistical noise in structural data, which allows us to reliably distinguish noise from true conformational change. As a result, pFlexAna produces far more accurate results than other methods used to detect conformational changes, as shown in the diagram below. The structure used for this example is the N-lobe of Lactoferrin, which has been shown to undergo hinged conformational changes around residues 90 and 250.

B-factors Torsion angles Sliding RMSD


Using PDB data, we tested our algorithm on proteins exhibiting a wide range of conformational changes – including TBSV Coat Protein, Lactoferrin, Lactate Dehydrogenase, HIV-1 protease, Adenosylcobinamide Kinase and Aspartate transcarbamoylase. Results for these proteins closely match those reported elsewhere in the literature and are shown below.


The software pFlexAna is an implementation of the algorithm. The source code is availabe for download, and the web server provides direct access to the software.
source code •  documentation • web server


  • A. Nigham and D. Hsu. Protein Conformational Flexibility Analysis with Noisy Data. In Proc. ACM Int. Conf. on Computational Biology (RECOMB), 2007.



Proper placement of the base of a robot manipulator is an important issue in many robotics applications. For instance, in manufacturing, the base location of a manipulator has a significant impact on the cycle time of tasks such as spot welding and inspection. An automated means to determine the best placement can both increase the throughput of workcells and reduce set-up time. This work investigates methods that employ randomized motion planning techniques to search for the best placement efficiently.

An Iterative Robot Placement Algorithm

Our algorithm computes simultaneously a base location and a corresponding collision-free path that are optimized with respect to the execution time of tasks. The algorithm has been tested on both synthetic examples and real-life CAD data from the automotive industry.




GM cell

Four test scenes (click to enlarge the images).
Porsche Movie
Quicktime movie (1.0 MB).
Download the Quicktime viewer

There is a QuickTime movie for one of the computed examples. The movie clip first shows a PUMA arm at some arbitrary initial base location and executing a path computed by a randomized motion planner, and then shows the manipulator performing the same task after path and base location optimization.


  • D. Hsu, J.C. Latombe, and S. Sorkin. Placing a robot manipulator amid obstacles for optimized execution. In Proc. IEEE Int. Symp. on Assembly & Task Planning, pp. 280–285, 1999.



This research has been supported by a grant from SIMA (Stanford Integrated Manufacturing Association), ARO MURI grant DAAH04-96-1-007, a Microsoft Graduate Fellowship, General Motors Corp., and Deneb Robotics Inc.


Motion planning is concerned with computing collision-free motion for moving objects (e.g., drug molecules inside the binding cavity of a protein, robot manipulators on an assembly line, or articulated models of human arms) in an environment populated with obstacles. The key difficulty of motion planning is its exponential dependence on the number of degrees of freedom of a moving object. The focus of this work is to use random sampling as a fundamental technique for attacking this issue.

Path Planning

In its simplest form, motion planning is a purely geometric problem: given a description of an object and a static environment, the goal is to find a collision-free path to move the object from an initial to a goal configuration. Simple as it may appear, a fast path-planning algorithm is directly useful for applications such as computer animation and virtual prototyping.

The algorithm that we have developed first samples in the neighborhoods of the initial and the goal configuration and then iteratively in the neighborhoods of the newly-sampled configurations, until a path is found. Extensive experimental results indicate that our algorithm is simple to implement and very efficient in practice. The images below show the results on some of the experiments that we have performed.


Quicktime movie (2.3 MB).
Download the Quicktime viewer

Expansive Spaces

The analysis of randomized motion planning algorithm is an interesting and difficult question. If there exists a path, does the algorithm always find it? The intuitive explanation for the observed success of randomized motion planning algorithms is that a small number of sampled points can effectively capture the connectivity of the environment. We have introduced the notion of expansiveness to characterize the complexity of the environment. We have proven that in an expansive space, our algorithm quickly finds a collision-free path with high probability, if one exists.


  • J. Basch, L.J. Guibas, D. Hsu, and A.T. Nguyen. Disconnection proofs for motion planning. In Proc. IEEE Int. Conf. on Robotics & Automation, pp. 1765–1772, 2001.
  • D. Hsu. Randomized Single-query Motion Planning in Expansive Spaces. Ph.D. Thesis, Dept. of Computer Science, Stanford University, Stanford, CA, 2000.
  • D. Hsu, J.C. Latombe, R. Motwani, and L.E. Kavraki. Capturing the connectivity of high-dimensional geometric spaces by parallelizable random sampling techniques. In P.M. Pardalos and S. Rajasekaran, editors, Advances in randomized parallel computing, pp. 159–182, Kluwer Academic Publishers, Boston, MA, 1999.
  • D. Hsu, J.C. Latombe, and R. Motwani. Path planning in expansive configuration spacesInt. J. Computational Geometry & Applications, 9(4-5):495–512, 1999.
  • D. Hsu, L.E. Kavraki, J.C. Latombe, R. Motwani, and S. Sorkin. On finding narrow passages with probabilistic roadmap planners. In P.K. Agarwal and others, editors, Robotics: The Algorithmic Perspective: 1998 Workshop on the Algorithmic Foundations of Robotics, pp. 141–154, A. K. Peters, Wellesley, MA, 1998.



We thank Nancy Amato at Texas A&M University and Boris Yamrom at General Electric Corporation for providing us the data used in some of the experiments shown above.


Several advanced sampling strategies have been proposed in recent years to address the narrow passage problem for probabilistic roadmap (PRM) planning. These sampling strategies all have unique strengths, but none of them solves the problem completely. We investigate general and systematic approaches for adaptively combining multiple sampling strategies so that their individual strengths are preserved. Our preliminary results show that although the performance of individual sampling strategies varies across different environments, the adaptive hybrid sampling strategies that we have constructed perform consistently well. We can also show that, under reasonable assumptions, the adaptive strategies are provably competitive against all individual strategies used.


  • D. Hsu, G. Sánchez-Ante, and Z. Sun. Hybrid PRM sampling with a cost-sensitive adaptive strategy. In Proc. IEEE Int. Conf. on Robotics & Automation, 2005.



Our motivation is to build autonomous robots that can follow people and recognize their activities. Such capabilities are important in applications such as home care for the elderly, intelligent environments, and iteractive media. In particular, the goal of this project is to develop reliable and efficient motion strategies for an autonomous robot to follow a target and keep it within sensing range. In this research, we investigate the use of greedy strategies using local information to acheive the goal.

A typical target following scenario in an everyday environment, e.g., the school cafeteria.

People walking around create a highly cluttered and dynamic environment in which the robot has to follow the target.Quicktime video (3.4 MB)

Greedy Strategies

Greedy tracking is useful, when very little is known about the environment and the target. Without an a-priori map of the environment, the tracker must acquire local information on the target and the environment using its sensors. In this work, we assume sensors based on a line-of-sight visibility model, e.g. laser range finders, cameras, etc. In particular, we use a SICK LMS200 laser range finder mounted on a robot with differential drive as the tracker in our experiments. The target is a person walking around a crowded environment. The target’s intentions are also unknown in advance, in particular we focus on non-adversarial targets.

Tracker : A SICK Lms-200 laser mounted on a Pioneer-3DX base

The robot’s local information.

Tracking Strategy

The objective of tracking is to keep the target within the tracker’s visibility region at all times. The tracker’s visibility is occluded by the presence of environmental obstacles and the sensing range of the tracker’s visibility sensor. The target can escape the tracker’s view through the gaps in the visibility region, also known as gap edges.

A line-of-sight visibility region has gaps due occlusions and the sensor limits, Range and Field-of-View (FoV). The danger zone around the gap-edge is the region where the target has higher relative vantage than the tracker. The robot balances the short term goal of swinging and long term goal of moving towards the occlusion edge while following the target along a corner. Quicktime video (2.4 MB)

To select actions effectively, the robot must balance between the short-term goal of preventing the immediate loss of the target and the long-term goal of keeping it visible for the maximum duration possible. The tracker does so by intelligently maneuvering the gap-edges away from the target. The ability of the tracker’s maneuvering power, and hence its ability to track the target, depends on its relative position and velocity w.r.t the target towards the gap-edge. This generates a notion of relative-vantage w.r.t the target. We define a region where the target’s ability to escape is greater than the tracker’s ability to eliminate the gap-edge as the danger zone. And so, the target has a higher vantage inside the dangerzone and the tracker outside. Taking this vantage as the objective function, the tracker can optimize its velocity to improve its relative vantage w.r.t the target.

Local Prediction

We predict the target’s motion locally in order to choose actions to give higher weightage to more probable escape points. Although many prediction methods are possible, we use a simple one in which the target’s velocity in the next step is estimated by storing a limited history and extrapolating, and the uncertainty of the target’s velocity estimation is modeled as a probability distribution.

Without prediction the attention of the tracker is spread over all the escape edges. A simple prediction lets the tracker focus on the most probable escape point. The simple probability measure is applied and it generates reliable relative probability of escape of the target through the escape edges Quicktime video (1.6 MB)

2-D Target Tracking

Experiments show that such a tracking strategy performs better than those that just servo onto the target or those that just take the shortest distance to escape of the target into account. Below, the red robot is the target being followed by the green tracker. We see that the vantage tracker is able to track the longest.

Visual Servo Shortest distance based tracker Vantage tracker
The visual servo algorithm does not take into account the changing environment and allows the target to be occluded by the box. Quicktime video (1.7 MB) The vantage tracker explicitly models the visibility relationship of the target, itself and the occlusion due to the box at each instant and successfully follows the target. Quicktime video (3.0 MB)

Stealth Tracking

An advantage of our fomulation for the tracking problem in terms of objective function is the ease with which additional considerations can be accomodated. For example, stealth tracking requires the robot to follow a target without letting itself exposed. In this aspect stealth is actually the inverse of the tracking problem. Our algorithm applies a stealth constraint to the tracking objective function. The stealth constraint requires the robot to stay close to the boundaries of the target’s visibility while keeping the target away from the robot’s visibility boundaries. Due to local sensing, the target’s visibility can only be computed within the robot’s own visibility. We impose a lookout region (L) around the target’s visiblity boundaries in which to constrain the optimization problem.

Lookout regions due to the target’s visibility. Constrained optimization causes the robot to move in a stealthy fashion. Quicktime video (1.7 MB)

Hardware Limitations

The limitations of the hardware also affects the tracking problem. E.g, to cater for the limited field of view (FoV) of a sensing mechanism, the robot must in addition make sure that the target is away from the FoV limits. The tracking strategy considers these limits as pseudo escape edges and tries manipulate these away from the target in a similar fashion to the occlusion edges. To validate our algorithm, we bring the robot to follow a person in the school cafeteria.

The target (red pioneer) is followed in an indoor environment by the robot (green pioneer). The right window shows the visibility and decision making process for the robot. Blue lines denote the escape edges and the green line shows the current decision velocity. Note that when the target marked by T, goes too near the edge, the robot swings to keep it in view. The most critical edge is marked red. Quicktime video(7.4 MB) The canteen environment is inherently complex with the table and chair legs having a similar signature as the target person’s legs. People walking by create visual occlusions and mobility obstructions. Quicktime video (2.2 MB)

Robustness of the vantage tracker

Due to the fast online nature of the vantage tracker, the errors due to uncertainty are bounded, and it is able to recover from temporary occlusions. Such a tracker is also robust to uneven terrains.

Temporary occlusions: A challenging aspect of following the target in a crowd is when someone walks in between the robot and the target. In this video the robot is following the target in green t-shirt when if faces an temporary occlusion by a lady walking across (purple). These temporary occlusions in such a dynamic environment is handled well by the robot. Quicktime video (1.8 MB) Uneven terrain: The robot follows the person successfully in uneven terrains. Notice the high fluctuations in the sensor readings in the bottom right window. The fast recomputation and online nature of the robot helps it in maintaining the target as the robot takes a bumpy ride. Quicktime video (1.5 MB)

Tracking in 3D Environments

A related approach has been developed for tracking in 3-D. Although the basic principles are similar, the 3-D problem is much more difficult. Moreover, there are situations where pure local greedy strategies may not work well.

The consideration of relative vantage in 3-D leads us to vantage zones as shown. The robot tries to manipulate these vantage zones in 3-D leading to tracking motions.

Tracking scenario in 3-D. A single occlusion plane (in blue) being generated due to the obstacle (pink wall). The blue sphere denotes the robot position and the red cube the target. Vantage zone in 3-D. R is the robot, O the occlusion plane, Oe is the lateral edge and g, the occlusion edge. The grey prism represents the obstacle.

Tracking in an urban scenario. A tracking helicopter follows the target in an urban environment using a 3-d scanning laser data points. 3-D Visibility polygon is extracted from the data point cloud. The motion is then generated to optimize risk based on the vantage zones. Quicktime video (3.3 MB)


  • T. Bandyopadhyay, D. Hsu, and Ang Jr., M.H.. Motion strategies for people tracking in cluttered dynamic environments. In Proc. Int. Symp. on Experimental Robotics, 2008. To appear.
  • T. Bandyopadhyay, Ang Jr., M.H., and D. Hsu. Motion planning for 3-D target tracking among obstacles. In Proc. Int. Symp. on Robotics Research, 2007.
  • T. Bandyopadhyay, Y.P. Li, Ang Jr., M.H., and D. Hsu. A greedy strategy for tracking a locally predictable target among obstacles. In Proc. IEEE Int. Conf. on Robotics & Automation, pp. 2342–2347, 2006.
  • T. Bandyopadhyay, Y.P. Li, Ang Jr., M.H., and D. Hsu. Stealth tracking of an unpredictable target among obstacles. In M. Erdmann and others, editors, Algorithmic Foundations of Robotics VI—Proc. Workshop on the Algorithmic Foundations of Robotics (WAFR), pp. 43–58, Springer, 2004.



Our motivation is to build autonomous robots that can follow people and recognize their activities. Such capabilities are important in applications such as home care for the elderly, intelligent environments, and iteractive media. In particular, the goal of this project is to develop reliable and efficient motion strategies for an autonomous robot to follow a target and keep it within the sensor range, despite occlusion by obstacles. We are currently investigating two approaches, POMDPs and greedy strategies.

Experimental setup: a Sick laser mounted on a Pioneer DX3 indoor robot.

POMDP Trackers

Target tracking has two variants that are often studied independently with different approaches: target searching requires a robot to find a target initially not visible, and target following requires a robot to maintain visibility on a target initially visible. We use partially observable Markov decision process (POMDP) to build a single model that unifies target searching and target following. The resulting POMDP policy exhibits interesting tracking behaviors, such as anticipatory moves that exploit target dynamics, information-gathering moves that reduce target position uncertainty, and energy-conserving actions that allow the target to get out of sight, but do not compromise tracking performance.

Some preliminary results are shown in the videos below. The problem setting is motivated by homecare applications. Imagine that an elderly person moves around at home and has a call button to call a robot over for help. The call status stays on for some time and then goes off. If the robot arrives while the call status is on, it gets a reward; otherwise, it gets no reward. Clearly the robot should stay close the person in order to improve the chance of receiving rewards, but at the same time, the robot needs to minimize movement in order to reduce power consumption. So the naive strategy of following right behind the person does not work well. What is interesting about these examples is that the robot manage to “track” the target while the target is outside the robot’s sensor visibility region most of the time.

Quicktime video (1.1 MB)

Quicktime video (1.0 MB)

Quicktime video (2.3 MB)


Light blue areas indicate obstacles. The green area around the robot indicate sensor’ visibility region. The various shades of gray show the robot’s belief of the target position. Lighter color indicates higher probability.


  • D. Hsu, W.S. Lee, and N. Rong. A point-based POMDP planner for target tracking. In Proc. IEEE Int. Conf. on Robotics & Automation, pp. 2644–2650, 2008.
    BibTeX PDF

Greedy Followers

POMDP trackers integrate global information on the target behavior and the environment for optimal decision making. When little is known about the target behavior or the environment, a local greedy strategy is more effective. Key to our algorithm is the definition of a risk function, which tries to capture the target’s ability in escaping from the robot sensors’ visibility region in both short and long terms. To select actions effectively, the robot must balance between the short-term goal of preventing the immediate loss of the target and the long-term goal of keeping it visible for the maximum duration possible. Interestingly, a good comprise can be achieved, using only local information available to the robot’s sensors. By analyzing the local geometry, our algorithm computes a global risk function as a weighted sum of components, each associated with a single visibility constraint. It then chooses an action to minimize the risk locally in a greedy fashion.

As the algorithm uses only local geometric information available to the robot’s visual sensors, it does not require a global map and thus bypasses the difficulty of localization with respect to a global map. Furthermore, uncertainty in sensing and motion control does not accumulate. This improves the reliability of tracking.

This approach can be developed in both 2-D and 3-D. The 3-D case is, however, much more challenging technically, as both the robot and the target gain one more degree of freedom to maneuver and the visibility relationships in 3-D are more complex than those in 2-D. More details can be found here.

Quicktime video (0.6 MB) Quicktime video (1.2 MB)
Quicktime video (0.5 MB) Quicktime video (0.5 MB)
The videos show the same motion of a robot helicopter from two different viewing angles.


  • T. Bandyopadhyay, D. Hsu, and Ang Jr., M.H.. Motion strategies for people tracking in cluttered dynamic environments. In Proc. Int. Symp. on Experimental Robotics, 2008. To appear.
  • T. Bandyopadhyay, Ang Jr., M.H., and D. Hsu. Motion planning for 3-D target tracking among obstacles. In Proc. Int. Symp. on Robotics Research, 2007.
  • T. Bandyopadhyay, Y.P. Li, Ang Jr., M.H., and D. Hsu. A greedy strategy for tracking a locally predictable target among obstacles. In Proc. IEEE Int. Conf. on Robotics & Automation, pp. 2342–2347, 2006.
  • T. Bandyopadhyay, Y.P. Li, Ang Jr., M.H., and D. Hsu. Stealth tracking of an unpredictable target among obstacles. In M. Erdmann and others, editors, Algorithmic Foundations of Robotics VI—Proc. Workshop on the Algorithmic Foundations of Robotics (WAFR), pp. 43–58, Springer, 2004.

POMDPs provide a principled mathematical framework for planning and decision making under uncertainty. However, POMDPs are often avoided in practice, because solving POMDPs exactly is computationally intractable. Not long ago, the best algorithms could spend hours computing exact solutions to POMDPs with only a dozen states. In recent years, point-based algorithms have made impressive progress in computing approximately optimal solutions for large POMDPs: POMDPs with hundreds of states have been solved in a matter of seconds. This project consists of two main thrusts:

  • understand why the point-based algorithms work well, whether there are sub-classes of POMDPs that are computationally easier, and what complexity measures are suitable for capturing the difficulty of POMDP planning for point-based algorithms, and
  • develop efficient point-based POMDP algorithms by exploting the insights from the theoretical analysis.

  • Dec 15, 2009
APPL release 0.91 is available for download here.
  • Feb 15, 2009
APPL release 0.3 is available for download here.
  • Jul 17, 2008
APPL release 0.2 is available for download here.
  • Apr 15, 2008
APPL, our software package for solving POMDPs, is available for download here.

The Covering Number and The Complexity of POMDP Planning

Intuitively, the computational intractability is due to the “curse of dimensionality”: the belief space B used in solving a POMDP typically has dimensionality equal to |S|, the number of states in the POMDP, and therefore the size of B grows exponentially with |S|. As a result, the number of states is often used as an important complexity measure for POMDP planning. However, if the number of states was an accurate measure of the “true” complexity of a POMDP, it would seem surprising that point-based algorithms can obtain (approximate) solutions in seconds in belief spaces of hundreds of dimensions.

We propose to use the covering number as an alternative measure of the complexity of POMDP planning. As expected, not all belief spaces have the same complexity even if they have the same dimenionality. Consider the example below. Although B and B′ are both two-dimensional, it is intuitively clear that the size of B′, which consists only three points, is much smaller than of that B.

The covering number captures several interesting structural properties that drastically reduce the size of the belief space, such as sparsity and factorization. Using the idea of the covering number, we have identified several conditions under which approximate POMDP solutions can be computed efficiently.


One key idea of point-based POMDP algorithms is to sample a set of points from a belief space Band use it as an approximate representation of B, instead of representing B exactly. Some early POMDP algorithms sample the entire belief space B, using a uniform sampling distribution, such as a grid. However, it is difficult to sample a representative set of points from B, due to its large size. More recent point-based algorithms sample only R, the subset of belief points reachable from a given initial point b0 in B, under arbitrary sequences of actions. It is generally believed that R is much smaller than B. Indeed, focusing on R allows point-based algorithms to scale up to larger problems. To push further in this direction, we would like to sample near R*, the subset of belief points reachable from b0 under optimal sequences of actions, as  R* is usually much smaller than R. Of course, the optimal sequences of actions constitute exactly the POMDP solution, which is unknown in advance. The main idea of our algorithm is to compute successive approximations of R* through sampling and converge to it iteratively. Hence the name SARSOP, which stands for Successive Approximations of the Reachable Space under Optimal Policies.

Navigation. Grasping. Target tracking.
Integrated exploration

In simulation, we successfully applied SARSOP to a set of common robotic tasks, including instances of coastal navigation, grasping, mobile robot exploration, and target tracking, all modeled as POMDPs with a large number of states. In most of the instances studied, SARSOP substantially outperformed one of the fastest existing point-based algorithms. The figure below shows the performance of SARSOP on the well-known Tag problem.


  • D. Hsu. Towards Large-Scale POMDP Planning for Robotic Tasks. Invited talk at International Conference on Machine Learning (ICML), Workshop on Planning and Acting with Uncertain Models, 2011. Research supported in part by MDA GAMBIT grant R-252-000-398-490, MoE AcRF grant 2010-T2-2-071, and NUS AcRF grant R-252-000-424-112.
  • H. Kurniawati, Y. Du, D. Hsu, and W.S. Lee. Motion planning under uncertainty for robotic tasks with long time horizonsInt. J. Robotics Research, 30(3):308-323, 2011.
  • S.C.W. Ong, S.W. Png, D. Hsu, and W.S. Lee. Planning under uncertainty for robotic tasks with mixed observabilityInt. J. Robotics Research, 29(8):1053–1068, 2010.
  • H. Kurniawati, D. Hsu, and W.S. Lee. SARSOP: Efficient point-based POMDP planning by approximating optimally reachable belief spaces. In Proc. Robotics: Science and Systems, 2008.
  • D. Hsu, W.S. Lee, and N. Rong. A point-based POMDP planner for target tracking. In Proc. IEEE Int. Conf. on Robotics & Automation, pp. 2644–2650, 2008.
  • D. Hsu, W.S. Lee, and N. Rong. What makes some POMDP problems easy to approximate? In Advances in Neural Information Processing Systems (NIPS), 2007.


“Movement is the essence of animation.”

– The Technique of Film Animation, J. Halas and R. Manvell

Creating motion for human-like characters is an interesting and important problem. Despite the versatility of human movement, two types of behaviors are essential: locomotion (moving around in an environment) and manipulation (using hands and arms to manipulate objects).


For locomotion, obstacles in the environment can often be tightly bounded by prisms perpendicular to the floor. We can exploit this fact and project the three-dimensional geometry of the environment and the character to a two-dimensional plane. A trajectory in the plane is then computed and tracked by a controller. We have used this idea in the Human Figure Animation Project to control the behavior of a stick figure whose motion is generated by interpolating motion capture data. The combination of motion planning and motion capture techniques renders motion that is both versatile and realistic for character animation.

Manipulating Objects

Manipulating objects

Quicktime movie (2.7 MB)

Manipulating objects with hands and arms is a difficult task. Due to the complex geometric interaction of the character with the environment, keyframe interpolation, which requires lots of user intervention, is too tedious. More advanced methods such as dynamic simulation and space-time constraints are not applicable. In contrast, motion planners are a perfect candidate for this task. It creates collision-free motion with little user intervention and easily handles task-level commands such as “pick up the apple”.

Geometric model of the character made available by Seamless Solutions, Inc.

The emergence of fast randomized motion planners has allowed the motion to be computed at interactive rates. The movie and the images above demonstrate a sequence of movements generated by our motion planner. We specify only three configurations for the arm: the initial configiration, the grasp configuration, and the final goal. The motion planner then automatically generates the collision-free motion without furthur human intervention.

Unfortunately animation generated solely by motion planners often looks “stiff” as the movie clip shows, because motion planners typically operate on simplified models of characters that capture the the functional, but not the aesthetic aspect of movement. A fruitful approach, we believe, is to build a realistic motion model from captured data through machine learning and then synthesize novel motion through motion planning. We are working on techniques to combine model-driven motion planning and data-driven motion capture to create versatile and realistic animation.


This research has been supported partically by ARO MURI grant DAAH04-96-1-007 and a Microsoft Graduate Fellowship. We thank Seamless Solutions, Inc. for making available the character model used in the animation.