Author Archives: Siwei Chen

This research examines particle-based object manipulation. We start with Particle-based Object Manipulation (Prompt), designed for manipulating unknown rigid objects. Prompt leverages a particle-based object representation to bridge visual perception and robot control. We then extend this particle representation to deformable objects and introduce DaXBench, a differentiable simulation framework for benchmarking deformable object manipulation (DOM) techniques. DaXBench enables deeper exploration and systematic comparison of DOM methods. Finally, we present Differentiable Particles (DiPac), a new DOM algorithm built on the particle representation. Its differentiable particle dynamics enables efficient estimation of dynamics parameters and gradient-based action optimization. Together, these tools address the challenge of robot manipulation of unknown deformable objects, paving the way for further progress in the field.

Prompt

S. Chen, X. Ma, Y. Lu, D. Hsu. Ab initio particle-based object manipulation. Proceedings of Robotics: Science and Systems, RSS 2021.
PDF | Code

We present Particle-based Object Manipulation (Prompt), a new approach to robot manipulation of novel objects ab initio, without prior object models or pre-training on a large object data set. The key element of Prompt is a particle-based object representation, in which each particle represents a point in the object, the local geometric, physical, and other features of the point, and also its relation with other particles. Like the model-based analytic approaches to manipulation, the particle representation enables the robot to reason about the object’s geometry and dynamics in order to choose suitable manipulation actions. Like the data-driven approaches, the particle representation is learned online in real-time from visual sensor input, specifically, multi-view RGB images. The particle representation thus connects visual perception with robot control. Prompt combines the benefits of both model-based reasoning and data-driven learning.
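
As a rough illustration (not the paper's actual data structures; the field names below are hypothetical), a particle can be viewed as a point bundled with local features and its relations to neighboring particles:

```python
from dataclasses import dataclass, field
from typing import List
import numpy as np

@dataclass
class Particle:
    """One particle of the object representation (hypothetical field names)."""
    position: np.ndarray                                 # 3D point on or inside the object
    features: np.ndarray                                 # local geometric/physical descriptors
    neighbors: List[int] = field(default_factory=list)   # indices of related particles

@dataclass
class ParticleObject:
    """An object approximated as a set of interrelated particles."""
    particles: List[Particle]

    def center_of_mass(self) -> np.ndarray:
        # A simple piece of geometric reasoning the representation supports,
        # assuming uniform per-particle mass.
        return np.mean([p.position for p in self.particles], axis=0)
```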

DaXBench

S. Chen*, Y. Xu*, C. Yu*, L. Li, X. Ma, Z. Xu, D. Hsu. DaXBench: Benchmarking Deformable Object Manipulation with Differentiable Physics. Proceedings of the Eleventh International Conference on Learning Representations, ICLR 2023 (Oral). (*Co-first authors)

PDF | Code | Website

We extend the particle representation to deformable object manipulation with differentiable dynamics. Deformable object manipulation (DOM) is a long-standing challenge in robotics and has attracted significant interest recently. This paper presents DaXBench, a differentiable simulation framework for DOM. While existing work often focuses on a specific type of deformable object, DaXBench supports fluids, ropes, cloth, and other object types; it provides a general-purpose benchmark for evaluating widely different DOM methods, including planning, imitation learning, and reinforcement learning. DaXBench hence serves as a common testbed to facilitate research on deformable object manipulation using particles.
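
As a rough sketch of how such a benchmark lets widely different methods be compared under one interface (the `make_env`, `policy`, and gym-style calls below are placeholders, not DaXBench's actual API; see the linked code for the real interface):

```python
def evaluate_method(make_env, policy, episodes=10):
    """Run one DOM method on one benchmark task and report its mean return.

    `make_env` constructs a task (e.g., a rope, cloth, or fluid scene) and
    `policy` is any planner, imitation-learned, or RL policy that maps an
    observation to an action. Both are placeholders for illustration.
    """
    returns = []
    for _ in range(episodes):
        env = make_env()
        obs, done, total = env.reset(), False, 0.0
        while not done:
            action = policy(obs)                  # planning, IL, or RL policy
            obs, reward, done, info = env.step(action)
            total += reward
        returns.append(total)
    return sum(returns) / len(returns)
```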

DiPac

S. Chen, Y. Xu, C. Yu, L. Li, X. Ma, D. Hsu. Differentiable Particles for Deformable Object Manipulation.

PDF | Code

Furthermore, we use differentiable particles to handle a wide variety of deformable objects. Manipulating such objects, including rope, cloth, and beans, is challenging because of their many degrees of freedom and complex non-linear dynamics. This paper introduces Differentiable Particles (DiPac), a new algorithm for deformable object manipulation. DiPac represents a deformable object as a collection of particles and employs a differentiable particle dynamics simulator to reason about robot manipulation. DiPac uses a single representation, particles, for a diverse array of objects: scattered beans, rope, T-shirts, and so forth. Differentiable dynamics enables DiPac to efficiently (i) estimate the dynamics parameters, reducing the simulation-to-real gap, and (ii) select the best action by backpropagating the gradient along sampled trajectories.
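
The following is a minimal sketch of point (ii), assuming a toy differentiable simulator in place of DiPac's actual particle dynamics; the `simulate` function, the stiffness parameter, and all numbers are illustrative only:

```python
import torch

def simulate(particles, actions, stiffness):
    """Stand-in for a differentiable particle rollout: each action nudges every
    particle, scaled by a stiffness parameter. A real simulator models contact,
    elasticity, and friction, but stays differentiable end to end."""
    for a in actions:
        particles = particles + stiffness * a
    return particles

particles = torch.zeros(128, 3)                   # toy initial particle positions
goal = torch.ones(128, 3)                         # toy goal configuration
actions = torch.zeros(5, 3, requires_grad=True)   # 5 candidate pushes (dx, dy, dz)
stiffness = torch.tensor(0.5)                     # a dynamics parameter

# (ii) Action optimization: backpropagate the task loss along the rollout
# and take gradient steps on the action sequence.
optimizer = torch.optim.SGD([actions], lr=0.1)
for _ in range(50):
    optimizer.zero_grad()
    final = simulate(particles, actions, stiffness)
    loss = ((final - goal) ** 2).sum(dim=-1).mean()
    loss.backward()
    optimizer.step()

# (i) Parameter estimation works the same way: mark `stiffness` as requiring
# gradients and minimize the mismatch between simulated and observed real-world
# particle states, shrinking the simulation-to-real gap.
```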

Deep Visual Navigation under Partial Observability

How do humans navigate? We navigate with almost exclusively visual sensing and coarse floor plans. To reach a destination, we exhibit a diverse set of skills, such as obstacle avoidance, and we use tools, such as buses and elevators, to move between locations. None of this is yet possible for robots. To get closer to human-level navigation performance, we propose a controller capable of learning complex policies from data.


DECISION: Deep rEcurrent Controller for vISual navigatiON

B. Ai, W. Gao, Vinay, and D. Hsu. Deep Visual Navigation under Partial Observability. In International Conference on Robotics and Automation (ICRA), 2022. [paper][code][video]


The key lies in designing the structural components that allow the controller to learn the desired capabilities. Here we identified three challenges: 

(i) Partial observability: The robot may not see objects in its blind spots, or may be unable to infer features of interest, e.g., the intention of a moving pedestrian.

(ii) Multimodal behaviors: Human navigation behaviors are inherently multimodal, and they depend on both the local environment and the high-level navigation objective.

(iii) Visual complexity: Raw pixels are high-dimensional, and scenes can appear dramatically different across environments, which makes traditional model-based approaches brittle.


To address these challenges, we propose two key structural designs (a schematic sketch follows the list):

(i) Multi-scale temporal modeling: We use spatial memory modules to capture both low-level motion and high-level temporal semantics that are useful for control. The rich history information compensates for partial observations.

(ii) Multimodal actions: We extend the idea of Mixture Density Networks (MDNs) to temporal reasoning. Specifically, we use independent memory modules for different modes to preserve the distinction between modes.
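
Below is a schematic sketch of the per-mode memory idea, assuming a simple convolutional encoder and GRU memories; the module names, sizes, and structure are illustrative and are not the DECISION implementation (which also models multiple time scales and MDN-style mixture outputs):

```python
import torch
import torch.nn as nn

class MultiModeController(nn.Module):
    """Illustrative controller: a shared visual encoder, an independent
    recurrent memory per behavior mode (e.g., go-left / go-straight /
    go-right), and a per-mode action head. A sketch, not DECISION itself."""

    def __init__(self, num_modes=3, feat_dim=128, action_dim=2):
        super().__init__()
        self.encoder = nn.Sequential(          # shared visual feature extractor
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim), nn.ReLU(),
        )
        # One memory module per mode keeps the modes' histories separate.
        self.memories = nn.ModuleList(
            [nn.GRUCell(feat_dim, feat_dim) for _ in range(num_modes)]
        )
        self.heads = nn.ModuleList(
            [nn.Linear(feat_dim, action_dim) for _ in range(num_modes)]
        )

    def forward(self, image, hidden, mode):
        """One control step: update every mode's memory, then output the
        action of the commanded mode (the high-level navigation objective)."""
        feat = self.encoder(image)
        hidden = [gru(feat, h) for gru, h in zip(self.memories, hidden)]
        action = self.heads[mode](hidden[mode])
        return action, hidden

controller = MultiModeController()
hidden = [torch.zeros(1, 128) for _ in range(3)]
image = torch.rand(1, 3, 112, 112)                  # one RGB frame
action, hidden = controller(image, hidden, mode=1)  # follow the commanded mode
```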


We collected a real-world human demonstration dataset of 410K time steps and trained the controller end-to-end. Our DECISION controller significantly outperforms CNN and LSTM baselines.

The controller was first deployed on our Boston Dynamics Spot robot in April 2022 and has navigated autonomously for more than 150 km at the time of writing. It has been incrementally improved over time as an integral component of the GoAnywhere@NUS project.

A distinguishing feature of this work is that all experimental results were obtained in the real world. Here is our demo video, showing our Spot robot traversing many different locations on our university campus.

Our goal is to exploit the potential of the particle representation to enable robots to handle general manipulation tasks involving rigid objects, deformable objects, and liquids.

Ab Initio Particle-based Object Manipulation

S. Chen, X. Ma, Y. Lu, D. Hsu. Ab Initio Particle-based Object Manipulation. In Robotics: Science and Systems, RSS, 2021. [PDF][Code]

This paper presents Particle-based Object Manipulation (Prompt), a new approach to robot manipulation of novel objects ab initio, without prior object models or pre-training on a large object data set. The key element of Prompt is a particle-based object representation, in which each particle represents a point in the object, the local geometric, physical, and other features of the point, and also its relation with other particles. Like the model-based analytic approaches to manipulation, the particle representation enables the robot to reason about the object’s geometry and dynamics in order to choose suitable manipulation actions. Like the data-driven approaches, the particle representation is learned online in real time from visual sensor input, specifically, multi-view RGB images. The particle representation thus connects visual perception with robot control. Prompt combines the benefits of both model-based reasoning and data-driven learning. We show empirically that Prompt successfully handles a variety of everyday objects, some of which are transparent. It handles various manipulation tasks, including grasping, pushing, etc. Our experiments also show that Prompt outperforms a state-of-the-art data-driven grasping method on these everyday objects, even though it does not use any offline training data.
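
As a rough, heavily hedged illustration of how a particle set might be built from multiple calibrated views, the toy function below keeps grid points whose projections land inside every view's object mask (a silhouette-carving stand-in with placeholder inputs; Prompt's actual online reconstruction is described in the paper):

```python
import numpy as np

def carve_particles(masks, projections, grid):
    """Toy multi-view carving: keep the 3D grid points whose projections fall
    inside every view's binary object mask. `masks` are (H, W) boolean arrays,
    `projections` are per-view functions mapping a 3D point to pixel (u, v),
    and `grid` is an array of candidate 3D points. Illustrative only."""
    kept = []
    for p in grid:
        inside_all_views = True
        for mask, project in zip(masks, projections):
            u, v = project(p)
            ui, vi = int(round(u)), int(round(v))
            if not (0 <= ui < mask.shape[1] and 0 <= vi < mask.shape[0]) or not mask[vi, ui]:
                inside_all_views = False
                break
        if inside_all_views:
            kept.append(p)
    return np.array(kept)
```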