Deep Reinforcement Learning for Gripper Vector Estimation

Olsen, Thomas; Ottesen, Birk Midtbø

Master thesis

Permanent link

http://hdl.handle.net/11250/2615886

Publication date

2018

Collections

Institutt for datateknologi og informatikk

Abstract

Gripper vector estimation, also referred to as gripper pose estimation, is the problem of constructing a vector that describes a pose of a robotic gripper's end-effector from which it can grasp an object. Recent work has investigated the use of Artificial Intelligence (AI) for constructing the gripper logic, including Deep Learning (DL) with supervised learning. However, because supervised algorithms require large labeled datasets to be constructed, we propose the use of Deep Reinforcement Learning (DRL). DRL mitigates the concerns regarding the construction of labeled datasets, but often requires more training and data than supervised algorithms. To ease these challenges, we propose the use of a simulated environment, where training of DRL algorithms can be conducted significantly faster than in the real world. A simulated environment can be expected to differ from the real-world environment in many aspects, and thus problems regarding transfer learning will arise. To increase the prospects of transfer learning, we propose the use of Domain Randomization (DR) in the simulated environment. To ensure that good and descriptive information is available to the DRL agent, we propose a state space consisting of color images combined with their respective depth images (RGB-D).
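As a rough illustration of the idea behind DR, the following is a minimal sketch in which each training episode draws a fresh set of visual scene parameters, so that the agent never sees the same rendering conditions twice. The parameter names (`light_intensity`, `object_hue`, and so on) and all ranges are hypothetical and are not taken from the thesis; the actual randomized aspects are described in the full text.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SceneParams:
    """Hypothetical visual parameters resampled for every episode."""
    light_intensity: float    # relative brightness, 1.0 = nominal
    light_azimuth_deg: float  # direction of the main light source
    object_hue: float         # hue of the object texture in [0, 1)
    camera_jitter_cm: float   # positional offset applied to the camera


def sample_scene(rng: random.Random) -> SceneParams:
    """Draw one randomized scene; all ranges are illustrative assumptions."""
    return SceneParams(
        light_intensity=rng.uniform(0.3, 2.0),
        light_azimuth_deg=rng.uniform(0.0, 360.0),
        object_hue=rng.random(),
        camera_jitter_cm=rng.uniform(-1.0, 1.0),
    )


rng = random.Random(0)  # fixed seed so that runs are reproducible
scenes = [sample_scene(rng) for _ in range(1000)]
```

In a setup like this, widening a range (for example that of `light_intensity`) is the knob that makes the randomization more expressive for that aspect of the scene.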

To investigate the prospects of DRL in the context of gripper pose estimation and dexterous robotic manipulation, a review of the state-of-the-art literature was conducted. The DRL algorithms Deep Deterministic Policy Gradients (DDPG) and Proximal Policy Optimization (PPO) were deemed promising and were therefore investigated. The simulation environment was built with the Unity game engine, using its newly released Machine Learning (ML) library, which enabled communication with the TensorFlow library. To enable evaluation of the agents trained in simulation, a real-world setup for the gripper pose estimation task was constructed. This setup consisted of the Panda robot and a two-finger gripper, both made by Franka Emika, and the Intel RealSense SR300 color and depth sensing camera.

The main question this thesis addresses is whether a DRL agent trained solely in simulation can achieve satisfactory results for the problem of gripper pose estimation in a real-world environment without any domain adaptation or fine-tuning. Our best DRL agent achieved a successful grasp prediction rate of 60% when evaluated on 60 gripper pose estimation attempts, and 88.3% of the grasp attempts were either successful or within one centimeter and five degrees of a valid grasp. Additionally, the mean positional offset from a gripper pose that would have resulted in a valid grasp was 0.47 centimeters, with a standard deviation of 0.75 centimeters, and the mean rotational offset was 0.6 degrees, with a standard deviation of 2.1 degrees. Our main contributions to the field are evaluations of the PPO DRL algorithm for specific application domains and additional evidence that DR positively impacts transfer learning when transferring from simulation to real-world environments. Most importantly, we have demonstrated that an agent trained completely and solely in a simulation environment is able to make successful grasp predictions for semi-compliant objects in the real world after transfer learning, without any domain adaptation.
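To relate the reported percentages to attempt counts, the following is a small sketch of the arithmetic. The counts 36 and 53 are inferred from the percentages and the 60 attempts; they are not stated in the abstract. The helper names and the inclusive treatment of the one-centimeter and five-degree bounds are likewise assumptions made for illustration.

```python
def within_tolerance(pos_offset_cm: float, rot_offset_deg: float,
                     max_pos_cm: float = 1.0, max_rot_deg: float = 5.0) -> bool:
    """True if an attempt lies within both tolerances of a valid grasp.

    The bounds are treated as inclusive, which is an assumption.
    """
    return abs(pos_offset_cm) <= max_pos_cm and abs(rot_offset_deg) <= max_rot_deg


def rate_percent(count: int, total: int) -> float:
    """Share of attempts as a percentage."""
    return 100.0 * count / total


TOTAL_ATTEMPTS = 60   # stated in the abstract
SUCCESSFUL = 36       # inferred: 60% of 60 attempts
NEAR_OR_SUCCESS = 53  # inferred: the only count out of 60 that rounds to 88.3%

success_rate = rate_percent(SUCCESSFUL, TOTAL_ATTEMPTS)                 # 60.0
near_rate = round(rate_percent(NEAR_OR_SUCCESS, TOTAL_ATTEMPTS), 1)     # 88.3
```

Under these inferred counts, 17 of the 24 unsuccessful attempts would have fallen within the stated tolerances.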

Although aspects of DR were incorporated in our simulation environment, we observed that the DRL agents were sensitive to lighting conditions in our real-world setup. We therefore suggest including more expressive DR of lighting conditions in the simulation environment. The results obtained with a generic Franka gripper, not customized for semi-compliant objects, together with our general observations, lead us to strongly suspect that DRL agents trained only in simulation can produce satisfactory results in a real-world environment for the problem of gripper pose estimation for semi-compliant objects.

Publisher

NTNU