Object-Independent Human-to-Robot Handovers using Real Time Robotic Vision
Authors: Patrick Rosenberger [TU Wien] [ORCID] [Homepage], Akansel Cosgun [Uni Monash], Rhys Newbury, Jun Kwan, Valerio Ortenzi [Uni Birmingham] [ORCID], Peter Corke [QUT] [ORCID] [Homepage] and Manfred Grafinger [TU Wien] [ORCID]
Publication: coming soon
Preprint: coming soon
Programming Language: The code developed within this project is written in Python 2.7 and 3.6, depending on the module. Please refer to the individual repositories for more information.
Hardware: This project has been implemented using a Franka Emika Panda arm and a RealSense D435.
This project introduces an approach for safe and object-independent human-to-robot handovers using real-time robotic vision and manipulation. We aim for general applicability by combining a generic object detector (darknet_ros), a real-time grasp selection algorithm (ggcnn_humanseg_ros), and two semantic segmentation modules, one for body segmentation (bodyparts_ros) and one for hand segmentation (egohands_ros).
The approach uses an RGB-D camera that is mounted at the robot’s end effector and provides a steady stream of RGB and depth images. For each frame, the object detector detects all objects within the camera’s field of view and selects the ones within the robot’s reach. In addition, all pixels belonging to the human interaction partner and the partner’s hands are segmented. The grasp selection module uses these inputs to calculate a grasp quality estimate, along with the associated grasp orientation and gripper width, for each pixel in the depth image. Finally, the grasp point with the highest estimated success likelihood is chosen and transformed into the robot’s base frame. The robot driver module moves the end effector towards the selected grasp point via visual servoing. The segmentation masks are updated in real time to dynamically handle changes in the hand and body positions.
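The last step of this pipeline, turning a selected grasp pixel into a point in the robot's base frame, can be sketched with a standard pinhole camera model followed by a homogeneous transform. This is a minimal illustration and not the project's code: the function names, the intrinsics (fx, fy, cx, cy), and the camera-to-base transform below are hypothetical values chosen for the example. In a real setup the intrinsics come from the camera calibration and the transform from the robot's kinematics.

```python
def deproject_pixel(u, v, depth_m, fx, fy, cx, cy):
    """Back-project pixel (u, v) with depth in metres to a camera-frame point (pinhole model)."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)


def transform_point(T, p):
    """Apply a 4x4 homogeneous transform (row-major nested lists) to a 3D point."""
    x, y, z = p
    return tuple(T[r][0] * x + T[r][1] * y + T[r][2] * z + T[r][3] for r in range(3))


# Hypothetical camera-to-base transform: the camera z axis points along the
# base x axis, and the camera origin sits at (0.3, 0.0, 0.5) m in the base frame.
T_BASE_CAMERA = [
    [0.0, 0.0, 1.0, 0.3],
    [-1.0, 0.0, 0.0, 0.0],
    [0.0, -1.0, 0.0, 0.5],
    [0.0, 0.0, 0.0, 1.0],
]

# Hypothetical intrinsics and grasp pixel, for illustration only.
grasp_in_camera = deproject_pixel(u=380, v=240, depth_m=0.5, fx=600.0, fy=600.0, cx=320.0, cy=240.0)
grasp_in_base = transform_point(T_BASE_CAMERA, grasp_in_camera)
print(grasp_in_base)
```

Because the camera is mounted on the end effector, the camera-to-base transform changes as the arm moves, so it has to be looked up again for every frame.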
The bodyparts_ros module implements a light-weight RefineNet neural network (NN) trained on the PASCAL body parts data set. The NN detects human body parts and can differentiate between heads, torsos, upper arms, lower arms, upper legs, and lower legs with a mean intersection-over-union (mIoU) score of 0.649 (Nek18).
The egohands_ros module implements a pyramid scene parsing network (PSPNet) retrained on the EgoHands data set. The trained model achieved an mIoU of 0.897 and a pixel accuracy of 0.986 on the validation set.
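For readers unfamiliar with the two metrics quoted for the segmentation modules, the following sketch shows one common way to compute mIoU and pixel accuracy from a predicted label map and a ground-truth label map. It is an illustration only: the function name and the tiny label maps are made up, and averaging the IoU over the classes that occur in either map is an assumption, since evaluation protocols differ in how they treat absent classes.

```python
def segmentation_metrics(pred, truth, num_classes):
    """Return (mIoU, pixel accuracy) for two label maps given as nested lists of class ids."""
    flat_pred = [c for row in pred for c in row]
    flat_truth = [c for row in truth for c in row]

    # Pixel accuracy: fraction of pixels whose predicted class matches the ground truth.
    correct = sum(1 for p, t in zip(flat_pred, flat_truth) if p == t)
    pixel_accuracy = correct / len(flat_truth)

    # Per-class IoU: |pred AND truth| / |pred OR truth|, skipping classes absent from both maps.
    ious = []
    for cls in range(num_classes):
        intersection = sum(1 for p, t in zip(flat_pred, flat_truth) if p == cls and t == cls)
        union = sum(1 for p, t in zip(flat_pred, flat_truth) if p == cls or t == cls)
        if union > 0:
            ious.append(intersection / union)
    mean_iou = sum(ious) / len(ious)
    return mean_iou, pixel_accuracy


# Made-up 2x4 example with classes 0 (background) and 1 (hand).
truth = [[0, 0, 1, 1],
         [0, 0, 1, 1]]
pred = [[0, 0, 0, 1],
        [0, 0, 1, 1]]
miou, accuracy = segmentation_metrics(pred, truth, num_classes=2)
print(round(miou, 3), round(accuracy, 3))
```

In this toy example one hand pixel is labelled as background, so the pixel accuracy is 7/8, while the class IoUs are 4/5 and 3/4. The mIoU is more sensitive to errors on small classes such as hands than pixel accuracy is.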
The darknet_ros module implements a YOLO v3 object detector trained on the COCO dataset. Since our goal is to enable handovers for any class of objects, we tolerate misclassifications of objects that do not belong to one of the 80 categories of the dataset.
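The overview states that only detected objects within the robot's reach are considered. One simple way to approximate such a check is to compare the median depth inside each bounding box with a distance limit, ignoring the class label entirely. The sketch below illustrates that idea under stated assumptions: the dictionary layout of a detection, the function name, the 1.0 m limit, and the use of camera distance as the reach criterion are all hypothetical and not taken from the project's code.

```python
from statistics import median


def detections_within_reach(detections, depth_m, max_distance_m):
    """Keep detections whose median valid depth (metres) is at most max_distance_m.

    detections: list of dicts with inclusive pixel bounds xmin, ymin, xmax, ymax.
    depth_m:    depth image as nested lists, where 0.0 marks an invalid reading.
    """
    reachable = []
    for det in detections:
        values = [
            depth_m[v][u]
            for v in range(det["ymin"], det["ymax"] + 1)
            for u in range(det["xmin"], det["xmax"] + 1)
            if depth_m[v][u] > 0.0
        ]
        # Boxes without any valid depth reading are dropped.
        if values and median(values) <= max_distance_m:
            reachable.append(det)
    return reachable


# Made-up 4x4 depth image: a near object on the left, a far one on the right.
depth = [[0.6, 0.6, 2.0, 2.0] for _ in range(4)]
boxes = [
    {"label": "cup", "xmin": 0, "ymin": 0, "xmax": 1, "ymax": 3},
    {"label": "chair", "xmin": 2, "ymin": 0, "xmax": 3, "ymax": 3},
]
print([d["label"] for d in detections_within_reach(boxes, depth, max_distance_m=1.0)])
```

The label is carried along but never used for filtering, which matches the stated goal of tolerating misclassified objects.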
The ggcnn_humanseg_ros module implements a GGCNN. The node outputs the best picking location based on the depth image of the object and the outputs of the three modules bodyparts_ros, egohands_ros and darknet_ros. Extensive pre- and post-processing prevents the robot from picking human body parts.
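The core idea of combining the per-pixel grasp maps with the segmentation and detection results can be sketched as follows: search for the highest-quality pixel inside the object's bounding box while skipping every pixel marked as human. This is a deliberately simplified illustration, not the module's actual processing, which the description above calls extensive. The function name, the data layout, and the example values are hypothetical.

```python
def select_grasp(quality, angle, width, human_mask, object_box):
    """Pick the best grasp pixel inside object_box that is not on the human.

    quality, angle, width: per-pixel maps as nested lists of equal size.
    human_mask:            nested lists of booleans, True where body or hand was segmented.
    object_box:            inclusive pixel bounds (xmin, ymin, xmax, ymax).
    Returns a dict describing the grasp, or None if every candidate pixel is masked.
    """
    xmin, ymin, xmax, ymax = object_box
    best = None
    for v in range(ymin, ymax + 1):
        for u in range(xmin, xmax + 1):
            if human_mask[v][u]:
                continue  # never grasp at a pixel that belongs to the person
            if best is None or quality[v][u] > quality[best[0]][best[1]]:
                best = (v, u)
    if best is None:
        return None
    v, u = best
    return {"pixel": (u, v), "quality": quality[v][u], "angle": angle[v][u], "width": width[v][u]}


# Made-up 3x4 maps: the highest quality value (0.95) lies on a hand pixel.
quality = [[0.1, 0.2, 0.3, 0.1],
           [0.2, 0.95, 0.7, 0.2],
           [0.1, 0.4, 0.6, 0.3]]
angle = [[0.0, 0.0, 0.0, 0.0],
         [0.0, 0.1, 0.5, 0.0],
         [0.0, 0.0, 0.2, 0.0]]
width = [[30, 30, 30, 30],
         [30, 35, 40, 30],
         [30, 30, 45, 30]]
human = [[False, False, False, False],
         [False, True, False, False],
         [False, True, False, False]]
print(select_grasp(quality, angle, width, human, object_box=(1, 0, 3, 2)))
```

In this example the pixel with quality 0.95 is rejected because it lies on the hand, so the grasp at pixel (2, 1) with quality 0.7 is returned instead. Returning None when no unmasked candidate exists lets the caller wait for the next frame rather than move towards the person.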
This module provides a driver for object-independent human-to-robot handovers using robotic vision. The approach requires only one RGB-D camera and can therefore be used in a variety of use cases without artificial setups such as markers or external cameras.
Special thanks go to Vienna University of Technology and the Australian Centre for Robotic Vision (ACRV) for enabling this research project.
The project is licensed under the BSD 4-Clause License.
Please keep in mind that no system is 100% fault tolerant and that this demonstrator is focused on pushing the boundaries of innovation. Careless interaction with robots can lead to serious injuries; always use appropriate caution!
This software is provided by the copyright holders and contributors "as is" and any express or implied warranties, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose are disclaimed. In no event shall the copyright holder or contributors be liable for any direct, indirect, incidental, special, exemplary, or consequential damages (including, but not limited to, procurement of substitute goods or services; loss of use, data, or profits; or business interruption) however caused and on any theory of liability, whether in contract, strict liability, or tort (including negligence or otherwise) arising in any way out of the use of this software, even if advised of the possibility of such damage.