Object-Independent Human-to-Robot Handovers using Real Time Robotic Vision

Authors: Patrick Rosenberger (TU Wien), Akansel Cosgun (Monash University), Rhys Newbury, Jun Kwan, Valerio Ortenzi (University of Birmingham), Peter Corke (QUT), and Manfred Grafinger (TU Wien)

Publication: coming soon

Preprint: coming soon

Code: https://github.com/patrosAT/h2r_handovers

Programming Language: The code developed within this project is written in Python 2.7 or 3.6, depending on the module. Please refer to the individual repositories for more information.

Hardware: This project has been implemented using a Franka Emika Panda arm and a RealSense D435 camera.


Video

This YouTube video shows the handover of 20 household objects from a frontal and a lateral perspective.

Approach

This project introduces an approach for safe and object-independent human-to-robot handovers using real-time robotic vision and manipulation. We aim for general applicability by combining a generic object detector (darknet_ros), a real-time grasp selection algorithm (ggcnn_humanseg_ros), and two semantic segmentation modules for body segmentation (bodyparts_ros) and hand segmentation (egohands_ros).

The approach uses an RGB-D camera mounted at the robot's end effector, which provides a steady stream of RGB and depth images. For each frame, the object detector detects all objects within the camera's field of view and selects those within the robot's reach. In addition, all pixels belonging to the human interaction partner and the partner's hands are segmented. The grasp selection module uses these inputs to calculate a grasp quality estimate, along with the associated grasp orientation and gripper width, for each pixel in the depth image. Finally, the grasp point with the highest estimated success likelihood is chosen and transformed into the robot's base frame. The robot driver module moves the end effector towards the selected grasp point via visual servoing. The segmentation masks are updated in real time to dynamically handle changes in the hand and body positions.
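
As a rough illustration of this selection step, the following Python sketch picks the pixel with the highest estimated grasp quality, back-projects it with a pinhole camera model, and expresses it in the robot's base frame. The array names, intrinsics, and the camera-to-base transform are placeholders for this example, not the project's actual interface.

```python
# Minimal sketch of the per-frame grasp selection described above.
# All names (quality/angle/width maps, intrinsics, T_base_cam) are
# illustrative placeholders, not the project's actual API.
import numpy as np

def select_grasp(quality, angle, width, depth, fx, fy, cx, cy, T_base_cam):
    """Pick the pixel with the highest grasp quality and express its
    3D position in the robot's base frame."""
    v, u = np.unravel_index(np.argmax(quality), quality.shape)  # best pixel (row, col)
    z = depth[v, u]                                             # depth in metres
    # Back-project the pixel into the camera frame (pinhole model).
    p_cam = np.array([(u - cx) * z / fx, (v - cy) * z / fy, z, 1.0])
    p_base = np.dot(T_base_cam, p_cam)                          # camera -> base frame
    return p_base[:3], angle[v, u], width[v, u]
```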

Module bodyparts_ros

This module implements a lightweight RefineNet neural network trained on the PASCAL body parts data set. The network detects human body parts and differentiates between heads, torsos, upper arms, lower arms, upper legs, and lower legs with a mean intersection-over-union (mIoU) score of 0.649 (Nek18).

Code: https://github.com/patrosAT/bodyparts_ros
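
A hypothetical consumer of this module's output might look like the following ROS node, which converts the published label image into a binary "human" mask. The topic name and the label convention (0 = background) are assumptions for this sketch, not the repository's actual interface.

```python
# Hypothetical subscriber to the body-part segmentation output; the topic
# name and label convention (0 = background) are assumptions, not the
# repository's actual interface.
import numpy as np
import rospy
from cv_bridge import CvBridge
from sensor_msgs.msg import Image

bridge = CvBridge()

def mask_callback(msg):
    labels = bridge.imgmsg_to_cv2(msg)       # per-pixel body-part labels
    human_mask = labels > 0                  # True wherever a body part was detected
    rospy.loginfo("human pixels: %d", int(np.count_nonzero(human_mask)))

if __name__ == "__main__":
    rospy.init_node("bodyparts_listener")
    rospy.Subscriber("/bodyparts/mask", Image, mask_callback)
    rospy.spin()
```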

H2R body front (PNG)

Module egohands_ros

This module implements a Pyramid Scene Parsing Network (PSPNet) retrained on the EgoHands data set. The trained model achieved an mIoU of 0.897 and a pixel accuracy of 0.986 on the validation set.

Code: https://github.com/patrosAT/egohands_ros
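
To give an idea of how the hand mask can be used together with the body mask, the sketch below merges both masks and dilates the result to add a safety margin around the human; the margin size is an arbitrary illustrative value, not a parameter of the actual node.

```python
# Sketch: combine body and hand masks and dilate them so that grasp
# candidates keep a safety margin from the human. The margin is an
# arbitrary illustrative value.
import cv2
import numpy as np

def forbidden_region(body_mask, hand_mask, margin_px=20):
    """Return a binary mask of pixels that must not be grasped."""
    combined = np.logical_or(body_mask, hand_mask).astype(np.uint8)
    kernel = np.ones((margin_px, margin_px), np.uint8)
    return cv2.dilate(combined, kernel) > 0   # grow the forbidden region
```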

H2R hand front (PNG)

Module darknet_ros

This module implements a YOLO v3 object detector trained on the COCO data set. Since our goal is to enable handovers for any class of objects, we allow misclassifications for objects that do not belong to one of the 80 categories of the data set.

Code: https://github.com/leggedrobotics/darknet_ros
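
Because the class label itself is not needed for a handover, the detections can be treated class-agnostically and filtered purely by reachability. The sketch below keeps any bounding box whose median depth lies within an assumed maximum reach; the box format and the reach threshold are illustrative assumptions, not the project's actual parameters.

```python
# Illustrative, class-agnostic filter: keep detections whose median depth
# lies within the robot's reach. Box format (xmin, ymin, xmax, ymax in
# pixels) and the reach threshold are assumptions for this sketch.
import numpy as np

def boxes_within_reach(boxes, depth, max_reach_m=0.9):
    """Return the detections that are close enough to be grasped."""
    reachable = []
    for (xmin, ymin, xmax, ymax) in boxes:
        patch = depth[ymin:ymax, xmin:xmax]
        valid = patch[patch > 0]              # ignore invalid depth readings
        if valid.size and np.median(valid) <= max_reach_m:
            reachable.append((xmin, ymin, xmax, ymax))
    return reachable
```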

H2R yolohand (PNG)

Module ggcnn_humanseg_ros

This module implements a GGCNN. The node outputs the best picking location based on the object's depth image and the inputs of the three modules bodyparts_ros, egohands_ros, and darknet_ros. Extensive pre- and post-processing prevents the picking of human body parts.

Code: https://github.com/patrosAT/ggcnn_humanseg_ros
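
One simple way to picture this post-processing is to suppress the grasp-quality map over all human pixels before choosing the best grasp, as in the sketch below. The array names are placeholders, and the actual node performs considerably more pre- and post-processing than shown here.

```python
# Sketch of the post-processing idea: zero the grasp-quality map over the
# human region before selecting the best pixel. Array names are
# placeholders; the actual node does much more pre- and post-processing.
import numpy as np

def safe_best_pixel(quality, forbidden_mask):
    """Return (row, col) of the best grasp pixel outside the human region,
    or None if no safe candidate remains."""
    safe_quality = np.where(forbidden_mask, 0.0, quality)
    if safe_quality.max() <= 0.0:
        return None
    return np.unravel_index(np.argmax(safe_quality), safe_quality.shape)
```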

H2R ggcnn RGB (PNG)

Module h2r_handovers

This module provides a driver for object-independent human-to-robot handovers using robotic vision. The approach requires only a single RGB-D camera and can therefore be used in a variety of use cases without the need for artificial setups such as markers or external cameras.

Code: https://github.com/patrosAT/h2r_handovers
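
As a rough illustration of the visual-servoing idea, the sketch below computes a Cartesian velocity proportional to the remaining error between the end effector and the selected grasp point, clamped to a safe maximum speed. The gain and speed limit are arbitrary example values, not the driver's actual parameters.

```python
# Rough position-based visual-servoing step: command a velocity towards
# the grasp point, clamped to a maximum speed. Gain and limit are
# arbitrary example values, not the driver's actual parameters.
import numpy as np

def servo_step(ee_position, grasp_position, gain=1.5, v_max=0.15):
    """Return a Cartesian velocity (m/s) towards the grasp point."""
    error = np.asarray(grasp_position) - np.asarray(ee_position)
    velocity = gain * error
    speed = np.linalg.norm(velocity)
    if speed > v_max:
        velocity *= v_max / speed             # clamp to the speed limit
    return velocity
```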

H2R grasp point (PNG)

Acknowledgments

Special thanks go to TU Wien and the Australian Centre for Robotic Vision (ACRV) for enabling this research project.

Project partner logos: TU Wien - Monash University - University of Birmingham - Australian Centre for Robotic Vision (ACRV) (PNG)

License

The project is licensed under the BSD 4-Clause License.

Disclaimer

Please keep in mind that no system is 100% fault-tolerant and that this demonstrator is focused on pushing the boundaries of innovation. Careless interaction with robots can lead to serious injuries; always use appropriate caution!

This software is provided by the copyright holders and contributors "as is" and any express or implied warranties, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose are disclaimed. In no event shall the copyright holder or contributors be liable for any direct, indirect, incidental, special, exemplary, or consequential damages (including, but not limited to, procurement of substitute goods or services; loss of use, data, or profits; or business interruption) however caused and on any theory of liability, whether in contract, strict liability, or tort (including negligence or otherwise) arising in any way out of the use of this software, even if advised of the possibility of such damage.