Zahra Babaiee

Supervisor: Radu Grosu

Towards Bio-inspired, Small and Robust Deep Learning Vision Systems

 

Cyber-Physical Systems (CPS) are systems of collaborating embedded systems in intensive interaction with the ever-changing surrounding physical world. They are composed of a computational core and a physical core: the computational core receives information about the environment through sensors in the physical part and, via a controller, tells the actuators of the physical component what to do. Many prominent applications of embedded systems rely on computer vision, ranging from industrial machine-vision systems and autonomous vehicles to image processing in medicine and disease diagnosis.

In recent years, deep learning methods have achieved state-of-the-art results in many fields, with computer vision as one of the most prominent. Deep learning methods outperform other machine learning methods on various vision problems, including image classification, object recognition, and image generation. Thanks to the huge boost in deep learning methods and the appearance of large, labelled, high-quality image datasets like ImageNet [12], computer vision systems are now part of our everyday life, from face recognition on smartphones to autonomous driving. While there is great interest in deploying deep vision systems in our homes, factories, and workplaces to help us solve complex tasks, we should still act carefully. These systems need to be efficient, interpretable, and above all safe, in the sense that they work reliably and consistently in uncertain, complex environments.

Many deep learning computer vision models use Convolutional Neural Networks (CNNs) [49], which are heavily inspired by the visual cortex. They consist of hierarchical layers that extract localized features. In each convolution layer, a small kernel shifts over the input image, convolving each patch with a set of filters.
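The sliding-kernel operation just described can be sketched in a few lines of NumPy. This is a minimal illustration only: the function name, the toy image, and the edge-style kernel below are illustrative choices, not part of any particular CNN library.

```python
import numpy as np

def conv2d(image, kernel):
    """Slide a small kernel over the image, computing one dot product
    per output position (no padding, stride 1). A didactic sketch,
    not an optimized convolution implementation."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i:i + kh, j:j + kw]  # local "receptive field"
            out[i, j] = np.sum(patch * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
edge_kernel = np.array([[1.0, -1.0],
                        [1.0, -1.0]])  # crude vertical-edge detector
print(conv2d(image, edge_kernel).shape)  # prints (4, 4)
```

Each output value summarizes one patch, which is what lets stacked convolution layers build up hierarchical, localized features.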
These kernels function like the receptive fields in the retina: they change the activity of the neurons connected to that patch in the next layer. Fukushima’s Neocognitron [17] was one of the first hierarchical neural networks and inspired many other variants [18, 50, 9].

Large CNNs achieve considerable accuracy, but with a significant computing, memory, and energy footprint. These models are dense and over-parameterized. For instance, ResNet-152 [27] has 60 million parameters in its convolution layers alone and requires 11.3 GFLOPs for a single forward pass; training such a network on the full ImageNet dataset takes about 1.5 weeks on 4 GPUs. This over-parameterization limits the use of these systems in resource-limited environments such as mobile or embedded devices, due to the memory and computation costs. It is therefore important to come up with smaller models that perform without significantly losing the accuracy of their bigger counterparts. This can be achieved either by designing smaller network architectures, or by training a huge over-parameterized network and then sparsifying it, pruning either the synapses or the neurons.

In addition to their heavy costs, huge networks suffer from interpretability issues. As networks grow in size, so do concerns about their black-box nature, compromising the trust needed to use them in critical areas. Interpretability is especially important in safety-critical applications such as medicine, where opaque models can lead to algorithmic discrimination and ethical problems [69]. The great success of deep vision systems and their use in safety-critical embedded systems make the security and robustness of these networks increasingly important.
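The text above does not commit to a specific pruning criterion, so the sketch below shows one common illustrative instance of synapse (unstructured) pruning: magnitude-based pruning, which zeroes out the weights with the smallest absolute values. The function name and example matrix are assumptions for the sketch.

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out roughly the fraction `sparsity` of weights with the
    smallest absolute value. A sketch of one pruning criterion only,
    not a full prune-and-retrain pipeline (ties at the threshold may
    prune slightly more than requested)."""
    flat = np.abs(weights).flatten()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]  # k-th smallest magnitude
    mask = np.abs(weights) > threshold            # keep only larger weights
    return weights * mask

w = np.array([[0.05, -0.8, 0.3],
              [0.01, 0.6, -0.02]])
pruned = magnitude_prune(w, 0.5)  # half the synapses removed
```

In practice such pruning is interleaved with retraining so the remaining weights can compensate, which is what allows high sparsity with little accuracy loss.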

Research has shown that CNNs are often brittle to different noise levels and image perturbations [13, 20, 31], and can easily be fooled by altered images that are completely unrecognisable to humans, yet are classified as particular objects with up to 99.99% confidence [58]. The robustness of these networks is endangered even further when they are sparsified [53]. In my thesis, I intend to focus on creating biologically inspired deep vision models with two main characteristics: a) a small memory footprint, achieved both by design and by pruning redundant synapses, and b) robustness to distribution shifts and adversarial attacks.