AUDI Konfuzius-Institut Ingolstadt

AKII Finalist in Merck “Future of AI” Challenge

Dr. Axenie, together with two THI students (team NeuroTHIx: Du Xiaorui, Yavuzhan Erdem, Cristian Axenie), qualified for the final of the Merck “Future of AI” Challenge.

IRENA (Invariant Representations Extraction in Neural Architectures) 

In this project we aim to build an unsupervised learning system, based on and inspired by biological intelligence, for the problem of learning invariant representations. Mammalian visual systems are characterized by their ability to recognize stimuli invariant to various transformations. With our proposed model, we investigate the hypothesis that this ability is achieved through the temporal encoding of visual stimuli and, potentially, of other sensory stimuli. Using a model of a multisensory, cortically inspired network, we show that this encoding is invariant to several transformations and robust with respect to stimulus variability. Furthermore, we show that the proposed model provides rapid encoding and computation, in accordance with recent physiological results.

https://app.ekipa.de/challenge/future-of-ai/brief

The team is now invited to Merck’s Research Center in Darmstadt for a two-day boot camp to refine the idea, and will then work on it until the end of August. To conclude the challenge, we will present our final solution in Darmstadt on 30 August.

Well done and good luck!

 

Two new papers accepted at the 28th International Conference on Artificial Neural Networks

The International Conference on Artificial Neural Networks (ICANN) is the annual flagship conference of the European Neural Network Society (ENNS). For the 2019 edition of ICANN, the AKII Microlab has had two papers accepted.

 

Neural Network 3D Body Pose Tracking and Prediction for Motion-to-Photon Latency Compensation in Distributed Virtual Reality – Sebastian Pohl, Armin Becher, Thomas Grauschopf, Cristian Axenie

NARPCA: Neural Accumulate-Retract PCA for Low-latency High-throughput Processing on Datastreams – Cristian Axenie, Radu Tudoran, Stefano Bortoli, Mohamad Al Hajj Hassan, Goetz Brasche

 

Well done, team!

Lecture: In the Realm of Characters

Lecture: In the Realm of Characters: Secret Messages – The Origin and Development of Chinese Characters

Chinese writing reflects both the linguistic and the cultural particularities of China, says our speaker, Esther Haubensack, sinologist and linguist. On the lecture evening she will trace, together with the audience (no prior knowledge of Chinese required!), the origin, structure and development of Chinese characters using a number of examples. Unveiling these secrets even helps in assigning individual characters to particular fields of meaning.

 

About the speaker:

Esther Haubensack, M.A., studied linguistics and sinology at the Ludwig-Maximilians-Universität München. During her study abroad in Beijing, her enthusiasm for languages took her onto the Chinese television stage, where she appeared in sketches and TV series. She currently commutes between China and Germany as a multilingual presenter.

AKII Microlab XMAS party

In a rather small group, enlarged by a spontaneous visit from a KU Eichstätt student delegation, the AKII Microlab Xmas party was a warm and truly “sweet” event. Thanks to Thomas, we enjoyed all the traditional German seasonal sweets and Glühwein. The afternoon brought interesting chats and laid the ground for new research challenges in 2019.

Merry Christmas and a Happy New Year!

AKII Microlab at ISMAR2018

Following the acceptance of our first paper on VIRTOOAIR, Armin was invited to present and demo AKII Microlab’s flagship project at IEEE ISMAR 2018. This is the leading international academic conference in the fields of Augmented Reality and Mixed Reality. The symposium is organized and supported by the IEEE Computer Society, IEEE VGTC, and ACM SIGCHI. Good job, Armin!

AKII Microlab in Lions Club Lectures

14.09.2018 – Dr. Axenie was invited to give a lecture at the Lions Club Übersee-Cyber. Along with Dennis Morgenstern, Head of Google Automotive, Dr. Axenie introduced the entrepreneurial audience to the value that AI and VR bring to the digital transformation.

 

Lions Club Übersee-Forum am Chiemsee-Cyber
21.09.2018

Minutes of the 1st club evening, 14.09.18, LC Übersee-Forum am Chiemsee-Cyber … lecture on digitalisation / artificial intelligence

Dear Lions friends! Dear guests!

In my new role as club secretary, I am pleased to send you the minutes of our first club evening on 14.09.2018. The evening was devoted entirely to the topics of digitalisation, artificial intelligence and virtual reality. The Lions friends and guests present were thrilled.

Minutes / agenda:
– Opening of the evening by our club president Dr. Pieper, accompanied on this evening by event president L Reiner Jaspers.

– Welcome to the Lions friends present, the Cyber Lions, the numerous guests of the Wirtschaftsförderungs GmbH Landkreis Traunstein and in particular MdL Klaus Steiner as a member of the Bavarian state parliament, Dr. Seeholzer (CEO of the Wirtschaftsförderungs GmbH Landkreis Traunstein), Prof. Volker Hagn (president of LC Salzburg) and Prof. Sigrid Hagn (international pianist), who unfortunately could not attend due to injury, Dennis Morgenstern (Head of Google Automotive), Dr. Cristian Axenie (Head of Artificial Intelligence at the Audi Konfuzius Institute Ingolstadt) and Doris Wagnerberger (Frauen für Frauen)

– Congratulations on their birthdays to Dr. Ralph Felbinger, Martina Tschackert, Simone Pieper-Cuber and Bernd Becking! Happy birthday to you

– Martina Tschackert and Uli Tschackert, in smart T-shirts, were on hand to explain Lions International and our club LC Übersee to the visitors.

– Brief explanation of our club’s philosophy and the rules for this club evening:

1. Proceeds of 15,000 euros from our event “Kick it like Beckham” in Salzburg, with this profit donated, including a cheque handover, to the women’s shelters in Rosenheim and Traunstein, Frauen für Frauen e.V. and a school in Hoekwil, South Africa (Garden Route)
2. Announcement of a new club activity idea for April 2019: a run along Lake Chiemsee from Übersee to Chieming, organised together with the Grund- & Mittelschule Chieming
3. Brief summary of the Lions event “We travel to Berlin” from 04.10 to 07.10.2018, with thanks to former transport minister Dr. Ramsauer for the tour of the Reichstag and Bundestag. A great programme!
4. Announcement of the district assembly on 13.10.2018 in Erding, with participation of the board
5. Greeting by Dr. Birgit Seeholzer with thanks to district administrator Siegfried Walch and a presentation of the Wirtschaftsförderungs GmbH of the Landkreis Traunstein (there are 17,000 companies in the Landkreis Traunstein!). Compliments to the district for this institution!

– Handover to event president Reiner Jaspers (with great thanks to him, who was on his way from Hannover via Munich to Ising and from there on to Geneva!)

– Lecture by Dennis Morgenstern, “Digitalisierung The New Normal” (presentation attached)

– Lecture by Dr. Cristian Axenie, “Künstliche Intelligenz … Artificial Intelligence and Virtual Reality” (presentation attached)

– We were captivated by the lectures! Many questions and answers followed. The applause of the entire audience, as well as the presentation of the Lions wine to both speakers, expressed our great thanks for their content and rhetorical performance!

– Closing summary of this very special club evening by event president Reiner Jaspers and then by club president Dr. Pieper.

– Last but not least: the evening did not want to end. We stayed together for a long time over dinner and many stimulating conversations.

With warm regards

Julia Felbinger

Attachments:
18.09.14_Vortrag_Dr. Cristian Axenie_AIpoweredVRforSociety (6,66MB)
18.09.14_Vortrag Dennis Morgenstern_Google_The_New_Normal (7,36MB)

Distribution list:
Club members (18/19) of the club Übersee-Forum am Chiemsee-Cyber (BS-IV-2)

 

 

Software

Current Research Project Codebases

CHIMERA: Combining Mechanistic Models and Machine Learning for Personalized Chemotherapy and Surgery Sequencing in Breast Cancer


TUCANN: TUmor Characterization using Artificial Neural Networks


GLUECK: Growth pattern Learning for Unsupervised Extraction of Cancer Kinetics

 


PRINCESS: Prediction of Individual Breast Cancer Evolution to Surgical Size


Previous Projects

Research Projects

 

VR in Automotive

 

A series of collaborations on VR with VW and Audi, investigating high-end renderings, construction validation, virtual installations and ergonomics. We focused on analyzing latencies in distributed VR.

 


VR in Art and Historical Projects


VR for Rehabilitation


Adaptive Neuromorphic Sensorimotor Control

 

Efficient sensorimotor processing is inherently driven by physical real-world constraints that an acting agent faces in its environment. Sensory streams contain certain statistical dependencies determined by the structure of the world, which impose constraints on a system’s sensorimotor affordances.

This limits the number of possible sensory information patterns and plausible motor actions. Learning mechanisms allow the system to extract the underlying correlations in sensorimotor streams.

This research direction focused on the exploration of sensorimotor learning paradigms for embedding adaptive behaviors in robotic systems and on demonstrating flexible control systems using neuromorphic hardware and neural-based adaptive control. I employed large-scale neural networks for gathering and processing complex sensory information, learning sensorimotor contingencies, and providing adaptive responses.

To investigate the properties of such systems, I developed flexible embodied robot platforms and integrated them within a rich tool suite for specifying neural algorithms that can be implemented on embedded neuromorphic hardware.

 

The mobile manipulator I developed at NST for adaptive sensorimotor systems consists of an omni-directional (holonomic) mobile manipulation platform with embedded low-level motor control and multimodal sensors.

The on-board micro-controller receives desired commands via WiFi and continuously adapts the platform’s velocity controller. The robot’s integrated sensors include wheel encoders for estimating odometry, a 9DoF inertial measurement unit, a proximity bump-sensor ring and three event-based embedded dynamic vision sensors (eDVS) for visual input.
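For illustration, the sketch below shows the kind of kinematic mapping such a velocity controller has to perform, turning a desired body velocity into individual wheel speeds. The three-wheel omni-drive layout, wheel radius and base radius are assumptions made only for the sketch, not specifications of the actual platform.

# Minimal sketch of holonomic (omnidirectional) velocity control.
# Assumptions not taken from the project: three omni-wheels mounted at
# 120-degree intervals; wheel radius and base radius are placeholders.
import math

WHEEL_ANGLES = [0.0, 2.0 * math.pi / 3.0, 4.0 * math.pi / 3.0]  # wheel headings [rad]
WHEEL_RADIUS = 0.03   # [m], hypothetical
BASE_RADIUS = 0.15    # [m], hypothetical

def body_to_wheel_speeds(vx, vy, omega):
    """Map a desired body twist (vx, vy in m/s, omega in rad/s) to wheel
    angular velocities [rad/s] for a three-wheel holonomic platform."""
    speeds = []
    for a in WHEEL_ANGLES:
        # Tangential velocity each wheel must produce at its contact point.
        v_wheel = -math.sin(a) * vx + math.cos(a) * vy + BASE_RADIUS * omega
        speeds.append(v_wheel / WHEEL_RADIUS)
    return speeds

# Example: drive diagonally while rotating slowly.
print(body_to_wheel_speeds(0.2, 0.1, 0.3))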

 

The mobile platform carries an optional 6-axis robotic arm with a reach of >40 cm. This robotic arm is composed of a set of links connected by revolute joints and allows lifting objects of up to 800 grams. The mobile platform contains an on-board battery of 360 Wh, which allows autonomous operation for well above 5 h.

 


Synthesis of Distributed Cognitive Systems – Learning and Development of Multisensory Integration

 

My research interest is in developing sensor fusion mechanisms for robotic applications. In order to extend the framework of interacting areas, a second direction of my research focuses on learning and development mechanisms.

Human perception improves through exposure to the environment. A wealth of sensory streams providing rich experience continuously refines the internal representations of the environment and of the agent’s own state. Furthermore, these representations enable more precise motor planning.

An essential component in motor planning and navigation, in both real and artificial systems, is egomotion estimation. Given the multimodal nature of the sensory cues, learning crossmodal correlations improves the precision and flexibility of motion estimates.

During development, the biological nervous system must constantly combine various sources of information and moreover track and anticipate changes in one or more of the cues. Furthermore, the adaptive development of the functional organisation of the cortical areas seems to depend strongly on the available sensory inputs, which gradually sharpen their response, given the constraints imposed by the cross-sensory relations.

Learning processes which take place during the development of a biological nervous system enable it to extract mappings between external stimuli and its internal state. Precise ego-motion estimation is essential to keep these external and internal cues coherent given the rich multisensory environment. In this work we present a learning model which, given various sensory inputs, converges to a state providing a coherent representation of the sensory space and the cross-sensory relations.

Figure 1

The model is based on Self-Organizing Maps (SOM) and Hebbian learning (see Figure 1), using sparse population-coded representations of sensory data. The SOM represents the sensory data, while the Hebbian linkage extracts the co-activation pattern of the input modalities that elicit peaks of activity in the neural populations. The model was able to learn the intrinsic statistics of the sensory data without any prior knowledge (see Figure 2).

Figure 2
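As a rough illustration of this mechanism (not the original implementation), the sketch below couples two one-dimensional SOMs through a Hebbian linkage. The map sizes, learning rates and the toy quadratic cross-sensory relation are assumptions chosen only to make the example self-contained.

# Toy sketch: two 1-D SOMs, one per sensory modality, linked by Hebbian learning.
# Sizes, learning rates and the quadratic cross-sensory relation are illustrative only.
import numpy as np

rng = np.random.default_rng(0)
N = 50                                   # neurons per map
som_a = rng.uniform(-1, 1, N)            # preferred values, modality A
som_b = rng.uniform(-1, 1, N)            # preferred values, modality B
hebb = np.zeros((N, N))                  # cross-modal linkage weights

def activity(som, x, sigma=0.1):
    """Population code: Gaussian tuning curves around each neuron's preferred value."""
    a = np.exp(-(som - x) ** 2 / (2 * sigma ** 2))
    return a / a.sum()

for t in range(5000):
    xa = rng.uniform(-1, 1)
    xb = xa ** 2                         # hidden cross-sensory relation to be learned
    act_a, act_b = activity(som_a, xa), activity(som_b, xb)
    # SOM update: move preferred values of strongly active neurons towards the input.
    som_a += 0.05 * act_a * (xa - som_a)
    som_b += 0.05 * act_b * (xb - som_b)
    # Hebbian update: strengthen co-active pairs, with mild decay for stability.
    hebb += 0.1 * np.outer(act_a, act_b) - 0.001 * hebb

# After learning, a cue in modality A can be decoded into modality B via the linkage.
cue = activity(som_a, 0.5)
print(som_b[np.argmax(cue @ hebb)])      # expected to be close to 0.25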

The developed model, implemented for 3D egomotion estimation on a quadrotor, provides precise estimates for roll, pitch and yaw angles (setup depicted in Figure 3a, b).

Figure 3

Given the relatively complex and multimodal scenarios in which robotic systems operate, with noisy and partially observable environment features, the capability to extract precise and timely estimates of egomotion critically influences the set of possible actions.

Utilizing simple and computationally effective mechanisms, the proposed model is able to learn the intrinsic correlational structure of sensory data and provide more precise estimates of egomotion (see Figure 4a, b).

Figure 4

Moreover, by learning the sensory data statistics and distribution, the model is able to judiciously allocate resources for efficient representation and computation without any prior assumptions and simplifications. Alleviating the need for tedious design and parametrisation, it provides a flexible and robust approach to multisensory fusion, making it a promising candidate for robotic applications.

 


Synthesis of Distributed Cognitive Systems – Interacting Cortical Maps for Environmental Interpretation

 

The core focus of my research interest is in developing sensor fusion mechanisms for robotic applications. These mechanisms enable a robot to obtain a consistent and global percept of its environment using available sensors by learning correlations between them in a distributed processing scheme inspired by cortical mechanisms.

Environmental interaction is a significant aspect in the life of every physical entity, as it allows the entity to update its internal state and acquire new behaviors. Such interaction is performed through repeated iterations of a perception-cognition-action cycle, in which the entity acquires and memorizes relevant information from the noisy and partially observable environment in order to develop a set of applicable behaviors (see Figure 5).

Figure 5

This recently started research project is in the area of mobile robotics, and more specifically in explicit methods for acquiring and maintaining such environmental representations. State-of-the-art implementations build upon probabilistic reasoning algorithms, which typically aim at optimal solutions at the cost of high processing requirements.

In this project, we have developed an alternative, neurobiologically inspired method for real-time interpretation of sensory stimuli in mobile robotic systems: a distributed networked system with inter-merged information storage and processing that allows efficient parallel reasoning. The networked architecture is composed of interconnected heterogeneous software units, each encoding a different feature of the environment’s state in a local representation (see Figure 6).

Figure 6

Such extracted pieces of environmental knowledge interact by mutual influence to ensure overall system coherence. A sample instantiation of the developed system focuses on mobile robot heading estimation (see Figure 7). In order to obtain a robust and unambiguous description of the robot’s current orientation within its environment, inertial, proprioceptive and visual cues are fused. Given the available sensory data, the network relaxes to a globally consistent estimate of the robot’s heading angle and position.

Figure 7
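A minimal sketch of the relaxation idea is given below, assuming three heading cues (inertial, odometric, visual) and hand-picked confidence weights. The update rule is a generic circular-mean consensus step for illustration, not the project’s actual network dynamics.

# Illustrative relaxation of a shared heading estimate (radians) towards agreement
# with several cues. The cues, coupling weights and iteration count are assumptions.
import math

def relax_heading(cues, weights, iters=100, step=0.2):
    """Iteratively pull a shared heading estimate towards weighted agreement
    with each cue, handling angle wrap-around via the sine of the error."""
    est = cues[0]
    for _ in range(iters):
        sin_err = sum(w * math.sin(c - est) for c, w in zip(cues, weights))
        est += step * sin_err / sum(weights)
    return est % (2 * math.pi)

# Inertial, proprioceptive (odometry) and visual heading cues, slightly disagreeing.
cues = [0.52, 0.48, 0.60]          # [rad]
weights = [1.0, 0.5, 0.8]          # relative confidence per cue
print(relax_heading(cues, weights))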

 


Adaptive Nonlinear Control Algorithm for Fault-Tolerant Robot Navigation

 

Today’s trends in control engineering and robotics are gradually converging on a challenging area: the development of fault-tolerant real-time applications. Such applications should deliver synchronized data sets on time, minimize the latency of their responses and meet their performance specifications in the presence of disturbances. Fault-tolerant behavior in mobile robots refers to the ability to autonomously detect and identify faults, as well as the capability to continue operating after a fault has occurred. This work introduces a real-time distributed control application with fault-tolerance capabilities for differential-wheeled mobile robots (see Figure 8).

Figure 8

Furthermore, the application was extended with a novel environment-mapping implementation for mobile robots with limited sensing. The developed algorithm is a SLAM implementation: it acquires real-time data from the sonar ring and feeds this information to the mapping module for offline mapping (see Figure 9).

Figure 9
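For illustration, the sketch below shows a generic occupancy-grid update from sonar range readings, the kind of step such a mapping module performs. The grid resolution, log-odds increments and the simple ray model are assumptions, not the implemented SLAM algorithm.

# Minimal occupancy-grid update from a ring of sonar range readings.
# Grid resolution, log-odds values and the simple ray model are assumptions.
import math
import numpy as np

GRID = np.zeros((200, 200))       # log-odds occupancy, 5 cm cells -> 10 x 10 m map
RES = 0.05                        # [m per cell]
L_OCC, L_FREE = 0.85, -0.4        # log-odds increments

def update_map(pose, bearings, ranges, max_range=3.0):
    """pose = (x, y, theta) in the map frame; bearings/ranges from the sonar ring."""
    x, y, th = pose
    for b, r in zip(bearings, ranges):
        if r >= max_range:
            continue
        # Mark cells along the beam as free, the cell at the sonar return as occupied.
        for d in np.arange(0.0, r, RES):
            cx = int((x + d * math.cos(th + b)) / RES)
            cy = int((y + d * math.sin(th + b)) / RES)
            if 0 <= cx < GRID.shape[0] and 0 <= cy < GRID.shape[1]:
                GRID[cx, cy] += L_FREE
        ex = int((x + r * math.cos(th + b)) / RES)
        ey = int((y + r * math.sin(th + b)) / RES)
        if 0 <= ex < GRID.shape[0] and 0 <= ey < GRID.shape[1]:
            GRID[ex, ey] += L_OCC

update_map((5.0, 5.0, 0.0), bearings=[-0.5, 0.0, 0.5], ranges=[1.2, 0.8, 2.5])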

The latter runs on top of the real-time fault-tolerant control application for mobile robot trajectory tracking (see Figures 10, 11).

Figure 10

Figure 11


Competitions / Hackathons

Merck Research “Future of AI Challenge” (August 2019)

IRENA (Invariant Representations Extraction in Neural Architectures)
Team NeuroTHIx codebase. 1st Place at Merck Future of AI Research Challenge.
https://app.ekipa.de/challenge/future-of-ai/about

THI coverage:

https://www.thi.de/suche/news/news/thi-erfolgreich-in-ai-forschungswettbewerb

The Merck Research Challenge aimed to generate insights from various disciplines that could lead to progress towards an understanding of invariant representations, using approaches that are novel and not based on deep learning.

IRENA (Invariant Representations Extraction in Neural Architectures) is the approach that team NeuroTHIx developed. IRENA offers a computational layer for extracting sensory relations from rich visual scenes, with learning, inference, de-noising and sensor fusion capabilities. Through its underlying unsupervised learning capabilities, the system is also able to embed semantics and perform scene understanding.

Using cortical maps as neural substrate for distributed representations of sensory streams, our system is able to learn its connectivity (i.e., structure) from the long-term evolution of sensory observations. This process mimics a typical development process where self-construction (connectivity learning), self-organization, and correlation extraction ensure a refined and stable representation and processing substrate. Following these principles, we propose a model based on Self-Organizing Maps (SOM) and Hebbian Learning (HL) as main ingredients for extracting underlying correlations in sensory data, the basis for subsequently extracting invariant representations.

 


 


University of Cambridge Hackathon – Hack Cambridge (January 2017)

 

Microsoft Faculty Connection coverage (http://goo.gl/uPWGna) for project demo at Hack Cambridge 2017, 28 – 29 January 2017, University of Cambridge with a Real-time Event-based Vision Monitoring and Notification System for Seniors and Elderly using Neural Networks.

It has been estimated that 33% of people aged 65 will fall; at around 80, that increases to 50%. In case of a fall, seniors who receive help within an hour have a better rate of survival, and the faster help arrives, the less likely an injury will lead to hospitalization or the need to move into a long-term care facility. In such cases, fast visual detection of abnormal motion patterns is crucial.

In this project we propose the use of a novel embedded Dynamic Vision Sensor (eDVS) for the task of classifying falls. Unlike standard cameras, which provide a time-sequenced stream of frames, the eDVS reports only relative changes in a scene, as individual events at the pixel level. Using this different encoding scheme, the eDVS brings advantages over standard cameras. First, there is no redundancy in the data received from the sensor; only changes are reported. Second, since only events are considered, the sensor offers a very high temporal resolution. Third, the power consumption of the overall system is small, as just a low-end microcontroller is needed to fetch events from the sensor, so the system can ultimately run for long periods on battery power. This project investigates how we can exploit the eDVS’s fast response time and low redundancy when making decisions about elderly motion.

The computation back-end is realized with a neural network classifier that detects falls and filters outliers. The data are provided by two stimuli (blinking LEDs at different frequencies) and represent the actual position of the person wearing them. Changes in the position of the stimuli encode the possible positions corresponding to falls or normal cases.

We will use Microsoft Azure ML Studio to implement an MLP binary classifier for the 4-dimensional input (2 stimuli × 2 Cartesian coordinates, (x, y), in the field of view). We labelled the data with Fall (F) and No Fall (NF).
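Since Azure ML Studio is a graphical tool, the sketch below shows an equivalent MLP binary classifier in scikit-learn on the same 4-dimensional input. The synthetic data and the labelling rule are placeholders, not the data recorded at the hackathon.

# Sketch of the 4-D Fall / No-Fall classifier. The original was built in Azure ML Studio;
# this equivalent uses scikit-learn with synthetic placeholder data.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 1000
# Features: (x1, y1, x2, y2) -- field-of-view positions of the two LED stimuli.
X = rng.uniform(0, 1, size=(n, 4))
# Placeholder labelling rule: both markers low in the image -> Fall (1), else No Fall (0).
y = ((X[:, 1] < 0.3) & (X[:, 3] < 0.3)).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(16, 8), max_iter=1000, random_state=0)
clf.fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))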

 


ANDRITZ Pioneers Hackathon (January 2017)

 

Best innovation idea at the ANDRITZ Pioneers Hackathon, innovating for the international technology group ANDRITZ. Developed an artificial neural learning agent for enhancing the productivity of automation processes.

 


Wellcome Trust Hack The Senses Hackathon (June 2016)

 

WIRED UK coverage ( http://goo.gl/5yQ1Fn ) at the Hack the Senses hackathon in London: How to hack your senses: from ‘seeing’ sound to ‘hair GPS’: “Two-man team HearSee built a headband that taps into synaesthesia, translating changes in frame-less video to sound allowing blind people or those with a weak vision to see motion. Roboticist and neurologist Cristian Axenie assembled the hardware in mere minutes – attaching a pair of cameras and wires to a terrycloth headband.”

 


Daimler FinTech Hackathon (April 2016)

 

Awarded 1st prize (team) at the Daimler Financial Services Big Data Analytics Hackathon for the design of a neuro-fuzzy learning system for anomaly detection and user interaction in big data streams.

 


Burda Hackdays (April 2016)

 

Awarded special Microsoft Cognitive Technologies prize at the Burda Hackdays for the design of a neural learning system for inferring role assignments in working teams using psychometric data analytics.


Automotive Hackdays (March 2016)

 

Awarded 1st prize (team) in the BMW Automotive Hackdays for the design of an inference system for driving profile learning and recommendation for skills improvement and predictive maintenance in car-sharing.

Neuromorphic Vision Processing for Autonomous Electric Driving

Scope

The current research project aims at exploring object detection algorithms using a novel neuromorphic vision sensor with deep learning neural networks for autonomous electric cars. More precisely, this work will be conducted with the Schanzer Racing Electric (SRE) team at the Technical University of Ingolstadt. SRE is a team of around 80 students who design, develop and manufacture an all-electric racing car every year to compete in Formula Student Electric. The use of neuromorphic vision sensors together with deep neural networks for object detection is the innovation that the project proposes. We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan Xp GPU used for this research.

Context 

Autonomous driving is a highly discussed topic, but in order to operate autonomously, the car needs to sense its environment. Vision provides the most informative and structured modality capable of grounding perception in autonomous vehicles. In the last decades, classical computer vision algorithms were used not only to locate relevant objects in the scene, but also to classify them. In recent years, however, major improvements were achieved when the first deep learning object detectors were developed.

In general, such object detectors use a convolutional feature extractor as their basis. Due to the multitude of feature extraction algorithms, there are numerous combinations of feature extractor and object detectors, which influences a system designer’s approach. One of the most interesting niches is the analysis of traffic scenarios. Such scenarios require fast computation of features and classification for decision making.

Our approach to object detection, recognition and decision making aims at “going away from frames”. Instead of using traditional RGB cameras we aim at utilizing dynamic vision sensors (DVS – https://inivation.com/dvs/). Dynamic vision sensors mimic basic characteristics of human visual processing (i.e. neuromorphic vision) and have created a new paradigm in vision research.

Similar to photoreceptors in the human retina, a single DVS pixel (receptor) can generate events in response to a change of detected illumination. Events encode dynamic features of the scene, e.g. moving objects, as a spatio-temporal set of events. Since DVS sensors drastically reduce redundant pixels (e.g. static background features) and encode objects in a frame-less fashion with high temporal resolution (about 1 μs), they are well suited for fast motion analysis and tracking. DVS are capable of operating in uncontrolled environments with varying lighting conditions because of their high dynamic range (120 dB).

As traffic situations demand fast detection and precise estimation, we plan to use such an event-based visual representation together with two convolutional networks that have proved suitable for the task. The two algorithms we plan to explore are the Single Shot Multibox Detector (SSMD), popular for its fast computation speed, and the Faster Region-Based Convolutional Neural Network (Faster RCNN), known as a slow but high-performing detector.
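As an illustration of how such event streams can be fed to convolutional detectors, the sketch below accumulates events into a two-channel (ON/OFF) frame over a short time window. The event tuple layout, sensor resolution and window length are assumptions rather than the representation finally chosen in the project.

# Sketch: accumulate DVS events into a 2-channel frame (ON / OFF polarities)
# over a short time window, so a convolutional detector can consume them.
# Event layout (t [us], x, y, polarity) and the 10 ms window are assumptions.
import numpy as np

def events_to_frame(events, width=128, height=128, window_us=10_000):
    """events: list of (t, x, y, p) with p in {0, 1}; returns (2, H, W) event counts."""
    frame = np.zeros((2, height, width), dtype=np.float32)
    if not events:
        return frame
    t_end = events[-1][0]
    for t, x, y, p in events:
        if t_end - t <= window_us:
            frame[p, y, x] += 1.0
    return frame

# Example: three synthetic events inside the window.
evts = [(100, 10, 20, 1), (5_000, 11, 20, 0), (9_000, 12, 21, 1)]
print(events_to_frame(evts).sum())   # -> 3.0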

Motivation

The project tries to lay fundamental exploratory groundwork, both in terms of sensory data for environment perception and in terms of neural network architectures for the considered task. The experiments also aim at evaluating the points where better accuracy can only be obtained by sacrificing computation time. The two architectures we chose sit at opposite ends of this trade-off. The first is the SSMD network with Inception V2 as feature extractor; this network has a low computation time with acceptable accuracy. Its counterpart is the Faster RCNN with ResNet-101 as feature extractor; its accuracy is among the highest, whereas its computation time is relatively slow.

Whereas features are common for frame-based computer vision problems, no solution exists yet to determine unique features in event streams. This is the first step towards more complex algorithms operating on the sparse event-stream. The possibility to create unique filter responses gives rise to the notion of temporal features. This opens the exploratory work we envision in this project, to investigate the use of SSMD and Faster RCNN networks using event-based input in a natively parallel processing pipeline.

Current state (November 2018)

Progress overview

While developing deep neural network structures (i.e. the SSMD and Faster RCNN) that operate on event-based features to address the stereo problem, the team first investigated how to represent the event-based visual information to be fed to the deep neural networks. The team developed a series of neural networks for detecting lane markings. Not only could the networks identify the type of object (object classification), but, more importantly, they provided real-time data about the localization of the object itself (object localization). By collecting and labeling a large training dataset of objects, the team implemented various neural networks providing the accuracy needed for safely maneuvering the car. The networks used pre-trained weights from already existing networks to speed up the training process significantly; the employed methods tapped into transfer learning before NVIDIA actually released its Transfer Learning Toolkit. The team employed pre-trained deep learning models such as ResNet-10, ResNet-18, ResNet-50, GoogLeNet, VGG-16 and VGG-19 as a basis for adapting to the custom dataset, and incrementally retrained these models via transfer learning for both object detection and image classification use cases. The Titan Xp GPU granted through the NVIDIA GPU Grant program allowed the team to train the deep networks.
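A minimal sketch of this transfer-learning recipe is shown below, using torchvision for illustration rather than the NVIDIA tooling the team used. The ImageNet-pretrained ResNet-18 backbone, the four-class head and the dataset path are placeholders.

# Minimal transfer-learning sketch: reuse ImageNet-pretrained ResNet-18 weights and
# retrain only a new classification head on a custom dataset.
# Illustrative only: dataset path and class count are placeholders.
import torch
import torch.nn as nn
from torchvision import models, datasets, transforms

NUM_CLASSES = 4                                            # hypothetical class count

model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
for p in model.parameters():
    p.requires_grad = False                                # freeze the pretrained backbone
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)    # new trainable head

tfm = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
data = datasets.ImageFolder("path/to/labelled_images", transform=tfm)   # placeholder path
loader = torch.utils.data.DataLoader(data, batch_size=32, shuffle=True)

optim = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
model.train()
for images, labels in loader:          # one pass shown; repeat for more epochs
    optim.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    optim.step()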

Development stages

In the first development phase, the team used NVIDIA’s DriveNet without modification. An input size of 1248 × 384 pixels was found suitable for our application. A screenshot with this image size used for training can be seen in the following.

The team was further able to optimize the network input size for the autonomous electric car application by adjusting the first layers of the DetectNet architecture and thus using the whole camera image width of 1900 pixels. While this step lowered the computational performance of inference at first, the team made great improvements in detection speed using a TensorRT-optimized version of the net. A comparison of different network architectures vs. their computational performance can be seen in the following. Measurements were taken on an NVIDIA Drive PX2 platform.

Using the predicted bounding boxes of this network, the NVIDIA DriveWorks API and a triangulation-based localization approach, the team was able to predict the 2D position of the road markings with ±5% accuracy. To further increase the system’s debugging possibilities and gain experience with different neural network models, the team also considered a darknet-based approach. More precisely, the team experimented with yoloV3 as well as tiny yoloV3, which allowed for easy integration into the existing ROS environment. Yolo allows straightforward adaptation of the network architecture. Furthermore, varying input image sizes can be used without the need to retrain the whole network, which makes it very flexible for ongoing development. The yoloV3 network architecture used is shown in the following diagram.

Yolo utilizes the power of CUDA and cuDNN to speed up training and inference, but is also highly portable to various systems using GPU or CPU resources. Again, the team is very thankful for the Titan Xp GPU, which allowed it to run the network training and inference testing. The team is currently experimenting with the number and design of convolution and pooling layers. As the objects to detect are rather simple, a small number of convolutional filters should be sufficient compared to the original yolonet architecture. While the original yoloV3 uses 173.588 Bflops per network execution, reducing the number of layers results in 148.475 and 21.883 Bflops per iteration, respectively. These networks are not fully trained yet, but promise to deliver satisfying accuracy with much less inference time.
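For reference, Bflops figures of this kind can be estimated layer by layer with a small helper like the one below; the example layer shape is illustrative and not taken from the yoloV3 configuration.

# Rough helper for the kind of Bflops bookkeeping quoted above: multiply-accumulate
# operations of a convolutional layer, expressed in billions of floating-point operations.
# The example layer shape is illustrative, not taken from the yoloV3 config.
def conv_bflops(out_h, out_w, in_ch, out_ch, kernel=3):
    # 2 ops (multiply + add) per MAC, one MAC per kernel tap per output element.
    return 2.0 * out_h * out_w * in_ch * out_ch * kernel * kernel / 1e9

# Example: a 3x3 convolution producing a 52x52x256 map from 128 input channels.
print(conv_bflops(52, 52, 128, 256))   # ~1.6 Bflops for this single layer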

Next steps

In the next steps, with classification and recognition tackled, the team will focus on designing a network architecture able to take into account the temporal aspects of the stereo events from the neuromorphic camera, in addition to physical constraints such as cross-disparity uniqueness and within-disparity smoothness. The network’s input includes the retinal location of a single event (pixel coordinates) and the time at which it was detected. Consequently, the non-linear processing with good inference timing offered by yoloV3 and DetectNet can further facilitate a mechanism to extract global spatio-temporal correlations from the input events. Knowing the disparity of a moving object in the field of view of the car’s vision sensors, the team will further analyse the accuracy and the detection rate of the proposed algorithm.

Preliminary results ( July 2018)

The initial step was carried out by training a single shot detector (MobileNet) for cone detection, a stage in the preparation for the Formula Student Electric competition. Experiments were carried out on an NVIDIA GTX 1080 Ti GPU using TensorRT. The performance evaluation is shown in the following diagrams.
