International Science Index
Improved Rare Species Identification Using Focal Loss Based Deep Learning Models
The use of deep learning for species identification in camera trap images has revolutionised our ability to study, conserve and monitor species in a highly efficient and unobtrusive manner, with state-of-the-art models achieving accuracies that surpass manual human classification. The high imbalance of camera trap datasets, however, results in poor accuracies for minority (rare or endangered) species because they contribute little to the overall model accuracy. This paper investigates the use of Focal Loss, in comparison to the traditional Cross Entropy Loss function, to improve the identification of minority species in the “255 Bird Species” dataset from Kaggle. The results show that, although Focal Loss slightly decreased the accuracy of the majority species, it increased the F1-score by 0.06 and improved the identification of the bottom two, five and ten (minority) species by 37.5%, 15.7% and 10.8%, respectively, while also improving the overall accuracy by 2.96%.
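As an illustration of why Focal Loss shifts attention toward poorly classified (often minority) samples, the following minimal sketch compares it with Cross Entropy for a single sample. The focusing parameter `gamma = 2.0`, the weight `alpha = 1.0` and the two example probabilities are assumptions for illustration; the abstract does not state the values used in the paper.

```python
import math


def cross_entropy(p_true: float) -> float:
    """Cross-entropy loss for one sample, given the predicted probability of its true class."""
    return -math.log(p_true)


def focal_loss(p_true: float, gamma: float = 2.0, alpha: float = 1.0) -> float:
    """Focal loss: cross-entropy scaled by (1 - p_true) ** gamma.

    gamma and alpha are illustrative defaults, not values taken from the paper.
    """
    return -alpha * (1.0 - p_true) ** gamma * math.log(p_true)


# Hypothetical predictions: a confidently classified sample and a poorly classified one.
easy, hard = 0.95, 0.10

# How much more the hard sample contributes than the easy one under each loss.
ce_ratio = cross_entropy(hard) / cross_entropy(easy)
fl_ratio = focal_loss(hard) / focal_loss(easy)

print(f"cross-entropy hard/easy ratio: {ce_ratio:.1f}")
print(f"focal loss hard/easy ratio:    {fl_ratio:.1f}")
```

With `gamma = 0` the focal term disappears and the two losses coincide; larger values of `gamma` down-weight well-classified samples more strongly.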
Deep Learning Based 6D Pose Estimation for Bin-Picking Using 3D Point Clouds
Estimating the 6D pose of objects is a core step for robot bin-picking tasks. The problem is that various objects are usually randomly stacked with heavy occlusion in real applications. In this work, we propose a method to regress 6D poses by predicting three points for each object in the 3D point cloud through deep learning. To solve the ambiguity of symmetric pose, we propose a labeling method to help the network converge better. Based on the predicted pose, an iterative method is employed for pose optimization. In real-world experiments, our method outperforms the classical approach in both precision and recall.
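One way three predicted points can determine a 6D pose is sketched below. It assumes, purely for illustration, that the three points are the object origin, a point on the object's x-axis, and a point in its xy-plane; the abstract does not say which points the network predicts, so the function `pose_from_three_points` and this convention are hypothetical.

```python
import math
from typing import List, Tuple

Vec = Tuple[float, float, float]


def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(v: Vec) -> Vec:
    n = math.sqrt(dot(v, v))
    if n == 0.0:
        raise ValueError("degenerate point configuration")
    return (v[0] / n, v[1] / n, v[2] / n)


def pose_from_three_points(p0: Vec, p1: Vec, p2: Vec) -> Tuple[List[List[float]], Vec]:
    """Return (rotation, translation) from three non-collinear points.

    Assumed convention: p0 is the origin, p1 lies on the x-axis,
    p2 lies in the xy-plane. The columns of the rotation are the object axes.
    """
    x_axis = normalize(sub(p1, p0))
    v = sub(p2, p0)
    proj = dot(v, x_axis)
    # Gram-Schmidt step: remove the x component to obtain the y direction.
    y_axis = normalize((v[0] - proj * x_axis[0],
                        v[1] - proj * x_axis[1],
                        v[2] - proj * x_axis[2]))
    z_axis = cross(x_axis, y_axis)
    rotation = [[x_axis[r], y_axis[r], z_axis[r]] for r in range(3)]
    return rotation, p0


# Hypothetical predicted points: an object rotated 90 degrees about the world z-axis.
rotation, translation = pose_from_three_points((1.0, 2.0, 3.0), (1.0, 3.0, 3.0), (0.0, 2.0, 3.0))
```

Three rotational and three translational degrees of freedom are recovered this way, which is why three non-collinear points suffice for an asymmetric object.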
A Survey of Response Generation of Dialogue Systems
An essential task in the field of artificial intelligence is
to allow computers to interact with people through natural language.
Therefore, research on virtual assistants and dialogue systems
has received widespread attention from industry and academia.
Response generation plays a crucial role in dialogue systems, so to
push forward the research on this topic, this paper surveys various
methods for response generation. We sort these methods into
three categories. The first includes finite state machine methods,
framework methods, and instance methods. The second contains
full-text indexing methods, ontology methods, vast knowledge base
methods, and some other methods. The third covers retrieval methods
and generative methods. We also discuss some hybrid methods based
on knowledge and deep learning. We compare their advantages and
disadvantages and point out in which ways these studies can be improved
further. Our discussion covers some studies published in leading
conferences such as IJCAI and AAAI in recent years.
A Survey of Sentiment Analysis Based on Deep Learning
Sentiment analysis is a very active research topic.
Every day, Facebook, Twitter, Weibo, and other social media,
as well as significant e-commerce websites, generate a massive
amount of comments, which can be used to analyse people's
opinions or emotions. The existing methods for sentiment analysis
are based mainly on sentiment dictionaries, machine learning, and
deep learning. The first two kinds of methods rely heavily on
sentiment dictionaries or large amounts of labelled data. The third
one overcomes these two problems. So, in this paper, we focus
on the third one. Specifically, we survey various sentiment analysis
methods based on convolutional neural network, recurrent neural
network, long short-term memory, deep neural network, deep belief
network, and memory network. We compare their features, advantages,
and disadvantages. Also, we point out the main problems of
these methods, which may be worthy of careful studies in the
future. Finally, we also examine the application of deep learning in
multimodal sentiment analysis and aspect-level sentiment analysis.
Embedded Semantic Segmentation Network Optimized for Matrix Multiplication Accelerator
Autonomous driving systems require high reliability to provide people with a safe and comfortable driving experience. However, despite the development of a number of vehicle sensors, it is difficult to always provide high perception performance in driving environments that vary with time and season. Image segmentation using deep learning, which has evolved rapidly in recent years, provides stable, high recognition performance in various road environments. However, since the system controls a vehicle in real time, a highly complex deep learning network cannot be used due to time and memory constraints. Moreover, efficient networks are optimized for GPU environments, so their performance degrades in embedded processor environments equipped with simple hardware accelerators. In this paper, a semantic segmentation network, the matrix multiplication accelerator network (MMANet), optimized for the matrix multiplication accelerator (MMA) on Texas Instruments digital signal processors (TI DSP), is proposed to improve the recognition performance of autonomous driving systems. The proposed method is designed to maximize the number of layers that can be executed in a limited time in order to provide reliable driving environment information in real time. First, the number of channels in the activation map is fixed to fit the structure of the MMA. The lack of information caused by fixing the number of channels is resolved by increasing the number of parallel branches. Second, an efficient convolution is selected depending on the size of the activation. Since the MMA has a fixed structure, normal convolution may be more efficient than depthwise separable convolution, depending on memory access overhead. Thus, the convolution type is decided according to the output stride to increase network depth. In addition, memory access time is minimized by processing operations only in the L3 cache. Lastly, reliable contexts are extracted using the extended atrous spatial pyramid pooling (ASPP).
The suggested method gets stable features from an extended path by increasing the kernel size and accessing consecutive data. In addition, it consists of two ASPPs to obtain high-quality contexts using the restored shape without global average pooling paths, since the layer uses the MMA as a simple adder. To verify the proposed method, an experiment is conducted using perfsim, a timing simulator, and the Cityscapes validation set. The proposed network can process an image with 640 x 480 resolution in 6.67 ms, so six cameras can be used to identify the surroundings of the vehicle at 20 frames per second (FPS). In addition, it achieves 73.1% mean intersection over union (mIoU), which is the highest recognition rate among embedded networks on the Cityscapes validation set.
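The six-camera claim can be checked with simple arithmetic. The sketch below assumes the six camera frames are processed one after another on a single accelerator with no additional overhead, which the abstract does not state explicitly.

```python
latency_ms = 6.67   # per-image inference time reported in the abstract
num_cameras = 6     # cameras covering the vehicle surroundings
target_fps = 20     # frame rate reported in the abstract

# Time available between two consecutive frames of the same camera.
frame_period_ms = 1000.0 / target_fps

# Time needed to process one frame from every camera in sequence (assumed schedule).
round_ms = latency_ms * num_cameras

fits_budget = round_ms <= frame_period_ms
max_fps_per_camera = 1000.0 / round_ms

print(f"one round over all cameras: {round_ms:.2f} ms (budget {frame_period_ms:.1f} ms)")
print(f"upper bound on per-camera frame rate: {max_fps_per_camera:.2f} FPS")
```

Under this assumption one round takes about 40 ms, which fits inside the 50 ms frame period implied by 20 FPS.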
On Dialogue Systems Based on Deep Learning
Nowadays, dialogue systems are increasingly becoming the
way for humans to access many computer systems, so that humans
can interact with computers in natural language. A dialogue
system consists of three parts: understanding what humans say in
natural language, managing dialogue, and generating responses in
natural language. In this paper, we survey deep learning based
methods for dialogue management, response generation and dialogue
evaluation. Specifically, these methods are based on neural network,
long short-term memory network, deep reinforcement learning,
pre-training and generative adversarial network. We compare these
methods and point out directions for further research.
A Survey of Field Programmable Gate Array-Based Convolutional Neural Network Accelerators
With the rapid development of deep learning, neural networks and deep learning algorithms play a significant role in various practical applications. Owing to their high accuracy and good performance, Convolutional Neural Networks (CNNs) in particular have become a research hot spot in the past few years. However, the networks are becoming increasingly large due to the demands of practical applications, which poses a significant challenge for constructing high-performance implementations of deep neural networks. Meanwhile, many of these application scenarios also have strict requirements on the performance and power consumption of hardware devices. Therefore, it is particularly critical to choose a suitable computing platform for hardware acceleration of CNNs. This article surveys recent advances in Field Programmable Gate Array (FPGA)-based acceleration of CNNs. Various designs and implementations of FPGA-based accelerators for different devices and network models are reviewed, and counterparts based on Graphic Processing Units (GPUs), Application Specific Integrated Circuits (ASICs) and Digital Signal Processors (DSPs) are compared to present our own critical analysis and comments. Finally, we discuss these acceleration and optimization methods on FPGA platforms from different perspectives to further explore the opportunities and challenges for future research, and we give a prospect for the future development of FPGA-based accelerators.
SNR Classification Using Multiple CNNs
Noise estimation is essential in today's wireless systems
for power control, adaptive modulation, interference suppression and
quality of service. Deep learning (DL) has already been applied in the
physical layer for modulation and signal classifications. However, the
traditional application of DL classification to SNR prediction is
undermined by unacceptably low accuracy of less than 50%. In this paper,
we use a divide-and-conquer algorithm and a classifier fusion method
to simplify SNR classification and thereby enhance DL learning
and prediction. Specifically, multiple CNNs are used for classification
rather than a single CNN. Each CNN performs a binary classification
of a single SNR with two labels: less than, or greater than or equal.
Together, multiple CNNs are combined to effectively classify over a
range of SNR values, −20 ≤ SNR ≤ 32 dB. We use pre-trained
CNNs to predict SNR over a wide range of joint channel parameters
including multiple Doppler shifts (0, 60, 120 Hz), power-delay
profiles, and signal-modulation types (QPSK, 16-QAM, 64-QAM). The
approach achieves individual SNR prediction accuracy of 92%,
composite accuracy of 70% and prediction convergence one order
of magnitude faster than that of traditional estimation.
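The divide-and-conquer idea can be sketched as a bank of binary threshold classifiers whose votes are fused into an SNR interval. Everything below is a hypothetical illustration: the 4 dB threshold spacing, the vote-counting fusion rule, and the stub classifiers (which see the true SNR directly rather than a received signal, as a CNN would) are assumptions, not details from the paper.

```python
from typing import Callable, List, Tuple


def make_stub_classifier(threshold_db: float) -> Callable[[float], bool]:
    """Stand-in for one binary CNN: answers 'is the SNR >= threshold_db?'."""
    return lambda snr_db: snr_db >= threshold_db


def fuse(votes: List[bool], thresholds: List[float]) -> Tuple[float, float]:
    """Fuse binary votes into an SNR interval [low, high).

    Counting the 'greater than or equal' votes (an assumed fusion rule)
    selects the interval between two neighbouring thresholds.
    """
    k = sum(votes)
    low = thresholds[k - 1] if k > 0 else float("-inf")
    high = thresholds[k] if k < len(thresholds) else float("inf")
    return low, high


# Thresholds spanning the -20 to 32 dB range from the abstract (4 dB step is assumed).
thresholds = [float(t) for t in range(-20, 33, 4)]
classifiers = [make_stub_classifier(t) for t in thresholds]

# Classify a hypothetical signal whose true SNR is 7.5 dB.
votes = [clf(7.5) for clf in classifiers]
interval = fuse(votes, thresholds)
print(interval)
```

Each classifier only has to solve an easy two-class problem, and the fusion step turns the individual decisions into a multi-class SNR estimate.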
A Deep-Learning Based Prediction of Pancreatic Adenocarcinoma with Electronic Health Records from the State of Maine
Predicting the risk of Pancreatic Adenocarcinoma (PA) in advance can benefit the quality of care and potentially reduce population mortality and morbidity. The aim of this study was to develop and prospectively validate a risk prediction model to identify patients at risk of new incident PA as early as 3 months before the onset of PA in a statewide, general population in Maine. The PA prediction model was developed using Deep Neural Networks, a deep learning algorithm, with a 2-year electronic-health-record (EHR) cohort. Prospective results showed that our model identified 54.35% of all inpatient episodes of PA, and 91.20% of all PA that required subsequent chemoradiotherapy, with a lead-time of up to 3 months and a true alert of 67.62%. The risk assessment tool has attained an improved discriminative ability. It can be immediately deployed to the health system to provide automatic early warnings to adults at risk of PA. It has potential to identify personalized risk factors to facilitate customized PA interventions.
Churn Prediction for Telecommunication Industry Using Artificial Neural Networks
Telecommunication service providers demand accurate
and precise prediction of customer churn probabilities to increase the
effectiveness of their customer relation services. The large amount of
customer data owned by the service providers is suitable for analysis
by machine learning methods. In this study, expenditure data of
customers are analyzed by using an artificial neural network (ANN).
The ANN model is applied to the data of customers with different
billing duration. The proposed model successfully predicts the churn
probabilities with 83% accuracy using only three months of expenditure
data, and the prediction accuracy increases up to 89% when nine months
of data are used. The experiments also show that the accuracy of the ANN
model increases on an extended feature set with information on the
changes in the bill amounts.
Low-Cost Mechatronic Design of an Omnidirectional Mobile Robot
This paper presents the results of a mechatronic design based on a 4-wheel omnidirectional mobile robot that can be used in indoor logistic applications. The low-level control is implemented on two open-source hardware platforms (Raspberry Pi 3 Model B+ and Arduino Mega 2560) that control four industrial motors, four ultrasound sensors, four optical encoders, a vision system of two cameras, and a Hokuyo URG-04LX-UG01 laser scanner. Moreover, the system is powered with a lithium battery that can supply 24 V DC and has a capacity of 20 Ah. The Robot Operating System (ROS) has been implemented on the Raspberry Pi, and its performance is evaluated with the selected sensors and hardware. The mechatronic system is evaluated and, based on different tests, safe modes of power distribution for controlling all the electronic devices are proposed. Based on the performance results, some recommendations are given for using the Raspberry Pi and Arduino in terms of power, communication, and distribution of control for different devices. According to these recommendations, the sensors are distributed between both real-time controllers (Arduino and Raspberry Pi). In addition, the drivers of the cameras have been implemented in Linux and a Python program has been implemented to access the cameras. These cameras will be used for implementing a deep learning algorithm to recognize people and objects. In this way, the level of intelligence can be increased in combination with the maps that can be obtained from the laser scanner.
Personal Information Classification Based on Deep Learning in Automatic Form Filling System
Recently, the rapid development of deep learning has made
artificial intelligence (AI) penetrate into many fields, replacing
manual work there. In particular, AI systems have also become a research
focus in the field of office automation. To meet real needs in office
automation, in this paper we develop an automatic form filling system.
Specifically, it uses two classical neural network models and several
word embedding models to classify various relevant information
elicited from the Internet. When training the neural network models,
we use less noisy and balanced data. We conduct a series
of experiments to test our system and the results show that our
system can achieve better classification results.
Automatic Product Identification Based on Deep-Learning Theory in an Assembly Line
Automated object recognition and identification systems
are widely used throughout the world, particularly in assembly lines,
where they perform quality control and automatic part selection tasks.
This article presents the design and implementation of an object
recognition system in an assembly line. The proposed shape-color
recognition system is based on deep learning theory in a specially
designed convolutional network architecture. The methodology
involves stages such as: image capturing, color filtering, location
of object mass centers, horizontal and vertical object boundaries,
and object clipping. Once the objects are cut out, they are sent to
a convolutional neural network, which automatically identifies the
type of figure. The identification system works in real-time. The
implementation was done on a Raspberry Pi 3 system and on a
Jetson-Nano device. The proposal is used in an assembly course
of a bachelor’s degree in industrial engineering. The results presented
include studying the efficiency of the recognition and processing time.
Deep Learning Based, End-to-End Metaphor Detection in Greek with Recurrent and Convolutional Neural Networks
This paper presents and benchmarks a number of
end-to-end Deep Learning based models for metaphor detection in
Greek. We combine Convolutional Neural Networks and Recurrent
Neural Networks with representation learning to bear on the metaphor
detection problem for the Greek language. The models presented
achieve exceptional accuracy scores, significantly improving the
previous state-of-the-art results, which had already achieved accuracy
0.82. Furthermore, no special preprocessing, feature engineering or
linguistic knowledge is used in this work. The methods presented
achieve accuracy of 0.92 and F-score 0.92 with Convolutional
Neural Networks (CNNs) and bidirectional Long Short Term Memory
networks (LSTMs). Comparable results of 0.91 accuracy and 0.91
F-score are also achieved with bidirectional Gated Recurrent Units
(GRUs) and Convolutional Recurrent Neural Nets (CRNNs). The
models are trained and evaluated only on the basis of training tuples,
the related sentences and their labels. The outcome is a state-of-the-art
collection of metaphor detection models, trained on limited labelled
resources, which can be extended to other languages and similar
Deep Learning Application for Object Image Recognition and Robot Automatic Grasping
Since vision systems are in strong demand for autonomous operation in industrial environments, image recognition has become an important research topic. Here, a deep learning algorithm is employed in a vision system to recognize industrial objects, and it is integrated with a 7A6 Series Manipulator for an automatic object gripping task. A PC and a Graphic Processing Unit (GPU) are chosen to construct the 3D Vision Recognition System. A depth camera (Intel RealSense SR300) is employed to capture the image for object recognition and coordinate derivation. The YOLOv2 scheme is adopted in the convolutional neural network (CNN) structure for object classification and center point prediction. Additionally, an image processing strategy is used to find the object contour for calculating the object orientation angle. Then, the specified object location and orientation information are sent to the robotic controller. Finally, a six-axis manipulator can grasp the specific object in a random environment based on the user command and the extracted image information. The experimental results show that YOLOv2 has been successfully employed to detect the object location and category with confidence near 0.9 and a 3D position error of less than 0.4 mm. It is useful for future intelligent robotic applications in Industry 4.0 environments.
NANCY: Combining Adversarial Networks with Cycle-Consistency for Robust Multi-Modal Image Registration
Multimodal image registration is a profoundly complex
task which is why deep learning has been used widely to address it in
recent years. However, two main challenges remain: Firstly, the lack
of ground truth data calls for an unsupervised learning approach,
which leads to the second challenge of defining a feasible loss
function that can compare two images of different modalities to judge
their level of alignment. To avoid this issue altogether we implement a
generative adversarial network consisting of two registration networks
GAB, GBA and two discrimination networks DA, DB connected by
spatial transformation layers. GAB learns to generate a deformation
field which registers an image of the modality B to an image of the
modality A. To do that, it uses the feedback of the discriminator DB
which is learning to judge the quality of alignment of the registered
image B. GBA and DA learn a mapping from modality A to modality
B. Additionally, a cycle-consistency loss is implemented. For this,
both registration networks are employed twice, therefore resulting in
images ˆA, ˆB which were registered to ˜B, ˜A which were registered
to the initial image pair A, B. Thus the resulting and initial images
of the same modality can be easily compared. A dataset of liver
CT and MRI was used to evaluate the quality of our approach and
to compare it against learning and non-learning based registration
algorithms. Our approach leads to Dice scores of up to 0.80 ± 0.01
and is therefore comparable to and slightly more successful than
algorithms like SimpleElastix and VoxelMorph.
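For readers unfamiliar with the metric, the Dice score quoted above measures the overlap between two segmentation masks. The sketch below computes it for two small flattened binary masks; the mask values are made up for illustration and are not data from the paper.

```python
from typing import Sequence


def dice_score(mask_a: Sequence[int], mask_b: Sequence[int]) -> float:
    """Dice coefficient 2|A and B| / (|A| + |B|) for two flattened binary masks."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    total = sum(1 for a in mask_a if a) + sum(1 for b in mask_b if b)
    # Two empty masks are treated as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Hypothetical liver masks: the fixed image and the registered (warped) image.
fixed = [0, 1, 1, 1, 0, 0]
warped = [0, 0, 1, 1, 1, 0]

score = dice_score(fixed, warped)
print(f"Dice = {score:.3f}")
```

A score of 1 means the registered and reference masks coincide exactly, and 0 means they do not overlap at all.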
Convergence Analysis of Training Two-Hidden-Layer Partially Over-Parameterized ReLU Networks via Gradient Descent
Over-parameterized neural networks have attracted a
great deal of attention in recent deep learning theory research,
as they challenge the classic perspective of over-fitting when
the model has excessive parameters and have gained empirical
success in various settings. While a number of theoretical works
have been presented to demystify properties of such models, the
convergence properties of such models are still far from being
thoroughly understood. In this work, we study the convergence
properties of training two-hidden-layer partially over-parameterized
fully connected networks with the Rectified Linear Unit activation via
gradient descent. To our knowledge, this is the first theoretical work
to understand convergence properties of deep over-parameterized
networks without the equally-wide-hidden-layer assumption and
other unrealistic assumptions. We provide a probabilistic lower bound
on the widths of hidden layers and prove a linear convergence rate of
gradient descent. We also conduct experiments on synthetic and
real-world datasets to validate our theory.
Research on Reservoir Lithology Prediction Based on Residual Neural Network and Squeeze-and-Excitation Neural Network
Conventional reservoir prediction methods are not sufficient to explore the implicit relation between seismic attributes, and thus data utilization is low. In order to improve the predictive classification accuracy of reservoir lithology, this paper proposes a deep learning lithology prediction method based on ResNet (Residual Neural Network) and SENet (Squeeze-and-Excitation Neural Network). The neural network model is built and trained by using seismic attribute data and lithology data of Shengli oilfield, and the nonlinear mapping relationship between seismic attribute and lithology marker is established. The experimental results show that this method can significantly improve the classification effect of reservoir lithology, and the classification accuracy is close to 70%. This study can effectively predict the lithology of undrilled areas and provide support for exploration and development.
Modeling Engagement with Multimodal Multisensor Data: The Continuous Performance Test as an Objective Tool to Track Flow
Engagement is one of the most important factors in determining successful outcomes and deep learning in students. Existing approaches to detect student engagement involve periodic human observations that are subject to inter-rater reliability. Our solution uses real-time multimodal multisensor data labeled by objective performance outcomes to infer the engagement of students. The study involves four students with a combined diagnosis of cerebral palsy and a learning disability who took part in a 3-month trial over 59 sessions. Multimodal multisensor data were collected while they participated in a continuous performance test. Eye gaze, electroencephalogram, body pose, and interaction data were used to create a model of student engagement through objective labeling from the continuous performance test outcomes. In order to achieve this, a type of continuous performance test is introduced, the Seek-X type. Nine features were extracted, including high-level handpicked compound features. Using leave-one-out cross-validation, a series of different machine learning approaches were evaluated. Overall, the random forest classification approach achieved the best classification results. Using random forest, 93.3% classification accuracy for engagement and 42.9% accuracy for disengagement were achieved. We compared these results to outcomes from different models: AdaBoost, decision tree, k-Nearest Neighbor, naïve Bayes, neural network, and support vector machine. We showed that using a multisensor approach achieved higher accuracy than using features from any reduced set of sensors. We found that using high-level handpicked features can improve the classification accuracy in every sensor mode. Our approach is robust to both sensor fallout and occlusions. The single most important sensor feature to the classification of engagement and distraction was shown to be eye gaze.
It has been shown that we can accurately predict the level of engagement of students with learning disabilities in a real-time approach that is not subject to inter-rater reliability, human observation or reliant on a single mode of sensor input. This will help teachers design interventions for a heterogeneous group of students, where teachers cannot possibly attend to each of their individual needs. Our approach can be used to identify those with the greatest learning challenges so that all students are supported to reach their full potential.
Affective computing in education, affect detection, continuous performance test, learning disabilities, machine learning, physiological sensors, Signal Detection Theory, student engagement.
Automatic Number Plate Recognition System Based on Deep Learning
In the last few years, Automatic Number Plate Recognition (ANPR) systems have become widely used in safety, security, and commercial applications. Accordingly, several methods and techniques have been developed to achieve better accuracy and real-time execution. This paper proposes a computer vision algorithm for Number Plate Localization (NPL) and Character Segmentation (CS). In addition, it proposes an improved method for Optical Character Recognition (OCR) based on Deep Learning (DL) techniques. In order to identify the number on the detected plate after the NPL and CS steps, a Convolutional Neural Network (CNN) algorithm is proposed. A DL model is developed using four convolution layers, two max-pooling layers, and six fully connected layers. The model was trained on a number image database on the NVIDIA Jetson TX2 target. The model achieved an accuracy of 95.84%.
A Hybrid Feature Selection and Deep Learning Algorithm for Cancer Disease Classification
Learning from very big datasets is a significant problem for most present data mining and machine learning algorithms. MicroRNA (miRNA) is one of the important big genomic and non-coding datasets presenting the genome sequences. In this paper, a hybrid method for the classification of the miRNA data is proposed. Due to the variety of cancers and the high number of genes, analyzing the miRNA dataset has been a challenging problem for researchers. The number of features is high relative to the number of samples, and the data are imbalanced. A feature selection method is used to select the features that best distinguish the classes and to eliminate obscure features. Afterward, a Convolutional Neural Network (CNN) classifier is used to classify cancer types, with a Genetic Algorithm employed to find optimized CNN hyper-parameters. In order to make classification by the CNN faster, a Graphics Processing Unit (GPU) is recommended for performing the computations in parallel. The proposed method is tested on a real-world dataset with 8,129 patients, 29 different types of tumors, and 1,046 miRNA biomarkers, taken from The Cancer Genome Atlas (TCGA) database.
Author Profiling: Prediction of Learners’ Gender on a MOOC Platform Based on Learners’ Comments
The more an educational system knows about a learner, the more personalised interaction it can provide, which leads to better learning. However, asking a learner directly is potentially disruptive, and such requests are often ignored by learners. Especially in the booming realm of Massive Open Online Course (MOOC) platforms, only a very low percentage of users disclose demographic information about themselves. Thus, in this paper, we aim to predict learners’ demographic characteristics by proposing an approach using linguistically motivated Deep Learning Architectures for Learner Profiling, particularly targeting gender prediction on a FutureLearn MOOC platform. Additionally, we tackle here the difficult problem of predicting the gender of learners based on their comments only, which are often available across MOOCs. The most common current approaches to text classification use the Long Short-Term Memory (LSTM) model, considering sentences as sequences. However, human language also has structure. In this research, rather than considering sentences as plain sequences, we hypothesise that higher-level semantic and syntactic sentence processing based on linguistics will render a richer representation. We thus evaluate the traditional LSTM against other bleeding-edge models that take syntactic structure into account, such as the tree-structured LSTM, the Stack-augmented Parser-Interpreter Neural Network (SPINN) and the Structure-Aware Tag Augmented model (SATA). Additionally, we explore using different word-level encoding functions. We have implemented these methods on our MOOC dataset, on which they perform best, in comparison with a public sentiment analysis dataset that is further used to cross-examine the models' results.
The Layout Analysis of Handwriting Characters and the Fusion of Multi-style Ancient Books’ Background
Ancient books are significant culture inheritors and their background textures convey potential historical information. However, multi-style texture recovery of ancient books has received little attention. Restricted by insufficient ancient textures and a complex handling process, the generation of ancient textures faces new challenges. For instance, training without sufficient data usually brings about overfitting or mode collapse, so some of the outputs are prone to be fake. Recently, image generation and style transfer based on deep learning have been widely applied in computer vision. Breakthroughs within the field make it possible to conduct research on multi-style texture recovery of ancient books. Under these circumstances, we propose a layout analysis and image fusion system. Firstly, we train models using Deep Convolutional Generative Adversarial Networks (DCGAN) to synthesize multi-style ancient textures; then, we analyze layouts based on the Position Rearrangement (PR) algorithm that we propose to adjust the layout structure of foreground content; at last, we realize our goal by fusing the rearranged foreground texts and the generated background. In experiments, diversified samples such as ancient Yi, Jurchen, and Seal were selected as our training sets. Then, the performance of different fine-tuned models was gradually improved by adjusting the parameters as well as the structure of the DCGAN model. In order to evaluate the results scientifically, the cross entropy loss function and Fréchet Inception Distance (FID) are selected as our assessment criteria. Eventually, we obtained model M8 with the lowest FID score. Compared with the DCGAN model proposed by Radford et al., the FID score of M8 improved by 19.26%, enhancing the quality of the synthetic images profoundly.
A Recognition Method of Ancient Yi Script Based on Deep Learning
Yi is an ethnic group mainly living in mainland China, with its own spoken and written language systems, after development of thousands of years. Ancient Yi is one of the six ancient languages in the world, which keeps a record of the history of the Yi people and offers documents valuable for research into human civilization. Recognition of the characters in ancient Yi helps to transform the documents into an electronic form, making their storage and spreading convenient. Due to historical and regional limitations, research on recognition of ancient characters is still inadequate. Thus, deep learning technology was applied to the recognition of such characters. Five models were developed on the basis of the four-layer convolutional neural network (CNN). Alpha-Beta divergence was taken as a penalty term to re-encode output neurons of the five models. Two fully connected layers fulfilled the compression of the features. Finally, at the softmax layer, the orthographic features of ancient Yi characters were re-evaluated, their probability distributions were obtained, and characters with features of the highest probability were recognized. Tests conducted show that the method has achieved higher precision compared with the traditional CNN model for handwriting recognition of the ancient Yi.
Performance Evaluation of Distributed Deep Learning Frameworks in Cloud Environment
2016 has become the year of the Artificial Intelligence explosion. AI technologies have matured to the point that most of the world's well-known tech giants are making large investments to increase their AI capabilities. Machine learning is the science of getting computers to act without being explicitly programmed, and deep learning is a subset of machine learning that uses deep neural networks to train a machine to learn features directly from data. Deep learning enables many machine learning applications that expand the field of AI. At present, deep learning frameworks are widely deployed on servers for deep learning applications in both academia and industry. Training deep neural networks involves many standard processes and algorithms, but the performance of different frameworks may differ. In this paper, we evaluate the running performance of two state-of-the-art distributed deep learning frameworks that run training computations in parallel over multiple GPUs and multiple nodes in our cloud environment. We evaluate the training performance of the frameworks with the ResNet-50 convolutional neural network, and we also analyze the factors that account for the performance differences between the two distributed frameworks. Through the experimental analysis, we identify the overheads that could be further optimized. The main contribution is that the evaluation results provide further optimization directions in both performance tuning and algorithmic design.
Foot Recognition Using Deep Learning for Knee Rehabilitation
Foot recognition can be applied in many medical fields, such as gait pattern analysis and knee exercises for patients in rehabilitation. Generally, a camera-based foot recognition system captures a patient image in a controlled room and background and recognizes the foot in limited views. However, such a system can be inconvenient for monitoring knee exercises at home. To overcome these problems, this paper proposes using a deep learning method based on Convolutional Neural Networks (CNNs) for foot recognition. The results are compared with the traditional classification method using LBP and HOG features with kNN and SVM classifiers. According to the results, the deep learning method recognizes foot images from online databases with better accuracy, but higher complexity, than the traditional classification method.
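To make the traditional baseline more concrete, the following is a minimal sketch of a basic 3x3 Local Binary Pattern (LBP) descriptor, one of the hand-crafted features named above. The abstract does not state which LBP variant, neighbourhood, or bit ordering was used, so the clockwise ordering from the top-left neighbour and the function names `lbp_code` and `lbp_histogram` are assumptions for illustration only.

```python
# Clockwise 8-neighbourhood starting at the top-left pixel (assumed ordering).
NEIGHBOUR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                     (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image, row, col):
    """Return the 8-bit LBP code of the interior pixel at (row, col).

    `image` is a grayscale image given as a list of rows of intensities.
    A neighbour contributes a 1 bit when it is at least as bright as the centre.
    """
    centre = image[row][col]
    code = 0
    for bit, (d_row, d_col) in enumerate(NEIGHBOUR_OFFSETS):
        if image[row + d_row][col + d_col] >= centre:
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """Count LBP codes over all interior pixels, giving a 256-bin feature vector."""
    histogram = [0] * 256
    for row in range(1, len(image) - 1):
        for col in range(1, len(image[0]) - 1):
            histogram[lbp_code(image, row, col)] += 1
    return histogram
```

A histogram of this kind would typically be passed to a classifier such as kNN or an SVM, whereas a CNN learns its features directly from the pixels.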
Classification Based on Deep Neural Cellular Automata Model
Deep learning is a branch of machine learning that has seen great achievements in research and applications. Cellular neural networks are regarded as arrays of nonlinear analog processors, called cells, connected in a way that allows parallel computation. The paper discusses how to use a deep learning structure to represent a neural cellular automata model. The proposed learning technique in the cellular automata model is examined from the perspective of deep learning structure. A deep neural cellular automata system modifies each neuron based on the behavior of the individual and its decision as a result of multi-level deep structure learning. The paper presents the architecture of the model and gives simulation results for the approach. Results from the implementation enrich the deep neural cellular automata system and shed light on the concept formulation of the model and the learning in it.
Single-Camera Basketball Tracker through Pose and Semantic Feature Fusion
Tracking sports players is a highly challenging
scenario, especially in single-feed videos recorded in tight courts,
where clutter and occlusions cannot be avoided. This paper
presents an analysis of several geometric and semantic visual features
to detect and track basketball players. An ablation study is carried
out and then used to show that a robust tracker can be built with
Deep Learning features, without the need to extract contextual
ones, such as proximity or color similarity, or to apply camera
stabilization techniques. The presented tracker consists of: (1) a
detection step, which uses a pretrained deep learning model to
estimate the players' pose, followed by (2) a tracking step, which
leverages pose and semantic information from the output of a
convolutional layer in a VGG network. Its performance is analyzed
in terms of MOTA over a basketball dataset with more than 10k
Deep Learning Based Fall Detection Using Simplified Human Posture
Falls are one of the major causes of injury and death
among elderly people aged 65 and above. A support system to
identify such abnormal activities has become extremely
important with the growth of the ageing population. Pose estimation
is a challenging task, and it is even more
challenging when performed on the difficult
poses that may occur during a fall. The location of the body provides a
clue to where the person is at the time of the fall. This paper presents
a vision-based tracking strategy in which the available joints are grouped
into three different feature points depending on the section of the body
in which they are located. The three feature points derived from different
joint combinations represent the upper region or head region,
the mid-region or torso, and the lower region or leg region. Tracking is always
challenging when motion is involved. Hence, the idea is to locate
the regions of the body in every frame and use this as the tracking
strategy. Grouping these joints can be beneficial for achieving a stable
region for tracking. The location of the body parts provides crucial
information to distinguish normal activities from falls.
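The grouping of joints into three feature points can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not list which joints belong to which region or how they are combined, so the joint names in `REGION_JOINTS`, the use of a simple centroid, and the function name `region_feature_points` are all hypothetical.

```python
# Hypothetical assignment of pose-estimator joints to the three body regions.
REGION_JOINTS = {
    "upper": ("nose", "left_eye", "right_eye", "neck"),
    "mid": ("left_shoulder", "right_shoulder", "left_hip", "right_hip"),
    "lower": ("left_knee", "right_knee", "left_ankle", "right_ankle"),
}


def region_feature_points(joints):
    """Collapse the detected joints of one frame into three feature points.

    `joints` maps a joint name to an (x, y) tuple, or to None when the pose
    estimator failed to detect it. Each region is summarised by the centroid
    of whichever of its joints are available; a region with no detected
    joints yields None.
    """
    points = {}
    for region, names in REGION_JOINTS.items():
        found = [joints[name] for name in names if joints.get(name) is not None]
        if found:
            x = sum(p[0] for p in found) / len(found)
            y = sum(p[1] for p in found) / len(found)
            points[region] = (x, y)
        else:
            points[region] = None
    return points
```

Because each feature point averages over whichever joints were detected, a single missed joint shifts the point slightly rather than removing the region, which is one way to obtain the stable tracking region described above.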
Vision-Based Collision Avoidance for Unmanned Aerial Vehicles by Recurrent Neural Networks
Owing to sensor technology, video surveillance has become the main means of security control in every big city in the world. Surveillance is usually used by governments for intelligence gathering, the prevention of crime, the protection of a process, person, group or object, or the investigation of crime. Many surveillance systems based on computer vision technology have been developed in recent years. Moving target tracking is the most common task for Unmanned Aerial Vehicles (UAVs) to find and track objects of interest in mobile aerial surveillance for civilian applications. This paper focuses on vision-based collision avoidance for UAVs using recurrent neural networks. First, images from the cameras on the UAV were fused using a deep convolutional neural network. Then, a recurrent neural network was constructed to obtain high-level image features for object tracking and to extract low-level image features for noise reduction. The system distributed its computation across local and cloud platforms to efficiently perform object detection, tracking and collision avoidance based on multiple UAVs. Experiments on several challenging datasets showed that the proposed algorithm outperforms state-of-the-art methods.