International Science Index
Improving the Performance of Deep Learning in Facial Emotion Recognition with Image Sharpening
Humans communicate effectively by pairing words with visual and facial cues, and classifying facial emotion with computer vision methodologies has accordingly been an active research area. In this paper, we propose a simple method for facial expression recognition that enhances accuracy. We tested our method on the FER-2013 dataset, which contains static images. Instead of using histogram equalization to preprocess the dataset, we used an unsharp mask to emphasize texture and detail and to sharpen edges. We also used the ImageDataGenerator class from the Keras library for data augmentation. We then used a Convolutional Neural Network (CNN) model to classify the images into seven facial expressions, yielding an accuracy of 69.46% on the test set. Our results show that image preprocessing such as sharpening can improve the performance of a CNN model, even when the model is relatively simple.
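The unsharp-mask step described above adds back the high-frequency residual (the image minus a blurred copy) to emphasize edges. The sketch below is an illustrative NumPy version, not the paper's code; it uses a simple box blur for the low-pass stage, whereas typical unsharp-mask implementations use a Gaussian:

```python
import numpy as np

def box_blur(img, k=5):
    """Simple k x k mean filter (a stand-in for the usual Gaussian low-pass)."""
    pad = k // 2
    padded = np.pad(img.astype(float), pad, mode="edge")
    out = np.zeros(img.shape, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def unsharp_mask(img, k=5, amount=1.0):
    """Sharpen: img + amount * (img - blurred), clipped back to 8-bit range."""
    img_f = img.astype(float)
    residual = img_f - box_blur(img, k)
    return np.clip(img_f + amount * residual, 0, 255).astype(np.uint8)
```

On a step edge, the residual is negative on the dark side and positive on the bright side, so the local contrast (and hence edge sharpness) increases.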
Image Ranking to Assist Object Labeling for Training Detection Models
Training a machine learning model for object detection
that generalizes well is known to benefit from a training dataset
with diverse examples. However, training datasets usually contain
many repeats of common examples of a class and lack rarely seen
examples. This is due to the process commonly used during human
annotation where a person would proceed sequentially through a
list of images labeling a sufficiently high total number of examples.
Instead, the method presented involves an active process where, after
the initial labeling of several images is completed, the next subset
of images for labeling is selected by an algorithm. This process of
algorithmic image selection and manual labeling continues in an
iterative fashion. The algorithm used for the image selection is a
deep learning algorithm, based on the U-shaped architecture, which
quantifies the presence of unseen data in each image in order to find
images that contain the most novel examples. Moreover, the location
of the unseen data in each image is highlighted, aiding the labeler in
spotting these examples. Experiments performed using semiconductor
wafer data show that labeling a subset of the data curated by this
algorithm produced a model with better performance than a model
trained by sequentially labeling the same amount of data, and with
performance similar to that of a model trained on exhaustive labeling
of the whole dataset. Overall, the proposed approach yields a dataset
with a diverse set of examples per class as well as more balanced
classes, which proves beneficial when training a deep learning model.
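The iterative select-then-label loop described above can be sketched as follows. Here `novelty_score` and `label_fn` are hypothetical stand-ins for the paper's U-Net-based novelty estimator and the human annotator, respectively:

```python
def active_labeling(unlabeled, novelty_score, label_fn, batch_size=10, rounds=3):
    """Iteratively pick the most novel images for manual labeling.

    novelty_score(image, labeled) stands in for the deep model that
    quantifies unseen data; label_fn stands in for the human annotator.
    """
    labeled = []
    pool = list(unlabeled)
    for _ in range(rounds):
        # Rank the remaining pool by estimated novelty, most novel first.
        pool.sort(key=lambda im: novelty_score(im, labeled), reverse=True)
        batch, pool = pool[:batch_size], pool[batch_size:]
        labeled.extend((im, label_fn(im)) for im in batch)
    return labeled, pool
```

In the paper's setting the novelty estimator would be retrained or updated between rounds on the growing labeled set; that refresh is left implicit here.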
Bayesian Deep Learning Algorithms for Classifying COVID-19 Images
This study investigates the accuracy and loss of deep learning algorithms on a coronavirus (COVID-19) image dataset by comparing a Bayesian convolutional neural network with a traditional convolutional neural network on a low-dimensional dataset. Of 50 X-ray images, 25 were COVID-19 and the remaining 25 were normal; for each class, twenty images were used for training and five for validation, which was used to ascertain the accuracy of the model. The study found that the Bayesian convolutional neural network outperformed the conventional neural network on the low-dimensional dataset, where the latter could have exhibited underfitting. The study therefore recommends the Bayesian Convolutional Neural Network (BCNN) for Android computer vision apps for image detection.
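The abstract does not say how the Bayesian CNN is realized. One common approximation, shown below purely as an assumed illustration on a toy fully connected network (not the paper's model), is Monte Carlo dropout: dropout is left active at inference, and repeated stochastic forward passes give a predictive mean plus an uncertainty estimate:

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 16))   # toy weights; a real model would be trained
W2 = rng.normal(size=(16, 2))

def stochastic_forward(x, p_drop=0.5):
    """One forward pass of a toy two-layer net with dropout left on."""
    h = np.maximum(x @ W1, 0.0)
    mask = rng.random(h.shape) > p_drop          # dropout mask, resampled each call
    h = h * mask / (1.0 - p_drop)
    z = h @ W2
    e = np.exp(z - z.max(axis=-1, keepdims=True))  # stable softmax
    return e / e.sum(axis=-1, keepdims=True)

def mc_predict(x, n_samples=100):
    """Monte Carlo average: the mean is the prediction, the std the uncertainty."""
    samples = np.stack([stochastic_forward(x) for _ in range(n_samples)])
    return samples.mean(axis=0), samples.std(axis=0)
```

A high per-class standard deviation flags inputs the model is unsure about, which is the practical benefit a BCNN offers on small datasets.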
A Context-Centric Chatbot for Cryptocurrency Using the Bidirectional Encoder Representations from Transformers Neural Networks
Inspired by the recent movement of digital currency,
we are building a question answering system concerning the subject
of cryptocurrency using Bidirectional Encoder Representations from
Transformers (BERT). The motivation behind this work is to
properly assist digital currency investors by directing them to
the corresponding knowledge bases that can offer them help and
increase the querying speed. BERT, one of the newest language models
in natural language processing, was investigated to improve the
quality of generated responses. We studied different combinations of
hyperparameters of the BERT model to obtain the best-fitting responses.
Further, we created an intelligent chatbot for cryptocurrency using
BERT. A chatbot using BERT shows great potential for the further
advancement of a cryptocurrency market tool. We show that the
BERT neural network generalizes well to other tasks by applying
it successfully to cryptocurrency.
Malaria Parasite Detection Using Deep Learning Methods
Malaria is a serious disease which affects hundreds of
millions of people around the world, each year. If not treated in time,
it can be fatal. Despite recent developments in malaria diagnostics,
the microscopy method to detect malaria remains the most common.
Unfortunately, the accuracy of microscopic diagnostics is dependent
on the skill of the microscopist and limits the throughput of malaria
diagnosis. With the development of Artificial Intelligence tools and
Deep Learning techniques in particular, it is possible to lower the cost,
while achieving an overall higher accuracy. In this paper, we present a
VGG-based model and compare it with previously developed models
for identifying infected cells. Our model surpasses most previously
developed models across a range of accuracy metrics. The model has
the advantage of being constructed from a relatively small number of
layers, which reduces the required computational resources and time.
Moreover, we test our model on two types of datasets and argue
that the currently developed deep-learning-based methods cannot
efficiently distinguish between infected and contaminated cells. A
more precise study of suspicious regions is required.
Improved Rare Species Identification Using Focal Loss Based Deep Learning Models
The use of deep learning for species identification in camera trap images has revolutionised our ability to study, conserve, and monitor species in a highly efficient and unobtrusive manner, with state-of-the-art models achieving accuracies surpassing manual human classification. The high imbalance of camera trap datasets, however, results in poor accuracies for minority (rare or endangered) species because of their relative insignificance to the overall model accuracy. This paper investigates the use of Focal Loss, in comparison to the traditional Cross Entropy Loss function, to improve the identification of minority species in the “255 Bird Species” dataset from Kaggle. The results show that, although Focal Loss slightly decreased the accuracy of the majority species, it increased the F1-score by 0.06 and improved the identification of the bottom two, five, and ten (minority) species by 37.5%, 15.7%, and 10.8%, respectively, as well as yielding an overall accuracy improvement of 2.96%.
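For reference, Focal Loss down-weights well-classified examples relative to Cross Entropy through a (1 − p)^γ modulating factor, which is why rare classes end up dominating the gradient. A minimal binary-case sketch (the paper's setting is multi-class, so this is only illustrative):

```python
import numpy as np

def cross_entropy(p_true):
    """Standard CE given the predicted probability of the true class."""
    return -np.log(np.clip(p_true, 1e-7, 1.0))

def focal_loss(p_true, gamma=2.0, alpha=0.25):
    """Focal loss: the (1 - p)^gamma factor shrinks the loss of easy,
    well-classified (mostly majority-class) examples."""
    p = np.clip(p_true, 1e-7, 1.0)
    return -alpha * (1.0 - p) ** gamma * np.log(p)
```

With γ = 2, an easy example at p = 0.9 contributes roughly 400× less loss than under Cross Entropy (relative to a hard example at p = 0.1), which is the re-balancing effect the abstract reports.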
Deep Learning Based 6D Pose Estimation for Bin-Picking Using 3D Point Clouds
Estimating the 6D pose of objects is a core step for robot bin-picking tasks. The problem is that various objects are usually randomly stacked with heavy occlusion in real applications. In this work, we propose a method to regress 6D poses by predicting three points for each object in the 3D point cloud through deep learning. To solve the ambiguity of symmetric pose, we propose a labeling method to help the network converge better. Based on the predicted pose, an iterative method is employed for pose optimization. In real-world experiments, our method outperforms the classical approach in both precision and recall.
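Given three (or more) predicted points per object and their counterparts on the object model, a 6D pose can be recovered with the standard Kabsch / orthogonal Procrustes algorithm. The paper's exact regression and iterative optimization are not spelled out in the abstract, so this is a generic sketch of the point-to-pose step only:

```python
import numpy as np

def rigid_transform(src, dst):
    """Kabsch: least-squares R, t such that dst_i ~= R @ src_i + t.

    src, dst: (N, 3) arrays of paired 3D points, N >= 3 and non-collinear.
    """
    sc, dc = src.mean(axis=0), dst.mean(axis=0)
    H = (src - sc).T @ (dst - dc)              # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dc - R @ sc
    return R, t
```

The sign correction matters precisely in the three-point case, where the cross-covariance is rank-deficient and an uncorrected SVD solution can return a reflection instead of a rotation.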
Facial Emotion Recognition with Convolutional Neural Network Based Architecture
Neural networks are appealing for many applications since they are able to learn complex non-linear relationships between input and output data. As the number of neurons and layers in a neural network increases, it becomes possible to represent more complex relationships with automatically extracted features. Nowadays, Deep Neural Networks (DNNs) are widely used in computer vision problems such as classification, object detection, segmentation, and image editing. In this work, the facial emotion recognition task is performed by a proposed Convolutional Neural Network (CNN)-based DNN architecture using the FER2013 dataset. Moreover, the effects of different hyperparameters (activation function, kernel size, initializer, batch size, and network size) are investigated, and ablation study results for the pooling layer, dropout, and batch normalization are presented.
Computer Countenanced Diagnosis of Skin Nodule Detection and Histogram Augmentation: Extracting System for Skin Cancer
Background: Skin cancer is a pressing issue in medical science, and its rising incidence is affecting health and well-being worldwide. Methods: The extracted image of a skin tumor cannot be used directly for diagnosis, since the stored image contains disturbances around the lesion center. Our approach first locates the region of interest in the extracted skin image, and an image-partitioning model is applied to remove the disturbance in the picture. Results: After partitioning, feature extraction is performed using a genetic algorithm (GA), and finally classification is carried out between the trained and test data to evaluate images at scale, helping doctors make the right prediction. To improve on the existing system, we set our objectives around the efficiency of the selection process and histogram enrichment, and GA is applied to reduce the false-positive rate. Conclusions: The objective of this work is to improve effectiveness; GA accomplishes this by bringing down the false-positive rate. The paper combines deep learning and medical image processing, which provides superior accuracy, and the proportional handling stages make the system reusable without errors.
A Survey of Response Generation of Dialogue Systems
An essential task in the field of artificial intelligence is
to allow computers to interact with people through natural language.
Therefore, researches such as virtual assistants and dialogue systems
have received widespread attention from industry and academia. The
response generation plays a crucial role in dialogue systems, so to
push forward the research on this topic, this paper surveys various
methods for response generation. We sort out these methods into
three categories. First one includes finite state machine methods,
framework methods, and instance methods. The second contains
full-text indexing methods, ontology methods, vast knowledge base
method, and some other methods. The third covers retrieval methods
and generative methods. We also discuss some hybrid methods based
knowledge and deep learning. We compare their disadvantages and
advantages and point out in which ways these studies can be improved
further. Our discussion covers some studies published in leading
conferences such as IJCAI and AAAI in recent years.
A Survey of Sentiment Analysis Based on Deep Learning
Sentiment analysis is a very active research topic.
Every day, Facebook, Twitter, Weibo, other social media,
and major e-commerce websites generate a massive number
of comments, which can be used to analyse people's
opinions and emotions. Existing methods for sentiment analysis
are based mainly on sentiment dictionaries, machine learning, and
deep learning. The first two kinds of methods rely heavily on
sentiment dictionaries or large amounts of labelled data; the third
overcomes both problems, so in this paper we focus
on it. Specifically, we survey various sentiment analysis
methods based on convolutional neural network, recurrent neural
network, long short-term memory, deep neural network, deep belief
network, and memory network. We compare their features, advantages,
and disadvantages. Also, we point out the main problems of
these methods, which may be worthy of careful studies in the
future. Finally, we also examine the application of deep learning in
multimodal sentiment analysis and aspect-level sentiment analysis.
Embedded Semantic Segmentation Network Optimized for Matrix Multiplication Accelerator
Autonomous driving systems require high reliability to provide people with a safe and comfortable driving experience. However, despite the development of a number of vehicle sensors, it is difficult to always provide high perception performance in driving environments that vary with time and season. Image segmentation using deep learning, which has recently evolved rapidly, stably provides high recognition performance in various road environments. However, since the system controls a vehicle in real time, a highly complex deep learning network cannot be used due to time and memory constraints. Moreover, efficient networks are optimized for GPU environments, which degrades their performance in embedded processor environments equipped with simple hardware accelerators. In this paper, a semantic segmentation network, the matrix multiplication accelerator network (MMANet), optimized for the matrix multiplication accelerator (MMA) on Texas Instruments digital signal processors (TI DSPs), is proposed to improve the recognition performance of autonomous driving systems. The proposed method is designed to maximize the number of layers that can be performed in a limited time, to provide reliable driving-environment information in real time. First, the number of channels in the activation map is fixed to fit the structure of the MMA, and the lack of information caused by fixing the number of channels is resolved by increasing the number of parallel branches. Second, an efficient convolution is selected depending on the size of the activation: since the MMA is a fixed-function unit, normal convolution may be more efficient than depthwise separable convolution, depending on the memory access overhead, so the convolution type is decided according to the output stride to increase network depth. In addition, memory access time is minimized by processing operations only in the L3 cache. Lastly, reliable contexts are extracted using the extended atrous spatial pyramid pooling (ASPP).
The suggested method obtains stable features along an extended path by increasing the kernel size and accessing consecutive data. In addition, it consists of two ASPPs to obtain high-quality contexts using the restored shape, without global-average-pooling paths, since that layer uses the MMA as a simple adder. To verify the proposed method, experiments are conducted using perfsim, a timing simulator, and the Cityscapes validation set. The proposed network can process an image with 640 × 480 resolution in 6.67 ms, so six cameras can be used to observe the vehicle's surroundings at 20 frames per second (FPS). In addition, it achieves 73.1% mean intersection over union (mIoU), the highest recognition rate among embedded networks on the Cityscapes validation set.
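The convolution-type decision above comes down to arithmetic cost versus memory traffic. A back-of-the-envelope multiply-accumulate (MAC) count, sketched here as a generic illustration (not tied to the MMA's actual scheduling), shows depthwise separable convolution saves roughly a factor of 1/c_out + 1/k² in arithmetic, a saving that extra memory access overhead can outweigh on a fixed-function accelerator:

```python
def conv_macs(h, w, c_in, c_out, k):
    """MACs of a standard k x k convolution on an h x w activation."""
    return h * w * c_in * c_out * k * k

def dw_separable_macs(h, w, c_in, c_out, k):
    """MACs of a depthwise k x k pass plus a 1 x 1 pointwise pass."""
    return h * w * c_in * k * k + h * w * c_in * c_out
```

Dividing the two counts gives exactly 1/c_out + 1/k²; for a 3×3 kernel the separable form can never beat roughly a 9× arithmetic saving, so when c_out is small or memory stalls dominate, normal convolution wins.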
On Dialogue Systems Based on Deep Learning
Nowadays, dialogue systems are increasingly becoming the
way for humans to access many computer systems, allowing humans
to interact with computers in natural language. A dialogue
system consists of three parts: understanding what humans say in
natural language, managing dialogue, and generating responses in
natural language. In this paper, we survey deep learning based
methods for dialogue management, response generation and dialogue
evaluation. Specifically, these methods are based on neural network,
long short-term memory network, deep reinforcement learning,
pre-training and generative adversarial network. We compare these
methods and point out the further research directions.
A Survey of Field Programmable Gate Array-Based Convolutional Neural Network Accelerators
With the rapid development of deep learning, neural networks and deep learning algorithms play a significant role in various practical applications. Due to their high accuracy and good performance, Convolutional Neural Networks (CNNs) in particular have become a research hotspot in the past few years. However, the networks grow increasingly large to meet the demands of practical applications, which poses a significant challenge for constructing high-performance implementations of deep learning neural networks. Meanwhile, many of these application scenarios also place strict requirements on the performance and power consumption of hardware devices. Therefore, it is particularly critical to choose a suitable computing platform for hardware acceleration of CNNs. This article surveys recent advances in Field Programmable Gate Array (FPGA)-based acceleration of CNNs. Various designs and implementations of accelerators based on FPGAs under different devices and network models are reviewed and compared against Graphics Processing Unit (GPU), Application-Specific Integrated Circuit (ASIC), and Digital Signal Processor (DSP) counterparts, with our own critical analysis and comments. We then discuss these acceleration and optimization methods on FPGA platforms from different perspectives to explore the opportunities and challenges for future research. Finally, we offer an outlook on the future development of FPGA-based accelerators.
SNR Classification Using Multiple CNNs
Noise estimation is essential in today's wireless systems
for power control, adaptive modulation, interference suppression, and
quality of service. Deep learning (DL) has already been applied in the
physical layer for modulation and signal classification, but the
unacceptably low accuracy of less than 50% undermines the traditional
application of DL classification to SNR prediction. In this paper,
we use a divide-and-conquer algorithm and a classifier fusion method
to simplify SNR classification and thereby enhance DL learning
and prediction. Specifically, multiple CNNs are used for classification
rather than a single CNN. Each CNN performs a binary classification
against a single SNR threshold with two labels: less than, or greater
than or equal. Together, the multiple CNNs are combined to classify
effectively over the range −20 dB ≤ SNR ≤ 32 dB. We use pre-trained
CNNs to predict SNR over a wide range of joint channel parameters,
including multiple Doppler shifts (0, 60, 120 Hz), power-delay
profiles, and signal modulation types (QPSK, 16-QAM, 64-QAM). The
approach achieves individual SNR prediction accuracy of 92%,
composite accuracy of 70%, and prediction convergence one order
of magnitude faster than that of traditional estimation.
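One simple way to fuse the per-threshold binary decisions into a single SNR class is to take the highest threshold that still receives a positive "greater-or-equal" vote. This fusion rule is an illustrative assumption (it presumes the votes are monotone, with all positive votes preceding all negative ones); the paper's actual classifier fusion may resolve inconsistent votes differently:

```python
def fuse_binary_snr(decisions, thresholds):
    """Combine per-threshold binary CNN outputs into one SNR estimate.

    decisions[i] is True when the i-th CNN votes "SNR >= thresholds[i]";
    thresholds must be sorted ascending. Returns None if no CNN votes
    positive (SNR below the lowest threshold).
    """
    estimate = None
    for vote, threshold in zip(decisions, thresholds):
        if vote:
            estimate = threshold
    return estimate
```

For example, on thresholds [−20, −10, 0, 10] dB the vote pattern (True, True, False, False) maps to −10 dB: the SNR is at least −10 dB but not yet 0 dB.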
A Deep-Learning Based Prediction of Pancreatic Adenocarcinoma with Electronic Health Records from the State of Maine
Predicting the risk of Pancreatic Adenocarcinoma (PA) in advance can benefit the quality of care and potentially reduce population mortality and morbidity. The aim of this study was to develop and prospectively validate a risk prediction model to identify patients at risk of new incident PA as early as 3 months before its onset in a statewide, general population in Maine. The PA prediction model was developed using Deep Neural Networks, a deep learning algorithm, with a 2-year electronic-health-record (EHR) cohort. Prospective results showed that our model identified 54.35% of all inpatient episodes of PA, and 91.20% of all PA that required subsequent chemoradiotherapy, with a lead time of up to 3 months and a true-alert rate of 67.62%. The risk assessment tool attained improved discriminative ability; it can be deployed immediately in the health system to provide automatic early warnings to adults at risk of PA, and it has the potential to identify personalized risk factors to facilitate customized PA interventions.
Churn Prediction for Telecommunication Industry Using Artificial Neural Networks
Telecommunication service providers demand accurate
and precise prediction of customer churn probabilities to increase the
effectiveness of their customer relation services. The large amount of
customer data owned by the service providers is suitable for analysis
by machine learning methods. In this study, expenditure data of
customers are analyzed by using an artificial neural network (ANN).
The ANN model is applied to the data of customers with different
billing durations. The proposed model successfully predicts churn
probabilities with 83% accuracy using only three months' expenditure
data, and the prediction accuracy increases up to 89% when nine
months' data are used. The experiments also show that the accuracy
of the ANN model increases with an extended feature set that includes
information on changes in the bill amounts.
Low-Cost Mechatronic Design of an Omnidirectional Mobile Robot
This paper presents the results of a mechatronic design based on a 4-wheel omnidirectional mobile robot that can be used in indoor logistic applications. The low-level control is built on two open-source hardware platforms (Raspberry Pi 3 Model B+ and Arduino Mega 2560) that control four industrial motors, four ultrasound sensors, four optical encoders, a vision system of two cameras, and a Hokuyo URG-04LX-UG01 laser scanner. Moreover, the system is powered by a lithium battery that supplies 24 V DC with a maximum capacity of 20 Ah. The Robot Operating System (ROS) has been implemented on the Raspberry Pi, and performance is evaluated for the selected sensors and hardware. The mechatronic system is evaluated, and safe power-distribution modes for controlling all the electronic devices are proposed based on different tests. Therefore, based on the performance results, recommendations are given for using the Raspberry Pi and Arduino in terms of power, communication, and distribution of control across devices. Following these recommendations, the sensors are distributed between the two controllers (Arduino and Raspberry Pi). The camera drivers have been implemented in Linux, and a Python program has been written to access the cameras. These cameras will be used to implement a deep learning algorithm that recognizes people and objects; in this way, the level of intelligence can be increased in combination with the maps obtained from the laser scanner.
Personal Information Classification Based on Deep Learning in Automatic Form Filling System
Recently, the rapid development of deep learning has let
artificial intelligence (AI) penetrate many fields, replacing
manual work there. In particular, AI systems have also become a
research focus in the field of office automation. To meet real
needs in office automation, in this paper we develop an automatic
form-filling system. Specifically, it uses two classical neural
network models and several word embedding models to classify
relevant information elicited from the Internet. When training the
neural network models, we use less noisy and balanced data. We
conduct a series of experiments to test our system, and the results
show that it achieves good classification results.
Automatic Product Identification Based on Deep-Learning Theory in an Assembly Line
Automated object recognition and identification systems
are widely used throughout the world, particularly in assembly lines,
where they perform quality control and automatic part selection tasks.
This article presents the design and implementation of an object
recognition system in an assembly line. The proposed shapes-color
recognition system is based on deep learning theory in a specially
designed convolutional network architecture. The methodology
involves stages such as image capturing, color filtering, location
of object mass centers, determination of horizontal and vertical
object boundaries, and object clipping. Once the objects are cut
out, they are sent to
a convolutional neural network, which automatically identifies the
type of figure. The identification system works in real-time. The
implementation was done on a Raspberry Pi 3 system and on a
Jetson-Nano device. The proposal is used in an assembly course
of bachelor’s degree in industrial engineering. The results presented
include studying the efficiency of the recognition and processing time.
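The mass-center and clipping stages described above can be sketched with NumPy as follows. This is an illustrative version assuming a binary object mask produced by the color-filtering stage, not the course's actual implementation:

```python
import numpy as np

def clip_object(img, mask):
    """Return the object's mass center and the bounding-box crop.

    img: 2D image array; mask: boolean array marking object pixels.
    """
    ys, xs = np.nonzero(mask)
    center = (ys.mean(), xs.mean())                       # mass center (row, col)
    crop = img[ys.min():ys.max() + 1, xs.min():xs.max() + 1]  # clip to boundaries
    return center, crop
```

The crop is what would be forwarded to the convolutional network for figure-type identification, so each object is classified independently of its position in the frame.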
Deep Learning Based, End-to-End Metaphor Detection in Greek with Recurrent and Convolutional Neural Networks
This paper presents and benchmarks a number of
end-to-end Deep Learning based models for metaphor detection in
Greek. We combine Convolutional Neural Networks and Recurrent
Neural Networks with representation learning to bear on the metaphor
detection problem for the Greek language. The models presented
achieve exceptional accuracy scores, significantly improving on the
previous state-of-the-art results, which had already achieved an
accuracy of 0.82. Furthermore, no special preprocessing, feature
engineering, or linguistic knowledge is used in this work. The methods
presented achieve an accuracy of 0.92 and an F-score of 0.92 with
Convolutional Neural Networks (CNNs) and bidirectional Long Short-Term
Memory networks (LSTMs). Comparable results of 0.91 accuracy and 0.91
F-score are also achieved with bidirectional Gated Recurrent Units
(GRUs) and Convolutional Recurrent Neural Networks (CRNNs). The
models are trained and evaluated only on the basis of training tuples,
the related sentences, and their labels. The outcome is a state-of-the-art
collection of metaphor detection models, trained on limited labelled
resources, which can be extended to other languages and similar tasks.
Deep Learning Application for Object Image Recognition and Robot Automatic Grasping
Since vision system applications for autonomous purposes are in intense demand in industrial environments, image recognition has become an important research topic. Here, a deep learning algorithm is employed in the vision system to recognize industrial objects, integrated with a 7A6 Series Manipulator for automatic gripping tasks. A PC and a Graphics Processing Unit (GPU) are chosen to construct the 3D vision recognition system, and a depth camera (Intel RealSense SR300) is employed to extract images for object recognition and coordinate derivation. The YOLOv2 scheme is adopted as the Convolutional Neural Network (CNN) structure for object classification and center-point prediction. Additionally, an image processing strategy is used to find the object contour for calculating the object orientation angle. The specified object location and orientation information are then sent to the robot controller. Finally, a six-axis manipulator can grasp the specified object in a random environment based on the user command and the extracted image information. The experimental results show that YOLOv2 successfully detects the object location and category with confidence near 0.9 and 3D position error less than 0.4 mm, which is useful for future intelligent robotic applications in Industry 4.0 environments.
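The abstract does not spell out how the orientation angle is computed from the contour. One common moment-based approach, shown here as an assumed sketch rather than the paper's method, takes the major axis of the object mask's pixel covariance:

```python
import numpy as np

def object_orientation(mask):
    """Orientation (degrees) of a binary mask's major axis via image moments."""
    ys, xs = np.nonzero(mask)
    xs = xs - xs.mean()                      # center the pixel coordinates
    ys = ys - ys.mean()
    cov = np.cov(np.stack([xs, ys]))         # 2x2 second-moment matrix
    evals, evecs = np.linalg.eigh(cov)       # eigenvalues ascending
    major = evecs[:, np.argmax(evals)]       # eigenvector of the largest spread
    return float(np.degrees(np.arctan2(major[1], major[0])))
```

The angle is ambiguous by 180° (an elongated object looks the same rotated half a turn), which a gripper with a symmetric jaw can usually tolerate.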
NANCY: Combining Adversarial Networks with Cycle-Consistency for Robust Multi-Modal Image Registration
Multimodal image registration is a profoundly complex
task which is why deep learning has been used widely to address it in
recent years. However, two main challenges remain: Firstly, the lack
of ground truth data calls for an unsupervised learning approach,
which leads to the second challenge of defining a feasible loss
function that can compare two images of different modalities to judge
their level of alignment. To avoid this issue altogether, we implement
a generative adversarial network consisting of two registration
networks G_AB, G_BA and two discrimination networks D_A, D_B connected
by spatial transformation layers. G_AB learns to generate a
deformation field which registers an image of modality B to an image
of modality A. To do so, it uses the feedback of the discriminator
D_B, which learns to judge the quality of alignment of the registered
image B. G_BA and D_A learn the mapping from modality A to modality
B. Additionally, a cycle-consistency loss is implemented: both
registration networks are employed twice, resulting in images Â, B̂
which were registered to B̃, Ã, which in turn were registered
to the initial image pair A, B. Thus the resulting and initial images
of the same modality can be easily compared. A dataset of liver
CT and MRI was used to evaluate the quality of our approach and
to compare it against learning and non-learning based registration
algorithms. Our approach achieves Dice scores of up to 0.80 ± 0.01
and is therefore comparable to, and slightly more successful than,
algorithms like SimpleElastix and VoxelMorph.
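For reference, the Dice score used for evaluation measures the overlap between two binary segmentation masks; a minimal sketch:

```python
import numpy as np

def dice_score(a, b):
    """Dice coefficient of two binary masks: 2|A ∩ B| / (|A| + |B|).

    Ranges from 0.0 (no overlap) to 1.0 (identical masks).
    """
    a = a.astype(bool)
    b = b.astype(bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2.0 * np.logical_and(a, b).sum() / denom
```

Applied to, e.g., liver masks in the registered and fixed images, a score of 0.80 means the two segmentations share 80% of their combined area in the Dice sense.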
Convergence Analysis of Training Two-Hidden-Layer Partially Over-Parameterized ReLU Networks via Gradient Descent
Over-parameterized neural networks have attracted a
great deal of attention in recent deep learning theory research,
as they challenge the classic perspective of over-fitting when
the model has excessive parameters and have gained empirical
success in various settings. While a number of theoretical works
have been presented to demystify properties of such models, the
convergence properties of such models are still far from being
thoroughly understood. In this work, we study the convergence
properties of training two-hidden-layer partially over-parameterized
fully connected networks with the Rectified Linear Unit activation via
gradient descent. To our knowledge, this is the first theoretical work
to understand convergence properties of deep over-parameterized
networks without the equally-wide-hidden-layer assumption and
other unrealistic assumptions. We provide a probabilistic lower bound
on the widths of the hidden layers and prove a linear convergence
rate for gradient descent. We also conduct experiments on synthetic
and real-world datasets to validate our theory.
Research on Reservoir Lithology Prediction Based on Residual Neural Network and Squeeze-and-Excitation Neural Network
Conventional reservoir prediction methods are not sufficient to explore the implicit relations between seismic attributes, and thus data utilization is low. In order to improve the classification accuracy of reservoir lithology prediction, this paper proposes a deep learning lithology prediction method based on ResNet (Residual Neural Network) and SENet (Squeeze-and-Excitation Neural Network). The neural network model is built and trained using seismic attribute data and lithology data from the Shengli oilfield, and the nonlinear mapping between seismic attributes and lithology markers is established. The experimental results show that this method can significantly improve the classification of reservoir lithology, with classification accuracy close to 70%. This study can effectively predict the lithology of undrilled areas and provide support for exploration and development.
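The squeeze-and-excitation mechanism that SENet contributes can be sketched as follows; this is a generic, assumed illustration of an SE block (global average pooling, a two-layer bottleneck, and sigmoid channel gates), not the paper's network:

```python
import numpy as np

def se_block(feature_map, w1, w2):
    """Squeeze-and-Excitation channel recalibration.

    feature_map: (H, W, C) activation; w1: (C, C//r) and w2: (C//r, C)
    are the bottleneck weights (r is the reduction ratio).
    """
    squeeze = feature_map.mean(axis=(0, 1))            # (C,) global context
    hidden = np.maximum(squeeze @ w1, 0.0)             # ReLU bottleneck
    scale = 1.0 / (1.0 + np.exp(-(hidden @ w2)))       # per-channel gates in (0, 1)
    return feature_map * scale                         # re-weight each channel
```

The gates let the network emphasize seismic-attribute channels that carry lithology-relevant context and suppress the rest, at the cost of only two small matrix multiplications per block.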
Modeling Engagement with Multimodal Multisensor Data: The Continuous Performance Test as an Objective Tool to Track Flow
Engagement is one of the most important factors in determining successful outcomes and deep learning in students. Existing approaches to detecting student engagement involve periodic human observations that are subject to inter-rater reliability. Our solution uses real-time multimodal multisensor data, labeled by objective performance outcomes, to infer the engagement of students. The study involves four students with a combined diagnosis of cerebral palsy and a learning disability who took part in a 3-month trial over 59 sessions. Multimodal multisensor data were collected while they participated in a continuous performance test. Eye gaze, electroencephalogram, body pose, and interaction data were used to create a model of student engagement through objective labeling from the continuous performance test outcomes. To achieve this, a type of continuous performance test is introduced, the Seek-X type. Nine features were extracted, including high-level handpicked compound features. Using leave-one-out cross-validation, a series of different machine learning approaches were evaluated. Overall, the random forest classification approach achieved the best results: 93.3% classification accuracy for engagement and 42.9% for disengagement. We compared these results to outcomes from different models: AdaBoost, decision tree, k-Nearest Neighbor, naïve Bayes, neural network, and support vector machine. We showed that using a multisensor approach achieved higher accuracy than using features from any reduced set of sensors, and we found that using high-level handpicked features can improve the classification accuracy in every sensor mode. Our approach is robust to both sensor fallout and occlusions. The single most important sensor feature for the classification of engagement and distraction was shown to be eye gaze.
It has been shown that we can accurately predict the level of engagement of students with learning disabilities in a real-time approach that is not subject to inter-rater reliability, does not require human observation, and is not reliant on a single mode of sensor input. This will help teachers design interventions for a heterogeneous group of students, where teachers cannot possibly attend to each of their individual needs. Our approach can be used to identify those with the greatest learning challenges so that all students are supported to reach their full potential.
Keywords: Affective computing in education, affect detection, continuous performance test, learning disabilities, machine learning, physiological sensors, signal detection theory, student engagement.
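The evaluation pipeline described above (a random forest evaluated with leave-one-out cross-validation over nine features) can be sketched as follows. This is a minimal illustration using synthetic data; the feature semantics and data are assumptions, not the study's actual dataset.

```python
# Hypothetical sketch: leave-one-out evaluation of a random forest
# engagement classifier. The nine features stand in for gaze, EEG,
# pose, and interaction statistics; the data here are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 9))                     # 60 observations, 9 features
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)    # 1 = engaged, 0 = disengaged

clf = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(clf, X, y, cv=LeaveOneOut())
print(f"LOOCV accuracy: {scores.mean():.3f}")
```

Leave-one-out cross-validation trains one model per observation, which is practical here because the per-session sample counts are small.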
Automatic Number Plate Recognition System Based on Deep Learning
In the last few years, Automatic Number Plate Recognition (ANPR) systems have become widely used in safety, security, and commercial applications. Accordingly, several methods and techniques have been developed to achieve better levels of accuracy and real-time execution. This paper proposes a computer vision algorithm for Number Plate Localization (NPL) and Character Segmentation (CS). In addition, it proposes an improved method for Optical Character Recognition (OCR) based on Deep Learning (DL) techniques. To identify the number of the detected plate after the NPL and CS steps, a Convolutional Neural Network (CNN) algorithm is proposed. A DL model is developed using four convolution layers, two max-pooling layers, and six fully connected layers. The model was trained on a number-image database on the NVIDIA Jetson TX2 target. The achieved accuracy is 95.84%.
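The described architecture (four convolution layers, two max-pooling layers, six fully connected layers) can be sketched in Keras as below. The input shape, filter counts, kernel sizes, and ten-way digit output are illustrative assumptions, not the paper's exact configuration.

```python
# A minimal sketch of the described OCR architecture: 4 convolution
# layers, 2 max-pooling layers, 6 fully connected (Dense) layers.
# All sizes here are assumptions (e.g. 32x32 grayscale character crops).
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(32, 32, 1)),
    layers.Conv2D(32, 3, activation="relu", padding="same"),
    layers.Conv2D(32, 3, activation="relu", padding="same"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu", padding="same"),
    layers.Conv2D(64, 3, activation="relu", padding="same"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(512, activation="relu"),
    layers.Dense(256, activation="relu"),
    layers.Dense(128, activation="relu"),
    layers.Dense(64, activation="relu"),
    layers.Dense(32, activation="relu"),
    layers.Dense(10, activation="softmax"),  # e.g. digits 0-9
])
model.summary()
```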
A Hybrid Feature Selection and Deep Learning Algorithm for Cancer Disease Classification
Learning from very large datasets is a significant problem for most current data mining and machine learning algorithms. MicroRNA (miRNA) data form one of the important large genomic, non-coding datasets representing genome sequences. In this paper, a hybrid method for the classification of miRNA data is proposed. Due to the variety of cancers and the high number of genes, analyzing the miRNA dataset has been a challenging problem for researchers. The number of features is high relative to the number of samples, and the data suffer from class imbalance. A feature selection method is used to select the features most able to distinguish between classes and to eliminate obscuring features. Afterward, a Convolutional Neural Network (CNN) classifier is utilized for the classification of cancer types, employing a Genetic Algorithm to find optimized hyper-parameters of the CNN. To make the CNN classification process faster, a Graphics Processing Unit (GPU) is recommended for carrying out the mathematical operations in parallel. The proposed method is tested on a real-world dataset with 8,129 patients, 29 different types of tumors, and 1,046 miRNA biomarkers, taken from The Cancer Genome Atlas (TCGA) database.
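The genetic-algorithm hyper-parameter search described above can be sketched as below. The search space, GA operators, and especially the fitness function are illustrative assumptions; in the paper, fitness would be the validation accuracy of a CNN trained on the selected miRNA features.

```python
# Hypothetical sketch of GA-based hyper-parameter search for a CNN.
# The fitness function is a stand-in for "train CNN, return accuracy".
import random

SEARCH_SPACE = {
    "filters":     [16, 32, 64, 128],
    "kernel_size": [3, 5, 7],
    "lr":          [1e-2, 1e-3, 1e-4],
    "dropout":     [0.2, 0.3, 0.5],
}

def random_individual():
    return {k: random.choice(v) for k, v in SEARCH_SPACE.items()}

def fitness(ind):
    # Placeholder objective standing in for validation accuracy.
    return ind["filters"] / 128 - abs(ind["dropout"] - 0.3)

def crossover(a, b):
    # Uniform crossover: each gene comes from one of the two parents.
    return {k: random.choice([a[k], b[k]]) for k in SEARCH_SPACE}

def mutate(ind, rate=0.2):
    return {k: (random.choice(SEARCH_SPACE[k]) if random.random() < rate else v)
            for k, v in ind.items()}

random.seed(0)
pop = [random_individual() for _ in range(20)]
for _ in range(15):
    pop.sort(key=fitness, reverse=True)
    parents = pop[:10]                        # truncation selection
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(10)]
    pop = parents + children
best = max(pop, key=fitness)
print(best)
```

In practice, each fitness evaluation trains a candidate CNN, which is why the paper recommends GPU acceleration for the inner training loop.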
Author Profiling: Prediction of Learners’ Gender on a MOOC Platform Based on Learners’ Comments
The more an educational system knows about a learner, the more personalised interaction it can provide, which leads to better learning. However, asking a learner directly is potentially disruptive, and such requests are often ignored by learners. Especially in the booming realm of Massive Open Online Course (MOOC) platforms, only a very low percentage of users disclose demographic information about themselves. Thus, in this paper, we aim to predict learners' demographic characteristics by proposing an approach using linguistically motivated deep learning architectures for learner profiling, particularly targeting gender prediction on the FutureLearn MOOC platform. Additionally, we tackle the difficult problem of predicting the gender of learners based on their comments only, which are often available across MOOCs. The most common current approaches to text classification use the Long Short-Term Memory (LSTM) model, considering sentences as sequences. However, human language also has structure. In this research, rather than considering sentences as plain sequences, we hypothesise that higher semantic- and syntactic-level sentence processing based on linguistics will render a richer representation. We thus evaluate the traditional LSTM against other cutting-edge models that take syntactic structure into account, such as the tree-structured LSTM, the Stack-augmented Parser-Interpreter Neural Network (SPINN), and the Structure-Aware Tag Augmented model (SATA). Additionally, we explore the use of different word-level encoding functions. We implemented these methods on our MOOC dataset, on which they proved the most performant compared with a public sentiment analysis dataset, which is further used to cross-examine the models' results.
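The sequential LSTM baseline that the structure-aware models are compared against can be sketched as below. Vocabulary size, sequence length, and layer dimensions are illustrative assumptions, not the paper's settings.

```python
# A minimal sketch of the baseline sequential LSTM text classifier:
# a comment is encoded as a padded token-id sequence, embedded at the
# word level, processed by an LSTM, and mapped to a binary prediction.
# All dimensions here are assumptions.
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(100,)),                         # padded token ids
    layers.Embedding(input_dim=20000, output_dim=128),  # word-level encoding
    layers.LSTM(64),                                    # sentence as sequence
    layers.Dense(1, activation="sigmoid"),              # binary gender output
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
model.summary()
```

The structure-aware alternatives (tree-LSTM, SPINN, SATA) replace the flat `LSTM` layer with composition over a parse tree, which is the hypothesis the paper tests.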
The Layout Analysis of Handwriting Characters and the Fusion of Multi-style Ancient Books’ Background
Ancient books are significant carriers of cultural inheritance, and their background textures convey potential historical information. However, multi-style texture recovery of ancient books has received little attention. Restricted by insufficient ancient textures and a complex handling process, the generation of ancient textures confronts new challenges. For instance, training without sufficient data usually brings about overfitting or mode collapse, so some of the outputs are prone to be fake. Recently, image generation and style transfer based on deep learning have been widely applied in computer vision. Breakthroughs within the field make it possible to conduct research on multi-style texture recovery of ancient books. Under these circumstances, we propose a layout analysis network and an image fusion system. Firstly, we trained models using Deep Convolutional Generative Adversarial Networks (DCGAN) to synthesize multi-style ancient textures; then, we analyzed layouts based on the Position Rearrangement (PR) algorithm that we propose to adjust the layout structure of the foreground content; finally, we achieved our goal by fusing the rearranged foreground texts with the generated backgrounds. In the experiments, diversified samples such as ancient Yi, Jurchen, and Seal scripts were selected as our training sets. The performance of different fine-tuned models was then gradually improved by adjusting the parameters and structure of the DCGAN model. To evaluate the results scientifically, the cross-entropy loss function and the Fréchet Inception Distance (FID) were selected as our assessment criteria. Eventually, we obtained model M8 with the lowest FID score. Compared with the DCGAN model proposed by Radford et al., the FID score of M8 improved by 19.26%, profoundly enhancing the quality of the synthetic images.
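A DCGAN in the spirit of Radford et al.'s architecture, as used here for texture synthesis, can be sketched as below. The texture size (64x64 grayscale), layer widths, and kernel sizes are assumptions, not the paper's M8 configuration.

```python
# A compact DCGAN sketch: a transposed-convolution generator maps a
# latent vector to a 64x64 texture; a strided-convolution discriminator
# scores real vs. generated textures. All sizes are assumptions.
from tensorflow.keras import layers, models

def build_generator(latent_dim=100):
    return models.Sequential([
        layers.Input(shape=(latent_dim,)),
        layers.Dense(8 * 8 * 128),
        layers.Reshape((8, 8, 128)),
        layers.Conv2DTranspose(128, 4, strides=2, padding="same"),  # 16x16
        layers.BatchNormalization(),
        layers.ReLU(),
        layers.Conv2DTranspose(64, 4, strides=2, padding="same"),   # 32x32
        layers.BatchNormalization(),
        layers.ReLU(),
        layers.Conv2DTranspose(1, 4, strides=2, padding="same",
                               activation="tanh"),                  # 64x64x1
    ])

def build_discriminator():
    return models.Sequential([
        layers.Input(shape=(64, 64, 1)),
        layers.Conv2D(64, 4, strides=2, padding="same"),
        layers.LeakyReLU(0.2),
        layers.Conv2D(128, 4, strides=2, padding="same"),
        layers.LeakyReLU(0.2),
        layers.Flatten(),
        layers.Dense(1, activation="sigmoid"),   # real/fake probability
    ])

g, d = build_generator(), build_discriminator()
print(g.output_shape, d.output_shape)
```

Fine-tuning as described in the abstract would correspond to varying these layer widths and depths and comparing the resulting FID scores against generated-texture samples.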