International Science Index
An Application for Risk of Crime Prediction Using Machine Learning
The increase of the world population, especially in
large urban centers, has resulted in new challenges particularly
with the control and optimization of public safety. Thus, in the
present work, a solution is proposed for the prediction of criminal
occurrences in a city based on historical data of incidents and
demographic information. The entire research and implementation will be presented, starting with the data collection from its original source, through the treatment and transformations applied to the data and the choice, evaluation, and implementation of the Machine Learning model, up to the application layer. Classification models will be implemented to
predict criminal risk for a given time interval and location. Machine
Learning algorithms such as Random Forest, Neural Networks,
K-Nearest Neighbors and Logistic Regression will be used to predict
occurrences, and their performance will be compared according
to the data processing and transformation used. The results show
that the use of Machine Learning techniques helps to anticipate
criminal occurrences, contributing to the reinforcement of
public security. Finally, the models were implemented on a platform
that will provide an API to enable other entities to make requests for
predictions in real-time. An application will also be presented where
it is possible to show criminal predictions visually.
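As a minimal sketch of the classification framing described above, the snippet below classifies a (hour, district) query by k-nearest neighbors, one of the algorithms the paper compares. The features, labels, and training points are invented for illustration and are not the paper's data.

```python
import math
from collections import Counter

# Hypothetical training records: (hour_of_day, district_id) -> risk label.
# These points are illustrative stand-ins for real incident history.
train = [
    ((22, 3), "high"), ((23, 3), "high"), ((2, 3), "high"),
    ((10, 1), "low"), ((11, 1), "low"), ((14, 2), "low"),
    ((21, 2), "high"), ((9, 2), "low"),
]

def knn_predict(query, k=3):
    """Classify a (hour, district) query by majority vote of its k
    nearest training points (Euclidean distance on the raw features)."""
    dists = sorted((math.dist(query, x), label) for x, label in train)
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

print(knn_predict((23, 2)))  # late evening, district 2 -> "high"
```

A production version would add feature scaling and richer demographic features, but the framing — discretize time and place, then classify risk — is the same.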
Malaria Parasite Detection Using Deep Learning Methods
Malaria is a serious disease which affects hundreds of
millions of people around the world, each year. If not treated in time,
it can be fatal. Despite recent developments in malaria diagnostics,
the microscopy method to detect malaria remains the most common.
Unfortunately, the accuracy of microscopic diagnostics is dependent
on the skill of the microscopist and limits the throughput of malaria
diagnosis. With the development of Artificial Intelligence tools and
Deep Learning techniques in particular, it is possible to lower the cost,
while achieving an overall higher accuracy. In this paper, we present a
VGG-based model and compare it with previously developed models
for identifying infected cells. Our model surpasses most previously
developed models on a range of accuracy metrics. The model has
an advantage of being constructed from a relatively small number of
layers, which reduces the required computing resources and computation time.
Moreover, we test our model on two types of datasets and argue
that the currently developed deep-learning-based methods cannot
efficiently distinguish between infected and contaminated cells. A
more precise study of suspicious regions is required.
Artificial Intelligence-Based Chest X-Ray Test of COVID-19 Patients
The management of COVID-19 patients based on chest imaging is emerging as an essential tool for evaluating the spread of the pandemic that has gripped the global community. It has already been used to monitor the situation of COVID-19 patients who have issues with their respiratory status. The use of chest imaging for the medical triage of patients showing moderate-to-severe clinical COVID-19 features has increased, owing to the fast dispersal of the pandemic to all continents and communities. This article demonstrates the development of machine learning techniques for the testing of COVID-19 patients using Chest X-Ray (CXR) images in nearly real time, to distinguish the COVID-19 infection with a significantly high level of accuracy. The testing performance covered a combination of different datasets of CXR images: positive COVID-19 patients, patients with viral and bacterial infections, and people with clear chests. The proposed AI scheme successfully distinguishes CXR scans of COVID-19-infected patients from CXR scans of viral and bacterial pneumonia as well as normal cases, with an average accuracy of 94.43%, sensitivity of 95%, and specificity of 93.86%. Predicted decisions would be supported by visual evidence to help clinicians speed up the initial assessment of new suspected cases, especially in resource-constrained environments.
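The reported figures (accuracy, sensitivity, specificity) derive from the classifier's binary confusion matrix; the sketch below shows the standard formulas on invented counts, not the paper's actual confusion matrix.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, and accuracy from a binary confusion
    matrix (tp/fn: infected cases; tn/fp: non-infected cases)."""
    sensitivity = tp / (tp + fn)   # fraction of infected correctly flagged
    specificity = tn / (tn + fp)   # fraction of non-infected correctly cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy

# Invented counts for illustration only.
sens, spec, acc = diagnostic_metrics(tp=95, fn=5, tn=89, fp=11)
print(sens, spec, acc)  # 0.95 0.89 0.92
```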
Machine Learning Techniques in Bank Credit Analysis
The aim of this paper is to compare and discuss better classifier algorithm options for credit risk assessment by applying different Machine Learning techniques. Using records from a Brazilian financial institution, this study uses a database of 5,432 companies that are clients of the bank, where 2,600 clients are classified as non-defaulters, 1,551 are classified as defaulters and 1,281 are temporarily defaulters, meaning that the clients are overdue on their payments for up to 180 days. For each case, a total of 15 attributes were considered for a one-against-all assessment using four different techniques: Artificial Neural Networks Multilayer Perceptron (ANN-MLP), Artificial Neural Networks Radial Basis Functions (ANN-RBF), Logistic Regression (LR) and finally Support Vector Machines (SVM). For each method, different parameters were analyzed in order to obtain the best result of each technique for comparison. Initially, the data were coded in thermometer code (for numerical attributes) or dummy coding (for nominal attributes). The methods were then evaluated for each parameter, and the best result of each technique was compared in terms of accuracy, false positives, false negatives, true positives and true negatives. This comparison showed that the best method, in terms of accuracy, was ANN-RBF (79.20% for non-defaulter classification, 97.74% for defaulters and 75.37% for the temporarily defaulter classification). However, the best accuracy does not always represent the best technique. For instance, in the classification of temporarily defaulters, this technique was surpassed in terms of false positives by SVM, which had the lowest rate (0.07%) of false positive classifications. All these intrinsic details are discussed considering the results found, and an overview of what was presented is shown in the conclusion of this study.
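The two coding schemes mentioned above can be sketched as follows; the thresholds and the sector attribute are hypothetical stand-ins for the bank's 15 real attributes.

```python
def thermometer(value, thresholds):
    """Thermometer-code a numeric attribute: one bit per threshold,
    set to 1 for every threshold the value meets or exceeds."""
    return [1 if value >= t else 0 for t in thresholds]

def dummy(value, categories):
    """One-hot (dummy) coding for a nominal attribute."""
    return [1 if value == c else 0 for c in categories]

# Illustrative attributes, not the bank's real ones: annual revenue
# bucketed at invented thresholds, plus a nominal sector attribute.
revenue_bits = thermometer(350, thresholds=[100, 200, 300, 400])
sector_bits = dummy("retail", categories=["retail", "industry", "services"])
print(revenue_bits + sector_bits)  # [1, 1, 1, 0, 1, 0, 0]
```

Unlike plain one-hot coding, the thermometer code preserves the ordering of numeric buckets, which is why it is commonly chosen for numerical attributes in this kind of study.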
Normal and Peaberry Coffee Beans Classification from Green Coffee Bean Images Using Convolutional Neural Networks and Support Vector Machine
The aim of this study is to develop a system which can identify and sort peaberries automatically at low cost for coffee producers in developing countries. In this paper, the focus is on the classification of peaberries and normal coffee beans using image processing and machine learning techniques. The peaberry is not a defective bean, but neither is it a normal bean: it is a single, relatively round seed that grows alone in a coffee cherry instead of the usual flat-sided pair of beans, and it has a different value and flavor. To make the taste of the coffee better, it is necessary to separate the peaberries from the normal beans before roasting the green coffee beans; otherwise, the flavors of the beans will mix, and the taste will suffer. During roasting, the beans' shape, size, and weight must be uniform; otherwise, the larger beans will take more time to roast through. The peaberry has a different size and a different shape even though it has the same weight as normal beans, and it roasts more slowly than the other beans. Therefore, sorting by size or weight alone provides no good option for selecting the peaberries. Defective beans, e.g., sour, broken, black, and faded beans, are easy to check and pick out manually by hand. On the other hand, picking out peaberries is very difficult even for trained specialists, because the shape and color of the peaberry are similar to those of normal beans. In this study, we use image processing and machine learning techniques to discriminate between normal and peaberry beans as part of the sorting system. As the first step, we applied Deep Convolutional Neural Networks (CNN) and Support Vector Machine (SVM) as machine learning techniques to discriminate between the peaberry and the normal bean. As a result, better performance was obtained with CNN than with SVM for the discrimination of the peaberry. The artificial neural network, trained in this work on a high-performance CPU and GPU, will simply be installed on the inexpensive and computationally limited Raspberry Pi system.
We assume that this system will be used in developing countries. The study evaluates and compares the feasibility of the methods in terms of classification accuracy and processing speed.
A Neuroscience-Based Learning Technique: Framework and Application to STEM
Existing learning techniques such as problem-based learning, project-based learning, or case study learning focus mainly on technical details but give no specific guidelines on the learner's experience or on emotional learning aspects such as arousal salience and valence, even though emotional states are important factors affecting engagement and retention. Some approaches involving emotion in educational settings, such as social and emotional learning, lack neuroscientific rigor and make no use of specific neurobiological mechanisms; neurobiology approaches, in turn, lack educational applicability, while educational approaches mainly focus on cognitive aspects and disregard conditioning learning. First, the authors explain the reasons why it is hard to learn thoughtfully; then they use the method of neurobiological mapping to track the main limbic system functions, such as the reward circuit, and its relations with perception, memories, motivations, sympathetic and parasympathetic reactions, and sensations, as well as the brain cortex. The authors conclude by explaining the major finding: the mechanisms of nonconscious learning and the triggers that guarantee long-term memory potentiation. Afterward, the educational framework for practical application and the instructors' guidelines are established. An implementation example in engineering education is given, namely, the study of tuned-mass dampers for the attenuation of earthquake oscillations in skyscrapers. This work represents an original learning technique based on nonconscious learning mechanisms to enhance long-term memories that complements existing cognitive learning methods.
Online Graduate Students’ Perspective on Engagement in Active Learning in the United States
As of 2017, many researchers in educational journals are still wondering whether students are effectively and efficiently engaged in active learning in the online learning environment. The goal of this qualitative single case study and narrative research is to explore whether students are actively engaged in their online learning. Seven online students in the United States, recruited via LinkedIn and residencies, were interviewed for this study. Eleven online learning techniques from the research literature were used as a framework. Data collection tools for the study included a digital audiotape, an observation sheet, an interview protocol, transcription, and the NVivo 12 Plus qualitative software. The data analysis process, member checking, and key themes were used to reach saturation. About 85.7% of students preferred individual grading. About 71.4% of students valued the professor interacting 2-3 times weekly, participating through posts and responses, having good internet access, and using email. Also, about 57.1% said that students log in 2-3 times weekly to daily, that the professor's social presence helps, and that regular punctuality in work submission matters, and they preferred the assessment styles of research, essay, and case study. About 42.9% appreciated the usefulness of the syllabus and the professor's expertise.
An Automated Stock Investment System Using Machine Learning Techniques: An Application in Australia
A key issue in stock investment is how to select representative features for stock selection. The first objective of this paper is to determine whether an automated stock investment system, using machine learning techniques, may be used to identify a portfolio of growth stocks that are highly likely to provide returns better than the stock market index. The second objective is to identify the technical features that best characterize whether a stock’s price is likely to go up, and to identify the most important factors and their contribution to predicting the likelihood of the stock price going up. Unsupervised machine learning techniques, such as cluster analysis, were applied to the stock data to identify a cluster of stocks that was likely to go up in price – portfolio 1. Next, the principal component analysis technique was used to select stocks that were rated high on component one and component two – portfolio 2. Thirdly, a supervised machine learning technique, the logistic regression method, was used to select stocks with a high probability of their price going up – portfolio 3. The predictive models were validated with metrics such as sensitivity (recall), specificity and overall accuracy for all models. All accuracy measures were above 70%. The top three stocks were selected for each of the three stock portfolios and traded in the market for one month. After one month, the return for each stock portfolio was computed and compared with the stock market index returns. The returns were 23.87% for the principal component analysis stock portfolio, 11.65% for the logistic regression portfolio and 8.88% for the K-means cluster portfolio, while the stock market return was 0.38%; all three portfolios thus outperformed the market by more than eight times. This study confirms that an automated stock investment system using machine learning techniques can identify top-performing stock portfolios that outperform the stock market.
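A minimal sketch of the portfolio-2 style selection described above: rate each stock on the first two principal components of a feature matrix and keep the three highest-rated. The feature matrix here is random synthetic data, not real market features.

```python
import numpy as np

# Hypothetical feature matrix: rows are stocks, columns are technical
# features (e.g., momentum, volume change); the values are synthetic.
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 4))

# Principal components via SVD of the centered data.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt[:2].T          # each stock's rating on components 1 and 2

# Select the three stocks rated highest on the two components combined.
combined = scores.sum(axis=1)
top3 = np.argsort(combined)[-3:][::-1]
print(top3)
```

The paper's other two portfolios replace this ranking step with K-means cluster membership and logistic-regression probabilities, respectively, while the surrounding pipeline stays the same.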
Classification Based on Deep Neural Cellular Automata Model
Deep learning is a branch of machine learning science that has achieved great success in research and applications. Cellular neural networks are regarded as an array of nonlinear analog processors, called cells, connected in a way that allows parallel computations. The paper discusses how to use a deep learning structure to represent a neural cellular automata model. The proposed learning technique in the cellular automata model will be examined from the perspective of the deep learning structure. A deep automata neural cellular system modifies each neuron based on the behavior of the individual and its decision, as a result of multi-level deep structure learning. The paper presents the architecture of the model and gives simulation results for the approach. Results from the implementation enrich the deep neural cellular automata system and shed light on the concept formulation of the model and the learning in it.
Improved Feature Extraction Technique for Handling Occlusion in Automatic Facial Expression Recognition
The field of automatic facial expression analysis has been an active research area in the last two decades. Its vast applicability in various domains has drawn much attention to developing techniques and datasets that mirror real-life scenarios. Many techniques, such as Local Binary Patterns and their variants (CLBP, LBP-TOP) and, lately, deep learning techniques, have been used for facial expression recognition. However, the problem of occlusion has not been sufficiently handled, making their results inapplicable in real-life situations. This paper develops a simple, yet highly efficient method, tagged Local Binary Pattern-Histogram of Gradients (LBP-HOG), with occlusion detection in face images, using a multi-class SVM for Action Unit and, in turn, expression recognition. Our method was evaluated on three publicly available datasets: JAFFE, CK, and SFEW. Experimental results showed that our approach performed considerably well when compared with state-of-the-art algorithms and gave insight into occlusion detection as a key step to handling expressions in the wild.
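The Local Binary Pattern half of the LBP-HOG descriptor reduces each 3x3 neighbourhood to an 8-bit code; a minimal sketch of the basic operator on an invented patch:

```python
def lbp_code(patch):
    """Basic 3x3 Local Binary Pattern code of the center pixel: each of
    the 8 neighbours contributes one bit (1 if its value >= the center),
    read clockwise starting from the top-left corner."""
    c = patch[1][1]
    neighbours = [patch[0][0], patch[0][1], patch[0][2], patch[1][2],
                  patch[2][2], patch[2][1], patch[2][0], patch[1][0]]
    return sum((1 << i) for i, n in enumerate(neighbours) if n >= c)

# Invented intensity values for illustration.
patch = [[6, 5, 2],
         [7, 6, 1],
         [9, 8, 7]]
print(lbp_code(patch))  # 241
```

A full descriptor histograms these codes over image regions; the paper's method concatenates that histogram with HOG features before the multi-class SVM.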
Machine Learning Methods for Network Intrusion Detection
Network security engineers work to keep services available at all times by handling intruder attacks. The Intrusion Detection System (IDS) is one of the available mechanisms used to sense and classify any abnormal actions. Therefore, the IDS must always be up to date with the signatures of the latest intruder attacks to preserve the confidentiality, integrity, and availability of the services. The speed of the IDS is a very important issue, as is learning the new attacks. This research work illustrates how the Knowledge Discovery and Data Mining (or Knowledge Discovery in Databases, KDD) dataset is very handy for testing and evaluating different machine learning techniques. It mainly focuses on the KDD preprocessing part in order to prepare a decent and fair experimental data set. The J48, MLP, and Bayes Network classifiers were chosen for this study. It has been shown that the J48 classifier achieved the highest accuracy rate for detecting and classifying all KDD dataset attacks, which are of type DOS, R2L, U2R, and PROBE.
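J48 (an implementation of C4.5) grows its tree by choosing, at each node, the attribute whose split yields the best information-based criterion. The sketch below computes plain information gain on an invented four-record toy, not the KDD data.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr_index):
    """Entropy reduction from splitting on one attribute -- the criterion
    a J48/C4.5-style decision tree uses to pick the best split."""
    base = entropy(labels)
    split = {}
    for row, lab in zip(rows, labels):
        split.setdefault(row[attr_index], []).append(lab)
    n = len(labels)
    return base - sum(len(ls) / n * entropy(ls) for ls in split.values())

# Tiny illustrative connection records: (protocol, flag) -> class.
rows = [("tcp", "SF"), ("tcp", "REJ"), ("udp", "SF"), ("tcp", "REJ")]
labels = ["normal", "dos", "normal", "dos"]
print(information_gain(rows, labels, 1))  # splitting on the flag: 1.0
```

Here the flag attribute separates the classes perfectly (gain 1.0 bit), so a C4.5-style tree would split on it first; C4.5 proper normalizes this gain by the split's own entropy (gain ratio).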
Off-Policy Q-learning Technique for Intrusion Response in Network Security
With the increasing dependency on our computer
devices, we face the necessity of adequate, efficient and effective
mechanisms, for protecting our network. There are two main
problems that Intrusion Detection Systems (IDS) attempt to solve.
1) To detect the attack, by analyzing the incoming traffic and inspect
the network (intrusion detection). 2) To produce a prompt response
when the attack occurs (intrusion prevention). It is critical to create an intrusion detection model that will detect a breach in the system in time, and it is equally challenging to make it provide an automatic response, with an acceptable delay, at every single stage of the monitoring process. We cannot afford to adopt security measures that demand high computational power, and we are not able to accept a mechanism that will react with a delay. In this paper, we will
propose an intrusion response mechanism that is based on artificial
intelligence, and more precisely, reinforcement learning techniques
(RLT). The RLT will help us to create a decision agent, who will
control the process of interacting with the undetermined environment.
The goal is to find an optimal policy representing the intrusion response; we therefore solve the reinforcement learning problem using a Q-learning approach. Our agent will produce an optimal immediate response in the process of evaluating the network traffic. This Q-learning approach will establish the balance between
exploration and exploitation and provide a unique, self-learning and
strategic artificial intelligence response mechanism for IDS.
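A toy tabular Q-learning loop in the spirit of the proposed mechanism; the states, actions, and reward function are invented stand-ins for traffic observations and intrusion responses, and the epsilon-greedy action selection provides the exploration-exploitation balance mentioned above.

```python
import random

# Hypothetical state/action spaces standing in for traffic observations
# and intrusion responses.
states = ["normal", "suspicious"]
actions = ["allow", "block"]
Q = {(s, a): 0.0 for s in states for a in actions}
alpha, gamma, epsilon = 0.5, 0.9, 0.1

def reward(state, action):
    # Toy reward: blocking suspicious traffic is good, blocking normal is bad.
    if state == "suspicious":
        return 1.0 if action == "block" else -1.0
    return 1.0 if action == "allow" else -1.0

random.seed(0)
s = random.choice(states)
for _ in range(500):
    # Epsilon-greedy: mostly exploit the best known action, sometimes explore.
    if random.random() < epsilon:
        a = random.choice(actions)
    else:
        a = max(actions, key=lambda act: Q[(s, act)])
    r = reward(s, a)
    s_next = random.choice(states)          # next traffic observation
    best_next = max(Q[(s_next, act)] for act in actions)
    # Standard Q-learning update toward the bootstrapped target.
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    s = s_next

print({sa: round(v, 2) for sa, v in Q.items()})
```

After training, the greedy policy reads the response directly out of the table: block when suspicious, allow when normal.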
Evaluating Machine Learning Techniques for Activity Classification in Smart Home Environments
With the widespread adoption of Internet-connected
devices, and with the prevalence of the Internet of Things (IoT)
applications, there is an increased interest in machine learning
techniques that can provide useful and interesting services in the
smart home domain. The areas that machine learning techniques
can help advance are varied and ever-evolving. Classifying smart
home inhabitants’ Activities of Daily Living (ADLs) is one
prominent example. The ability of machine learning techniques to find
meaningful spatio-temporal relations of high-dimensional data is an
important requirement as well. This paper presents a comparative
evaluation of state-of-the-art machine learning techniques to classify
ADLs in the smart home domain. Forty-two synthetic datasets and
two real-world datasets with multiple inhabitants are used to evaluate
and compare the performance of the identified machine learning
techniques. Our results show significant performance differences
between the evaluated techniques. Such as AdaBoost, Cortical
Learning Algorithm (CLA), Decision Trees, Hidden Markov Model
(HMM), Multi-layer Perceptron (MLP), Structured Perceptron and
Support Vector Machines (SVM). Overall, neural network based
techniques have shown superiority over the other tested techniques.
An Intelligent Baby Care System Based on IoT and Deep Learning Techniques
Due to the heavy burden and pressure of caring for infants, an integrated automatic baby watching system based on IoT smart sensing and deep learning machine vision techniques is proposed in this paper. By monitoring infant body conditions such as heartbeat, breathing, body temperature, and sleeping posture, as well as surrounding conditions such as dangerous or sharp objects, light, noise, humidity, and temperature, the proposed system can analyze and predict obvious or potential dangerous conditions from the observed data and then adopt suitable actions in real time to protect the infant from harm, thus reducing the burden on the caregiver and improving the safety and efficiency of the caring work. The experimental results show that the proposed system works successfully for infant care and thus can be implemented practically in various fields.
Feature Selection and Predictive Modeling of Housing Data Using Random Forest
Predictive data analysis and modeling involving machine learning techniques become challenging in the presence of too many explanatory variables or features. The presence of too many features in machine learning is known not only to slow algorithms down, but also to decrease model prediction accuracy. This study involves a housing dataset with 79 quantitative and qualitative features that describe various aspects people consider while buying a new house. The Boruta algorithm, which supports feature selection using a wrapper approach built around random forest, is used in this study. This feature selection process leads to 49 confirmed features, which are then used for developing predictive random forest models. The study also explores five different data partitioning ratios, and their impact on model accuracy is captured using the coefficient of determination (r-square) and the root mean square error (RMSE).
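Boruta's wrapper idea is to pit every real feature against a "shadow" (a scrambled copy of itself) and confirm only the features whose importance beats the best shadow. The sketch below uses absolute correlation as a stand-in for the random forest importance a real Boruta run would use, and a deterministic rotation as the stand-in for random shuffling; all data are invented.

```python
def corr_importance(xs, ys):
    """Absolute Pearson correlation, standing in here for the random
    forest importance used by the actual Boruta algorithm."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return abs(cov / (sx * sy))

# Hypothetical housing-style data: area drives price, lot_id is noise.
price = [100, 155, 240, 125, 300, 180]
features = {
    "area": [50, 80, 120, 60, 150, 90],
    "lot_id": [3, 1, 4, 1, 5, 9],
}

# Shadow features: a deterministic rotation stands in for Boruta's shuffle,
# breaking the alignment between feature and target.
shadow_best = max(
    corr_importance(xs[1:] + xs[:1], price) for xs in features.values()
)
confirmed = [name for name, xs in features.items()
             if corr_importance(xs, price) > shadow_best]
print(confirmed)  # ['area']
```

The real algorithm repeats this contest over many random forest runs and uses a statistical test to confirm or reject each feature, which is how the study arrives at its 49 confirmed features.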
The Effect of Cooperative Learning on Academic Achievement of Grade Nine Students in Mathematics: The Case of Mettu Secondary and Preparatory School
The aim of this study was to examine the effect of
cooperative learning method on student’s academic achievement and
on the achievement level over a usual method in teaching different
topics of mathematics. The study also examines the perceptions of
students towards cooperative learning. Cooperative learning is the
instructional strategy in which pairs or small groups of students with
different levels of ability work together to accomplish a shared goal.
The aim of this cooperation is for students to maximize their own and each other’s learning, with members striving for joint benefit. The teacher’s role changes from sage on the stage to guide on the side. Cooperative learning, due to its influential aspects, is among the most prevalent teaching-learning techniques in the modern world.
Therefore, the study was conducted in order to examine the effect of cooperative learning on the academic achievement of grade 9 students in Mathematics in the case of Mettu Secondary School. Two sample sections were randomly selected, one serving randomly as the experimental group and the other as the comparison group. The data gathering instruments were achievement tests and questionnaires.
A treatment with the STAD method of cooperative learning was provided to the experimental group, while the usual method was used in the comparison group. The experiment lasted for one semester. To
determine the effect of cooperative learning on the student’s academic
achievement, the significance of difference between the scores of
groups at the 0.05 level was tested by applying a t-test. The effect size
was calculated to see the strength of the treatment. The student’s
perceptions about the method were tested by percentiles of the
questionnaires. During data analysis, each group was divided into
high and low achievers on the basis of their previous Mathematics results.
Data analysis revealed that both the experimental and comparison
groups were almost equal in Mathematics at the beginning of the
experiment. The experimental group significantly outscored the comparison group on the posttest. Additionally, the comparison of the mean posttest scores of high achievers indicates a significant difference between the two groups. The same is true for the low-achieving students of both groups on the posttest. Hence, the result of the study indicates the effectiveness of the method for Mathematics topics as compared to the usual method of teaching.
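The effect size mentioned above is typically Cohen's d, the mean difference divided by the pooled standard deviation; a sketch on invented posttest scores, not the study's data:

```python
import statistics as st

def cohens_d(group1, group2):
    """Effect size for the difference between two group means, using
    the pooled sample standard deviation (Cohen's d)."""
    n1, n2 = len(group1), len(group2)
    s1, s2 = st.stdev(group1), st.stdev(group2)
    pooled = (((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)) ** 0.5
    return (st.mean(group1) - st.mean(group2)) / pooled

# Hypothetical posttest scores for illustration only.
experimental = [78, 85, 90, 74, 88, 81]
comparison = [70, 72, 79, 68, 75, 71]
print(round(cohens_d(experimental, comparison), 2))  # 1.98
```

By the usual benchmarks, values around 0.2, 0.5, and 0.8 are read as small, medium, and large effects, so a d near 2 would indicate a very strong treatment effect.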
Prediction-Based Midterm Operation Planning for Energy Management of Exhibition Hall
Large exhibition halls require a lot of energy to maintain a comfortable atmosphere for the visitors inside. One way of reducing the energy cost is to have thermal energy storage systems installed, so that thermal energy can be stored in the middle of the night when the energy price is low and then used later when the price is high. To minimize the overall energy cost, however, we should be able to decide exactly how much energy to save during which time period. If we can foresee the future energy load and the corresponding cost, we will be able to make such decisions reasonably. In this paper, we use machine learning techniques to obtain models for predicting weather conditions and the number of visitors on an hourly basis for the next day. Based on the energy load thus predicted, we build a cost-optimal daily operation plan for the thermal energy storage systems and the cooling and heating facilities through simulation-based optimization.
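The planning decision can be caricatured in a few lines: given predicted hourly prices and loads (invented numbers below), shift storage charging to the cheapest hours and discharge at the price peaks. The real system replaces this greedy rule with simulation-based optimization over the predicted load.

```python
# Invented hourly prices and predicted loads for an 8-hour horizon.
prices = [2, 1, 1, 3, 6, 8, 7, 4]    # energy price per unit, by hour
load = [5, 4, 4, 6, 9, 10, 9, 7]     # predicted load, by hour
capacity = 6                          # thermal storage units available

cheap_hours = sorted(range(len(prices)), key=lambda h: prices[h])[:2]
dear_hours = sorted(range(len(prices)), key=lambda h: -prices[h])[:2]

cost_plain = sum(p * l for p, l in zip(prices, load))
shifted = load[:]
for h in cheap_hours:
    shifted[h] += capacity // 2      # charge the storage at night rates
for h in dear_hours:
    shifted[h] -= capacity // 2      # cover peak load from the storage
cost_planned = sum(p * l for p, l in zip(prices, shifted))
print(cost_plain, cost_planned)      # 261 222
```

Even this crude shift cuts the toy bill by about 15%; the value of the prediction models is that they supply the load forecast this plan depends on.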
Hand Gesture Interpretation Using Sensing Glove Integrated with Machine Learning Algorithms
In this paper, we present a low-cost design for a smart glove that can perform sign language recognition to assist speech-impaired people. Specifically, we have designed and developed an Assistive Hand Gesture Interpreter that recognizes hand movements relevant to the American Sign Language (ASL) and translates them into text for display on a Thin-Film-Transistor Liquid Crystal Display (TFT LCD) screen, as well as into synthetic speech. Linear Bayes Classifiers and Multilayer Neural Networks have been used to classify 11 feature vectors obtained from the sensors on the glove into one of the 27 ASL alphabet signs and a predefined gesture for space. Three types of features are used: bending, using six bend sensors; orientation in three dimensions, using accelerometers; and contacts at vital points, using contact sensors. To gauge the performance of the presented design, the training database was prepared using five volunteers. The accuracy of the current version on the prepared dataset was found to be up to 99.3% for the target user. The solution combines electronics, e-textile technology, sensor technology, embedded systems and machine learning techniques to build a low-cost wearable glove that is accurate, elegant and portable.
Context Detection in Spreadsheets Based on Automatically Inferred Table Schema
Programming requires years of training. With natural language and end-user development methods, however, programming could become available to everyone: such methods enable end users to program their own devices and extend the functionality of the existing system without any knowledge of programming languages. In this paper, we describe an Interactive Spreadsheet Processing Module (ISPM), a natural language interface to spreadsheets that allows users to address ranges within the spreadsheet based on an automatically inferred table schema. Using the ISPM, end users are able to search for values in the schema of the table and to address the data in spreadsheets implicitly. Furthermore, it enables them to select and sort the spreadsheet data by using natural language. The ISPM uses a machine learning technique to automatically infer areas within a spreadsheet, including different kinds of headers and data ranges. Since ranges can be identified from natural language queries, end users can query the data using natural language. During the evaluation, 12 undergraduate students were asked to perform operations (sum, sort, group and select) using the system as well as in Excel without the ISPM interface, and the time taken for task completion was compared across the two systems. Only for the selection task did users take less time in Excel (since they directly selected the cells using the mouse) than in ISPM. The results suggest that natural language interfaces for end-user software engineering can help overcome the present bottleneck of relying on professional developers.
Mining User-Generated Contents to Detect Service Failures with Topic Model
Online user-generated contents (UGC) significantly change the way customers behave (e.g., shop, travel), and the pressing need to handle the overwhelming amount of various UGC is one of the paramount issues for management. However, current approaches (e.g., sentiment analysis) are often ineffective for leveraging textual information to detect the problems or issues that a given organization suffers from. In this paper, we apply text mining with Latent Dirichlet Allocation (LDA) to a popular online review site dedicated to complaints from users. We find that LDA efficiently detects customer complaints, and that a further inspection with a visualization technique is effective in categorizing the problems or issues. As such, management can identify the issues at stake and prioritize them accordingly in a timely manner, given the limited amount of resources. The findings provide managerial insights into how analytics on social media can help maintain and improve reputation management. Our interdisciplinary approach also highlights several insights gained by applying machine learning techniques in the marketing research domain. On a broader technical note, this paper illustrates the details of how to implement LDA in R from beginning (data collection in R) to end (LDA analysis in R), since this procedure is still largely undocumented. In this regard, it will help lower the barrier for interdisciplinary researchers to conduct related research.
Case-Based Reasoning: A Hybrid Classification Model Improved with an Expert's Knowledge for High-Dimensional Problems
Data mining and the classification of objects is a process of data analysis using various machine learning techniques, which is applied today in many fields of research. This paper presents a concept for a hybrid classification model improved with expert knowledge. The hybrid model integrates several machine learning techniques (Information Gain, K-means, and Case-Based Reasoning) and the expert’s knowledge into one algorithm. The knowledge of experts is used to determine the importance of features. The paper presents the model algorithm and the results of a case study in which the emphasis was put on achieving the maximum classification accuracy without reducing the number of features.
An Empirical Evaluation of Performance of Machine Learning Techniques on Imbalanced Software Quality Data
The development of change prediction models can help software practitioners in planning testing and inspection resources at the early phases of software development. However, a major challenge faced during the training process of any classification model is the imbalanced nature of software quality data. A dataset with very few instances of the minority outcome categories leads to an inefficient learning process, and a classification model developed from imbalanced data generally does not predict these minority categories correctly. Thus, for a given dataset, a minority of classes may be change-prone whereas a majority of classes may be non-change-prone. This study explores various alternatives for adeptly handling imbalanced software quality data using different sampling methods and effective MetaCost learners. The study also analyzes and justifies the use of different performance metrics while dealing with imbalanced data. In order to empirically validate the different alternatives, the study uses change data from three application packages of an open-source Android data set and evaluates the performance of six different machine learning techniques. The results of the study indicate an extensive improvement in the performance of the classification models when using a resampling method and robust performance measures.
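The simplest of the resampling methods referred to above is random oversampling of the minority class; a sketch on invented change-proneness labels:

```python
import random

def random_oversample(rows, labels):
    """Duplicate randomly chosen minority-class examples until every
    class matches the majority class count."""
    random.seed(42)
    by_class = {}
    for row, lab in zip(rows, labels):
        by_class.setdefault(lab, []).append(row)
    target = max(len(v) for v in by_class.values())
    out_rows, out_labels = [], []
    for lab, members in by_class.items():
        extra = [random.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(lab)
    return out_rows, out_labels

# Illustrative data: 6 non-change-prone classes vs 2 change-prone ones.
rows = [[1], [2], [3], [4], [5], [6], [7], [8]]
labels = ["no", "no", "no", "no", "no", "no", "yes", "yes"]
_, new_labels = random_oversample(rows, labels)
print(new_labels.count("no"), new_labels.count("yes"))  # 6 6
```

Undersampling the majority class and cost-sensitive wrappers such as MetaCost attack the same imbalance from the other direction: rather than changing the data distribution by duplication, they either discard majority examples or reweight misclassification costs.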
Anomaly Detection with ANN and SVM for Telemedicine Networks
In recent years, a wide variety of applications have been developed with Support Vector Machine (SVM) methods and Artificial Neural Networks (ANN). In general, these methods depend on intrusion knowledge databases such as KDD99, ISCX, and CAIDA, among others. New classes of detectors are generated by machine learning techniques, trained and tested over network databases. Thereafter, the detectors are employed to detect anomalies in network communication scenarios according to users' connection behavior. The first detector, based on the training dataset, is deployed in different real-world networks with mobile and non-mobile devices to analyze its performance and accuracy relative to static detection. The vulnerabilities are based on previous work on telemedicine apps developed in the research group. This paper presents the differences in detection results across several network scenarios when applying traditional detectors built with artificial neural networks and support vector machines.
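The anomaly-detection setting can be illustrated with a small sketch. A one-class SVM is used here for brevity instead of a detector trained on a labeled intrusion database such as KDD99, and the feature values are synthetic stand-ins for real traffic statistics:

```python
# Sketch: a one-class SVM models "normal" connection behavior and
# flags points far from it as anomalies (predicted label -1).
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
# Synthetic features, e.g. normalized packet rate and connection duration
normal_traffic = rng.normal(loc=0.0, scale=1.0, size=(500, 2))
detector = OneClassSVM(nu=0.05, gamma="scale").fit(normal_traffic)

# A probe far from the learned normal region should be labeled -1
probe = np.array([[8.0, 8.0]])
print(detector.predict(probe)[0])  # -1 indicates an anomaly
```

Supervised ANN/SVM detectors trained on labeled attack data follow the same predict-and-flag pattern, with the class labels coming from the intrusion database instead of a one-class boundary.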
A Supervised Learning Data Mining Approach for Object Recognition and Classification in High Resolution Satellite Data
Advances in spatial and spectral resolution of satellite
images have led to tremendous growth in large image databases. The
data we acquire through satellites, radars, and sensors consists of
important geographical information that can be used for remote
sensing applications such as region planning and disaster management.
Spatial data classification and object recognition are important tasks
for many applications. However, classifying objects and identifying
them manually from images is a difficult task. Object recognition is
often treated as a classification problem, and this task can be
performed using machine-learning techniques. Among the many available
machine-learning algorithms, classification is done using supervised
classifiers such as Support Vector Machines (SVM), as the area of
interest is known. We propose a classification method that considers
neighboring pixels in a region for feature extraction and evaluates
classifications precisely according to neighboring classes for
semantic interpretation of the region of interest (ROI). A
dataset has been created for training and testing purposes; we
generated the attributes by considering pixel intensity values and
mean reflectance values. We demonstrated the benefits of using
knowledge discovery and data-mining techniques, which can be applied
to image data for accurate information extraction and classification
from high spatial resolution remote sensing imagery.
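The neighborhood-feature idea can be sketched as follows. The tiny synthetic single-band "image", its labels, and the use of a 3x3 mean filter as the neighborhood feature are illustrative assumptions, not the paper's dataset or exact attribute set:

```python
# Sketch: per-pixel features combine raw intensity with the 3x3
# neighborhood mean, then an SVM classifies each pixel.
import numpy as np
from scipy.ndimage import uniform_filter
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Synthetic image: a bright region (class 1) on a darker background (class 0)
image = rng.normal(0.2, 0.05, (32, 32))
image[8:24, 8:24] += 0.6
labels = np.zeros((32, 32), dtype=int)
labels[8:24, 8:24] = 1

# Neighborhood feature: mean intensity over each pixel's 3x3 window
neighborhood_mean = uniform_filter(image, size=3)
X = np.stack([image.ravel(), neighborhood_mean.ravel()], axis=1)
y = labels.ravel()

clf = SVC(kernel="rbf").fit(X, y)
acc = clf.score(X, y)
print(round(acc, 2))
```

Including the neighborhood mean smooths out single-pixel noise, which is the intuition behind evaluating classifications according to neighboring classes.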
Mobile Collaboration Learning Technique on Students in Developing Nations
New and more powerful communication technologies continue to emerge
at a rapid pace; their uses in education are widespread, and their
impact on developing societies is remarkable.
This study investigates the effect of the Mobile Collaboration
Learning Technique (MCLT) on learners' outcomes among students in
tertiary institutions of developing nations (a case study of Nigerian
students). It examines the
significance of retention achievement scores of students taught using
mobile collaboration and the conventional method. The sample consisted
of 120 students selected using a stratified random sampling method.
Five research questions and hypotheses were formulated and tested at
the 0.05 level of significance. A student achievement test (SAT)
consisting of 40 multiple-choice objective items was developed and
validated by professionals for data collection. The SAT was
administered to students as a pre-test and post-test. The data were
analyzed using the t-test statistic to test the hypotheses. The result
indicated that students taught using MCLT performed significantly
better than their counterparts using the conventional method of
instruction. Also, there was no significant difference in the post-test
performance scores of male and female students taught using MCLT.
Based on the findings, the following recommendations were made:
mobile collaboration systems should be encouraged in institutions to
boost knowledge sharing among learners; workshops and training
should be organized to train teachers in the use of this technique;
and schools and government should consistently align curriculum
standards with technological trends and formulate policies and
procedures for the responsible use of MCLT.
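The kind of comparison described above can be sketched with an independent two-sample t-test. The score lists below are fabricated illustrative numbers, not the study's data:

```python
# Sketch: comparing post-test scores of an MCLT group against a
# conventional-instruction group with an independent two-sample t-test.
from scipy import stats

mclt_scores = [78, 82, 75, 88, 90, 71, 84, 79, 86, 80]
conventional_scores = [65, 70, 62, 74, 68, 60, 72, 66, 69, 63]

t_stat, p_value = stats.ttest_ind(mclt_scores, conventional_scores)
print(p_value < 0.05)  # True: significant at the 0.05 level for these numbers
```

A p-value below 0.05 rejects the null hypothesis of equal group means, which is how the study concludes the MCLT group performed significantly better.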
A Comparative Study of Malware Detection Techniques Using Machine Learning Methods
In the past few years, the amount of malicious software has increased
exponentially and, therefore, machine learning algorithms have become
instrumental in identifying clean and malware files through
(semi)-automated classification. When working with very large
datasets, the major challenge is to reach both a very high malware
detection rate and a very low false positive rate. Another challenge
is to minimize the time needed for the machine learning algorithm to
do so. This paper presents a comparative study between different
machine learning techniques such as linear classifiers, ensembles,
decision trees or various hybrids thereof. The training dataset consists
of approximately 2 million clean files and 200,000 infected files,
which is a realistic quantitative mixture. The paper investigates the
above-mentioned methods with respect to both their performance
(detection rate and false positive rate) and their practicability.
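The evaluation loop described above can be sketched as follows. The synthetic imbalanced dataset stands in for real file-feature vectors, and only two of the compared classifier families are shown:

```python
# Sketch: comparing a linear classifier and an ensemble on detection
# rate (recall on the malware class) and false positive rate.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Class 1 plays the role of "malware", kept rare as in real collections
X, y = make_classification(n_samples=3000, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

for name, clf in [("linear", LogisticRegression(max_iter=1000)),
                  ("ensemble", RandomForestClassifier(random_state=0))]:
    pred = clf.fit(X_tr, y_tr).predict(X_te)
    tn, fp, fn, tp = confusion_matrix(y_te, pred).ravel()
    detection_rate = tp / (tp + fn)          # malware caught
    false_positive_rate = fp / (fp + tn)     # clean files wrongly flagged
    print(name, round(detection_rate, 2), round(false_positive_rate, 2))
```

Reporting both numbers matters because, at the scale of millions of clean files, even a small false positive rate translates into many wrongly flagged files.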
Feature-Based Summarizing and Ranking from Customer Reviews
With the rapid growth of the Internet, web opinion
sources are constantly emerging; they are useful to both potential
customers and product manufacturers for prediction and
decision-making purposes. These sources consist of user-generated
content written in natural language as unstructured free text.
Opinion mining techniques have therefore become popular for
automatically processing customer reviews to extract product features
and the user opinions expressed over them. Since customer reviews may
contain both opinionated and factual sentences, a supervised machine
learning technique is applied for subjectivity classification to
improve the mining performance. In this paper, we dedicate our work
to the task of opinion
summarization. Therefore, product feature and opinion extraction is
critical to opinion summarization, because its effectiveness
significantly affects the identification of semantic relationships. The
polarity and numeric score of all the features are determined with
the SentiWordNet lexicon. The problem of opinion summarization
concerns how to relate opinion words to a given feature. A
probabilistic supervised learning model improves the results, making
them more flexible and effective.
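The subjectivity-classification step can be sketched with a probabilistic supervised learner. The tiny training corpus below is an illustrative assumption, and a bag-of-words Naive Bayes model stands in for the paper's probabilistic model:

```python
# Sketch: a Naive Bayes classifier separates opinionated sentences
# from factual ones using bag-of-words features.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

sentences = [
    "The battery life is amazing and I love the screen",   # opinionated
    "This phone is terrible and the camera is awful",      # opinionated
    "The sound quality is great, highly recommended",      # opinionated
    "The phone ships with a charger and a USB cable",      # factual
    "The device weighs 180 grams and has 128 GB storage",  # factual
    "The package contains a manual and a SIM tool",        # factual
]
labels = ["opinion", "opinion", "opinion", "fact", "fact", "fact"]

clf = make_pipeline(CountVectorizer(), MultinomialNB()).fit(sentences, labels)
print(clf.predict(["I love the amazing battery"])[0])  # "opinion"
```

Filtering out factual sentences before feature and opinion extraction is what lets the summarizer concentrate on genuinely opinionated text.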
Cross Project Software Fault Prediction at Design Phase
Software fault prediction models are created by using
the source code, processed metrics from the same or previous version
of code, and related fault data. Some companies do not store or keep
track of all the artifacts required for software fault prediction.
To construct a fault prediction model for such companies, training
data from other projects can be one potential solution. The earlier a
fault is predicted, the less it costs to correct. The training
data consists of metrics data and related fault data at function/module
level. This paper investigates fault prediction at an early stage
using cross-project data, focusing on design metrics. In this study,
empirical analysis is carried out to validate design metrics for cross
project fault prediction. The machine learning technique used for
evaluation is Naïve Bayes. The design-phase metrics of other projects
can be used as an initial guideline for projects where no previous
fault data is available. We analyze seven datasets from NASA
Metrics Data Program which offer design as well as code metrics.
Overall, the results of cross-project prediction are comparable to
within-company learning.
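The cross-project setup can be sketched as follows. Both "projects" below are synthetic stand-ins for NASA Metrics Data Program datasets, and the six generated features play the role of design metrics:

```python
# Sketch: train Naive Bayes on one project's design metrics and
# evaluate it on a different project's data (cross-project prediction).
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import GaussianNB

# Two projects drawn from related but different distributions
project_a_X, project_a_y = make_classification(n_samples=500, n_features=6,
                                               random_state=1)
project_b_X, project_b_y = make_classification(n_samples=200, n_features=6,
                                               random_state=2)

model = GaussianNB().fit(project_a_X, project_a_y)      # train on project A
cross_acc = accuracy_score(project_b_y, model.predict(project_b_X))
print(round(cross_acc, 2))
```

The distribution shift between the two projects is exactly what makes cross-project prediction harder than within-company learning, and what the empirical analysis in the paper quantifies.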
What the Future Holds for Social Media Data Analysis
The dramatic rise in the use of Social Media (SM)
platforms such as Facebook and Twitter provides access to an
unprecedented amount of user data. Users may post reviews on
products and services they bought, write about their interests, share
ideas or give their opinions and views on political issues. There is a
growing interest in the analysis of SM data from organisations for
detecting new trends, obtaining user opinions on their products and
services or finding out about their online reputations. A recent
research trend in SM analysis is making predictions based on
sentiment analysis of SM. Often indicators of historic SM data are
represented as time series and correlated with a variety of real world
phenomena like the outcome of elections, the development of
financial indicators, box office revenue and disease outbreaks. This
paper examines the current state of research in the area of SM mining
and predictive analysis and gives an overview of the analysis
methods using opinion mining and machine learning techniques.
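The time-series correlation at the heart of this research trend can be sketched briefly. Both series below are synthetic illustrations; in practice, the sentiment series would come from aggregated opinion-mining output and the indicator from real-world data such as a financial index:

```python
# Sketch: correlating a daily sentiment time series with a real-world
# indicator via the Pearson correlation coefficient.
import numpy as np

rng = np.random.default_rng(0)
sentiment = rng.normal(size=30)                          # daily sentiment score
indicator = sentiment + rng.normal(scale=0.3, size=30)   # noisy related signal

r = np.corrcoef(sentiment, indicator)[0, 1]
print(round(r, 2))
```

A strong correlation between past sentiment and a real-world series is what makes the predictive claims in this literature plausible, though correlation alone does not establish predictive power.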
Imputation Technique for Feature Selection in Microarray Data Set
Analyzing DNA microarray data sets is a great challenge for
bioinformaticians because of the complexity of applying statistical
and machine learning techniques. The challenge is compounded when
microarray data sets contain missing data, which happens regularly,
because these techniques cannot deal with missing values. One of the
most important data analysis processes on microarray data sets is
feature selection, which finds the most important genes affecting a
certain disease. In this paper, we introduce a technique for imputing
the missing data in microarray data sets while performing feature
selection.
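The imputation-plus-selection pipeline can be sketched as follows. Mean imputation and an ANOVA F-test stand in for the paper's technique, and the small matrix with injected NaNs mimics a microarray expression table:

```python
# Sketch: impute missing expression values, then select the most
# class-discriminative "genes" with a univariate F-test.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.impute import SimpleImputer

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 8))            # 20 samples x 8 "genes"
X[:10, 0] += 2.0                        # gene 0 differs between the classes
y = np.array([1] * 10 + [0] * 10)
X[rng.random(X.shape) < 0.1] = np.nan   # inject ~10% missing values

X_filled = SimpleImputer(strategy="mean").fit_transform(X)
selector = SelectKBest(f_classif, k=3).fit(X_filled, y)
print(sorted(selector.get_support(indices=True)))
```

Without the imputation step, the F-test (like most selection techniques) would fail on the NaN entries, which is precisely the gap the paper's combined technique addresses.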