International Science Index
A Neuroscience-Based Learning Technique: Framework and Application to STEM
Existing learning techniques such as problem-based learning, project-based learning, or case study learning are learning techniques that focus mainly on technical details, but give no specific guidelines on learner’s experience and emotional learning aspects such as arousal salience and valence, being emotional states important factors affecting engagement and retention. Some approaches involving emotion in educational settings, such as social and emotional learning, lack neuroscientific rigorousness and use of specific neurobiological mechanisms. On the other hand, neurobiology approaches lack educational applicability. And educational approaches mainly focus on cognitive aspects and disregard conditioning learning. First, authors start explaining the reasons why it is hard to learn thoughtfully, then they use the method of neurobiological mapping to track the main limbic system functions, such as the reward circuit, and its relations with perception, memories, motivations, sympathetic and parasympathetic reactions, and sensations, as well as the brain cortex. The authors conclude explaining the major finding: The mechanisms of nonconscious learning and the triggers that guarantee long-term memory potentiation. Afterward, the educational framework for practical application and the instructors’ guidelines are established. An implementation example in engineering education is given, namely, the study of tuned-mass dampers for earthquake oscillations attenuation in skyscrapers. This work represents an original learning technique based on nonconscious learning mechanisms to enhance long-term memories that complement existing cognitive learning methods.
Online Graduate Students’ Perspective on Engagement in Active Learning in the United States
As of 2017, many researchers in educational journals are still wondering if students are effectively and efficiently engaged in active learning in the online learning environment. The goal of this qualitative single case study and narrative research is to explore if students are actively engaged in their online learning. Seven online students in the United States from LinkedIn and residencies were interviewed for this study. Eleven online learning techniques from research were used as a framework. Data collection tools were used for the study that included a digital audiotape, observation sheet, interview protocol, transcription, and NVivo 12 Plus qualitative software. Data analysis process, member checking, and key themes were used to reach saturation. About 85.7% of students preferred individual grading. About 71.4% of students valued professor’s interacting 2-3 times weekly, participating through posts and responses, having good internet access, and using email. Also, about 57.1% said students log in 2-3 times weekly to daily, professor’s social presence helps, regular punctuality in work submission, and prefer assessments style of research, essay, and case study. About 42.9% appreciated syllabus usefulness and professor’s expertise.
An Automated Stock Investment System Using Machine Learning Techniques: An Application in Australia
A key issue in stock investment is how to select representative features for stock selection. The objective of this paper is to firstly determine whether an automated stock investment system, using machine learning techniques, may be used to identify a portfolio of growth stocks that are highly likely to provide returns better than the stock market index. The second objective is to identify the technical features that best characterize whether a stock’s price is likely to go up and to identify the most important factors and their contribution to predicting the likelihood of the stock price going up. Unsupervised machine learning techniques, such as cluster analysis, were applied to the stock data to identify a cluster of stocks that was likely to go up in price – portfolio 1. Next, the principal component analysis technique was used to select stocks that were rated high on component one and component two – portfolio 2. Thirdly, a supervised machine learning technique, the logistic regression method, was used to select stocks with a high probability of their price going up – portfolio 3. The predictive models were validated with metrics such as, sensitivity (recall), specificity and overall accuracy for all models. All accuracy measures were above 70%. All portfolios outperformed the market by more than eight times. The top three stocks were selected for each of the three stock portfolios and traded in the market for one month. After one month the return for each stock portfolio was computed and compared with the stock market index returns. The returns for all three stock portfolios was 23.87% for the principal component analysis stock portfolio, 11.65% for the logistic regression portfolio and 8.88% for the K-means cluster portfolio while the stock market performance was 0.38%. This study confirms that an automated stock investment system using machine learning techniques can identify top performing stock portfolios that outperform the stock market.
Classification Based on Deep Neural Cellular Automata Model
Deep learning structure is a branch of machine learning science and greet achievement in research and applications. Cellular neural networks are regarded as array of nonlinear analog processors called cells connected in a way allowing parallel computations. The paper discusses how to use deep learning structure for representing neural cellular automata model. The proposed learning technique in cellular automata model will be examined from structure of deep learning. A deep automata neural cellular system modifies each neuron based on the behavior of the individual and its decision as a result of multi-level deep structure learning. The paper will present the architecture of the model and the results of simulation of approach are given. Results from the implementation enrich deep neural cellular automata system and shed a light on concept formulation of the model and the learning in it.
Improved Feature Extraction Technique for Handling Occlusion in Automatic Facial Expression Recognition
The field of automatic facial expression analysis has been an active research area in the last two decades. Its vast applicability in various domains has drawn so much attention into developing techniques and dataset that mirror real life scenarios. Many techniques such as Local Binary Patterns and its variants (CLBP, LBP-TOP) and lately, deep learning techniques, have been used for facial expression recognition. However, the problem of occlusion has not been sufficiently handled, making their results not applicable in real life situations. This paper develops a simple, yet highly efficient method tagged Local Binary Pattern-Histogram of Gradient (LBP-HOG) with occlusion detection in face image, using a multi-class SVM for Action Unit and in turn expression recognition. Our method was evaluated on three publicly available datasets which are JAFFE, CK, SFEW. Experimental results showed that our approach performed considerably well when compared with state-of-the-art algorithms and gave insight to occlusion detection as a key step to handling expression in wild.
Machine Learning Methods for Network Intrusion Detection
Network security engineers work to keep services available all the time by handling intruder attacks. Intrusion Detection System (IDS) is one of the obtainable mechanisms that is used to sense and classify any abnormal actions. Therefore, the IDS must be always up to date with the latest intruder attacks signatures to preserve confidentiality, integrity, and availability of the services. The speed of the IDS is a very important issue as well learning the new attacks. This research work illustrates how the Knowledge Discovery and Data Mining (or Knowledge Discovery in Databases) KDD dataset is very handy for testing and evaluating different Machine Learning Techniques. It mainly focuses on the KDD preprocess part in order to prepare a decent and fair experimental data set. The J48, MLP, and Bayes Network classifiers have been chosen for this study. It has been proven that the J48 classifier has achieved the highest accuracy rate for detecting and classifying all KDD dataset attacks, which are of type DOS, R2L, U2R, and PROBE.
Off-Policy Q-learning Technique for Intrusion Response in Network Security
With the increasing dependency on our computer
devices, we face the necessity of adequate, efficient and effective
mechanisms, for protecting our network. There are two main
problems that Intrusion Detection Systems (IDS) attempt to solve.
1) To detect the attack, by analyzing the incoming traffic and inspect
the network (intrusion detection). 2) To produce a prompt response
when the attack occurs (intrusion prevention). It is critical creating an
Intrusion detection model that will detect a breach in the system on
time and also challenging making it provide an automatic and with
an acceptable delay response at every single stage of the monitoring
process. We cannot afford to adopt security measures with a high
exploiting computational power, and we are not able to accept a
mechanism that will react with a delay. In this paper, we will
propose an intrusion response mechanism that is based on artificial
intelligence, and more precisely, reinforcement learning techniques
(RLT). The RLT will help us to create a decision agent, who will
control the process of interacting with the undetermined environment.
The goal is to find an optimal policy, which will represent the
intrusion response, therefore, to solve the Reinforcement learning
problem, using a Q-learning approach. Our agent will produce an
optimal immediate response, in the process of evaluating the network
traffic.This Q-learning approach will establish the balance between
exploration and exploitation and provide a unique, self-learning and
strategic artificial intelligence response mechanism for IDS.
Evaluating Machine Learning Techniques for Activity Classification in Smart Home Environments
With the widespread adoption of the Internet-connected
devices, and with the prevalence of the Internet of Things (IoT)
applications, there is an increased interest in machine learning
techniques that can provide useful and interesting services in the
smart home domain. The areas that machine learning techniques
can help advance are varied and ever-evolving. Classifying smart
home inhabitants’ Activities of Daily Living (ADLs), is one
prominent example. The ability of machine learning technique to find
meaningful spatio-temporal relations of high-dimensional data is an
important requirement as well. This paper presents a comparative
evaluation of state-of-the-art machine learning techniques to classify
ADLs in the smart home domain. Forty-two synthetic datasets and
two real-world datasets with multiple inhabitants are used to evaluate
and compare the performance of the identified machine learning
techniques. Our results show significant performance differences
between the evaluated techniques. Such as AdaBoost, Cortical
Learning Algorithm (CLA), Decision Trees, Hidden Markov Model
(HMM), Multi-layer Perceptron (MLP), Structured Perceptron and
Support Vector Machines (SVM). Overall, neural network based
techniques have shown superiority over the other tested techniques.
An Intelligent Baby Care System Based on IoT and Deep Learning Techniques
Due to the heavy burden and pressure of caring for infants, an integrated automatic baby watching system based on IoT smart sensing and deep learning machine vision techniques is proposed in this paper. By monitoring infant body conditions such as heartbeat, breathing, body temperature, sleeping posture, as well as the surrounding conditions such as dangerous/sharp objects, light, noise, humidity and temperature, the proposed system can analyze and predict the obvious/potential dangerous conditions according to observed data and then adopt suitable actions in real time to protect the infant from harm. Thus, reducing the burden of the caregiver and improving safety efficiency of the caring work. The experimental results show that the proposed system works successfully for the infant care work and thus can be implemented in various life fields practically.
Feature Selection and Predictive Modeling of Housing Data Using Random Forest
Predictive data analysis and modeling involving machine learning techniques become challenging in presence of too many explanatory variables or features. Presence of too many features in machine learning is known to not only cause algorithms to slow down, but they can also lead to decrease in model prediction accuracy. This study involves housing dataset with 79 quantitative and qualitative features that describe various aspects people consider while buying a new house. Boruta algorithm that supports feature selection using a wrapper approach build around random forest is used in this study. This feature selection process leads to 49 confirmed features which are then used for developing predictive random forest models. The study also explores five different data partitioning ratios and their impact on model accuracy are captured using coefficient of determination (r-square) and root mean square error (rsme).
The Effect of Cooperative Learning on Academic Achievement of Grade Nine Students in Mathematics: The Case of Mettu Secondary and Preparatory School
The aim of this study was to examine the effect of
cooperative learning method on student’s academic achievement and
on the achievement level over a usual method in teaching different
topics of mathematics. The study also examines the perceptions of
students towards cooperative learning. Cooperative learning is the
instructional strategy in which pairs or small groups of students with
different levels of ability work together to accomplish a shared goal.
The aim of this cooperation is for students to maximize their own
and each other learning, with members striving for joint benefit.
The teacher’s role changes from wise on the wise to guide on
the side. Cooperative learning due to its influential aspects is the
most prevalent teaching-learning technique in the modern world.
Therefore the study was conducted in order to examine the effect
of cooperative learning on the academic achievement of grade 9
students in Mathematics in case of Mettu secondary school. Two
sample sections are randomly selected by which one section served
randomly as an experimental and the other as a comparison group.
Data gathering instruments are achievement tests and questionnaires.
A treatment of STAD method of cooperative learning was provided
to the experimental group while the usual method is used in the
comparison group. The experiment lasted for one semester. To
determine the effect of cooperative learning on the student’s academic
achievement, the significance of difference between the scores of
groups at 0.05 levels was tested by applying t test. The effect size
was calculated to see the strength of the treatment. The student’s
perceptions about the method were tested by percentiles of the
questionnaires. During data analysis, each group was divided into
high and low achievers on basis of their previous Mathematics result.
Data analysis revealed that both the experimental and comparison
groups were almost equal in Mathematics at the beginning of the
experiment. The experimental group out scored significantly than
comparison group on posttest. Additionally, the comparison of mean
posttest scores of high achievers indicates significant difference
between the two groups. The same is true for low achiever students
of both groups on posttest. Hence, the result of the study indicates
the effectiveness of the method for Mathematics topics as compared
to usual method of teaching.
Prediction-Based Midterm Operation Planning for Energy Management of Exhibition Hall
Large exhibition halls require a lot of energy to maintain comfortable atmosphere for the visitors viewing inside. One way of reducing the energy cost is to have thermal energy storage systems installed so that the thermal energy can be stored in the middle of night when the energy price is low and then used later when the price is high. To minimize the overall energy cost, however, we should be able to decide how much energy to save during which time period exactly. If we can foresee future energy load and the corresponding cost, we will be able to make such decisions reasonably. In this paper, we use machine learning technique to obtain models for predicting weather conditions and the number of visitors on hourly basis for the next day. Based on the energy load thus predicted, we build a cost-optimal daily operation plan for the thermal energy storage systems and cooling and heating facilities through simulation-based optimization.
Hand Gesture Interpretation Using Sensing Glove Integrated with Machine Learning Algorithms
In this paper, we present a low cost design for a smart glove that can perform sign language recognition to assist the speech impaired people. Specifically, we have designed and developed an Assistive Hand Gesture Interpreter that recognizes hand movements relevant to the American Sign Language (ASL) and translates them into text for display on a Thin-Film-Transistor Liquid Crystal Display (TFT LCD) screen as well as synthetic speech. Linear Bayes Classifiers and Multilayer Neural Networks have been used to classify 11 feature vectors obtained from the sensors on the glove into one of the 27 ASL alphabets and a predefined gesture for space. Three types of features are used; bending using six bend sensors, orientation in three dimensions using accelerometers and contacts at vital points using contact sensors. To gauge the performance of the presented design, the training database was prepared using five volunteers. The accuracy of the current version on the prepared dataset was found to be up to 99.3% for target user. The solution combines electronics, e-textile technology, sensor technology, embedded system and machine learning techniques to build a low cost wearable glove that is scrupulous, elegant and portable.
Context Detection in Spreadsheets Based on Automatically Inferred Table Schema
Programming requires years of training. With natural language and end user development methods, programming could become available to everyone. It enables end users to program their own devices and extend the functionality of the existing system without any knowledge of programming languages. In this paper, we describe an Interactive Spreadsheet Processing Module (ISPM), a natural language interface to spreadsheets that allows users to address ranges within the spreadsheet based on inferred table schema. Using the ISPM, end users are able to search for values in the schema of the table and to address the data in spreadsheets implicitly. Furthermore, it enables them to select and sort the spreadsheet data by using natural language. ISPM uses a machine learning technique to automatically infer areas within a spreadsheet, including different kinds of headers and data ranges. Since ranges can be identified from natural language queries, the end users can query the data using natural language. During the evaluation 12 undergraduate students were asked to perform operations (sum, sort, group and select) using the system and also Excel without ISPM interface, and the time taken for task completion was compared across the two systems. Only for the selection task did users take less time in Excel (since they directly selected the cells using the mouse) than in ISPM, by using natural language for end user software engineering, to overcome the present bottleneck of professional developers.
Mining User-Generated Contents to Detect Service Failures with Topic Model
Online user-generated contents (UGC) significantly change the way customers behave (e.g., shop, travel), and a pressing need to handle the overwhelmingly plethora amount of various UGC is one of the paramount issues for management. However, a current approach (e.g., sentiment analysis) is often ineffective for leveraging textual information to detect the problems or issues that a certain management suffers from. In this paper, we employ text mining of Latent Dirichlet Allocation (LDA) on a popular online review site dedicated to complaint from users. We find that the employed LDA efficiently detects customer complaints, and a further inspection with the visualization technique is effective to categorize the problems or issues. As such, management can identify the issues at stake and prioritize them accordingly in a timely manner given the limited amount of resources. The findings provide managerial insights into how analytics on social media can help maintain and improve their reputation management. Our interdisciplinary approach also highlights several insights by applying machine learning techniques in marketing research domain. On a broader technical note, this paper illustrates the details of how to implement LDA in R program from a beginning (data collection in R) to an end (LDA analysis in R) since the instruction is still largely undocumented. In this regard, it will help lower the boundary for interdisciplinary researcher to conduct related research.
Case-Based Reasoning: A Hybrid Classification Model Improved with an Expert's Knowledge for High-Dimensional Problems
Data mining and classification of objects is the process of data analysis, using various machine learning techniques, which is used today in various fields of research. This paper presents a concept of hybrid classification model improved with the expert knowledge. The hybrid model in its algorithm has integrated several machine learning techniques (Information Gain, K-means, and Case-Based Reasoning) and the expert’s knowledge into one. The knowledge of experts is used to determine the importance of features. The paper presents the model algorithm and the results of the case study in which the emphasis was put on achieving the maximum classification accuracy without reducing the number of features.
An Empirical Evaluation of Performance of Machine Learning Techniques on Imbalanced Software Quality Data
The development of change prediction models can help the software practitioners in planning testing and inspection resources at early phases of software development. However, a major challenge faced during the training process of any classification model is the imbalanced nature of the software quality data. A data with very few minority outcome categories leads to inefficient learning process and a classification model developed from the imbalanced data generally does not predict these minority categories correctly. Thus, for a given dataset, a minority of classes may be change prone whereas a majority of classes may be non-change prone. This study explores various alternatives for adeptly handling the imbalanced software quality data using different sampling methods and effective MetaCost learners. The study also analyzes and justifies the use of different performance metrics while dealing with the imbalanced data. In order to empirically validate different alternatives, the study uses change data from three application packages of open-source Android data set and evaluates the performance of six different machine learning techniques. The results of the study indicate extensive improvement in the performance of the classification models when using resampling method and robust performance measures.
Anomaly Detection with ANN and SVM for Telemedicine Networks
In recent years, a wide variety of applications are developed with Support Vector Machines -SVM- methods and Artificial Neural Networks -ANN-. In general, these methods depend on intrusion knowledge databases such as KDD99, ISCX, and CAIDA among others. New classes of detectors are generated by machine learning techniques, trained and tested over network databases. Thereafter, detectors are employed to detect anomalies in network communication scenarios according to user’s connections behavior. The first detector based on training dataset is deployed in different real-world networks with mobile and non-mobile devices to analyze the performance and accuracy over static detection. The vulnerabilities are based on previous work in telemedicine apps that were developed on the research group. This paper presents the differences on detections results between some network scenarios by applying traditional detectors deployed with artificial neural networks and support vector machines.
A Supervised Learning Data Mining Approach for Object Recognition and Classification in High Resolution Satellite Data
Advances in spatial and spectral resolution of satellite
images have led to tremendous growth in large image databases. The
data we acquire through satellites, radars, and sensors consists of
important geographical information that can be used for remote
sensing applications such as region planning, disaster management.
Spatial data classification and object recognition are important tasks
for many applications. However, classifying objects and identifying
them manually from images is a difficult task. Object recognition is
often considered as a classification problem, this task can be
performed using machine-learning techniques. Despite of many
machine-learning algorithms, the classification is done using
supervised classifiers such as Support Vector Machines (SVM) as the
area of interest is known. We proposed a classification method,
which considers neighboring pixels in a region for feature extraction
and it evaluates classifications precisely according to neighboring
classes for semantic interpretation of region of interest (ROI). A
dataset has been created for training and testing purpose; we
generated the attributes by considering pixel intensity values and
mean values of reflectance. We demonstrated the benefits of using
knowledge discovery and data-mining techniques, which can be on
image data for accurate information extraction and classification from
high spatial resolution remote sensing imagery.
Mobile Collaboration Learning Technique on Students in Developing Nations
New and more powerful communications technologies
continue to emerge at a rapid pace and their uses in education are
widespread and the impact remarkable in the developing societies.
This study investigates Mobile Collaboration Learning Technique
(MCLT) on learners’ outcome among students in tertiary institutions
of developing nations (a case of Nigeria students). It examines the
significance of retention achievement scores of students taught using
mobile collaboration and conventional method. The sample consisted
of 120 students using Stratified random sampling method. Five
research questions and hypotheses were formulated, and tested at
0.05 level of significance. A student achievement test (SAT) was
made of 40 items of multiple-choice objective type, developed and
validated for data collection by professionals. The SAT was
administered to students as pre-test and post-test. The data were
analyzed using t-test statistic to test the hypotheses. The result
indicated that students taught using MCLT performed significantly
better than their counterparts using the conventional method of
instruction. Also, there was no significant difference in the post-test
performance scores of male and female students taught using MCLT.
Based on the findings, the following submissions was made that:
Mobile collaboration system be encouraged in the institutions to
boost knowledge sharing among learners, workshop and training
should be organized to train teachers on the use of this technique,
schools and government should consistently align curriculum
standard to trends of technological dictates and formulate policies
and procedures towards responsible use of MCLT.
A Comparative Study of Malware Detection Techniques Using Machine Learning Methods
In the past few years, the amount of malicious software
increased exponentially and, therefore, machine learning algorithms
became instrumental in identifying clean and malware files through
(semi)-automated classification. When working with very large
datasets, the major challenge is to reach both a very high malware
detection rate and a very low false positive rate. Another challenge
is to minimize the time needed for the machine learning algorithm to
do so. This paper presents a comparative study between different
machine learning techniques such as linear classifiers, ensembles,
decision trees or various hybrids thereof. The training dataset consists
of approximately 2 million clean files and 200.000 infected files,
which is a realistic quantitative mixture. The paper investigates the
above mentioned methods with respect to both their performance
(detection rate and false positive rate) and their practicability.
Feature-Based Summarizing and Ranking from Customer Reviews
Due to the rapid increase of Internet, web opinion
sources dynamically emerge which is useful for both potential
customers and product manufacturers for prediction and decision
purposes. These are the user generated contents written in natural
languages and are unstructured-free-texts scheme. Therefore, opinion
mining techniques become popular to automatically process customer
reviews for extracting product features and user opinions expressed
over them. Since customer reviews may contain both opinionated and
factual sentences, a supervised machine learning technique applies
for subjectivity classification to improve the mining performance. In
this paper, we dedicate our work is the task of opinion
summarization. Therefore, product feature and opinion extraction is
critical to opinion summarization, because its effectiveness
significantly affects the identification of semantic relationships. The
polarity and numeric score of all the features are determined by
Senti-WordNet Lexicon. The problem of opinion summarization
refers how to relate the opinion words with respect to a certain
feature. Probabilistic based model of supervised learning will
improve the result that is more flexible and effective.
Cross Project Software Fault Prediction at Design Phase
Software fault prediction models are created by using
the source code, processed metrics from the same or previous version
of code and related fault data. Some company do not store and keep
track of all artifacts which are required for software fault prediction.
To construct fault prediction model for such company, the training
data from the other projects can be one potential solution. Earlier we
predicted the fault the less cost it requires to correct. The training
data consists of metrics data and related fault data at function/module
level. This paper investigates fault predictions at early stage using the
cross-project data focusing on the design metrics. In this study,
empirical analysis is carried out to validate design metrics for cross
project fault prediction. The machine learning techniques used for
evaluation is Naïve Bayes. The design phase metrics of other projects
can be used as initial guideline for the projects where no previous
fault data is available. We analyze seven datasets from NASA
Metrics Data Program which offer design as well as code metrics.
Overall, the results of cross project is comparable to the within
company data learning.
What the Future Holds for Social Media Data Analysis
The dramatic rise in the use of Social Media (SM)
platforms such as Facebook and Twitter provide access to an
unprecedented amount of user data. Users may post reviews on
products and services they bought, write about their interests, share
ideas or give their opinions and views on political issues. There is a
growing interest in the analysis of SM data from organisations for
detecting new trends, obtaining user opinions on their products and
services or finding out about their online reputations. A recent
research trend in SM analysis is making predictions based on
sentiment analysis of SM. Often indicators of historic SM data are
represented as time series and correlated with a variety of real world
phenomena like the outcome of elections, the development of
financial indicators, box office revenue and disease outbreaks. This
paper examines the current state of research in the area of SM mining
and predictive analysis and gives an overview of the analysis
methods using opinion mining and machine learning techniques.
Imputation Technique for Feature Selection in Microarray Data Set
Analyzing DNA microarray data sets is a great
challenge, which faces the bioinformaticians due to the complication
of using statistical and machine learning techniques. The challenge
will be doubled if the microarray data sets contain missing data,
which happens regularly because these techniques cannot deal with
missing data. One of the most important data analysis process on
the microarray data set is feature selection. This process finds the
most important genes that affect certain disease. In this paper, we
introduce a technique for imputing the missing data in microarray
data sets while performing feature selection.
A Comprehensive Review on Different Mixed Data Clustering Ensemble Methods
An extensive amount of work has been done in data
clustering research under the unsupervised learning technique in Data
Mining during the past two decades. Moreover, several approaches
and methods have been emerged focusing on clustering diverse data
types, features of cluster models and similarity rates of clusters.
However, none of the single clustering algorithm exemplifies its best
nature in extracting efficient clusters. Consequently, in order to
rectify this issue, a new challenging technique called Cluster
Ensemble method was bloomed. This new approach tends to be the
alternative method for the cluster analysis problem. The main
objective of the Cluster Ensemble is to aggregate the diverse
clustering solutions in such a way to attain accuracy and also to
improve the eminence the individual clustering algorithms. Due to
the massive and rapid development of new methods in the globe of
data mining, it is highly mandatory to scrutinize a vital analysis of
existing techniques and the future novelty. This paper shows the
comparative analysis of different cluster ensemble methods along
with their methodologies and salient features. Henceforth this
unambiguous analysis will be very useful for the society of clustering
experts and also helps in deciding the most appropriate one to resolve
the problem in hand.
A Review: Comparative Analysis of Different Categorical Data Clustering Ensemble Methods
Over the past epoch a rampant amount of work has been done in the data clustering research under the unsupervised learning technique in Data mining. Furthermore several algorithms and methods have been proposed focusing on clustering different data types, representation of cluster models, and accuracy rates of the clusters. However no single clustering algorithm proves to be the most efficient in providing best results. Accordingly in order to find the solution to this issue a new technique, called Cluster ensemble method was bloomed. This cluster ensemble is a good alternative approach for facing the cluster analysis problem. The main hope of the cluster ensemble is to merge different clustering solutions in such a way to achieve accuracy and to improve the quality of individual data clustering. Due to the substantial and unremitting development of new methods in the sphere of data mining and also the incessant interest in inventing new algorithms, makes obligatory to scrutinize a critical analysis of the existing techniques and the future novelty. This paper exposes the comparative study of different cluster ensemble methods along with their features, systematic working process and the average accuracy and error rates of each ensemble methods. Consequently this speculative and comprehensive analysis will be very useful for the community of clustering practitioners and also helps in deciding the most suitable one to rectify the problem in hand.
Analysis of Diverse Clustering Tools in Data Mining
Clustering in data mining is an unsupervised learning technique of aggregating the data objects into meaningful groups such that the intra cluster similarity of objects are maximized and inter cluster similarity of objects are minimized. Over the past decades several clustering tools were emerged in which clustering algorithms are inbuilt and are easier to use and extract the expected results. Data mining mainly deals with the huge databases that inflicts on cluster analysis and additional rigorous computational constraints. These challenges pave the way for the emergence of powerful expansive data mining clustering softwares. In this survey, a variety of clustering tools used in data mining are elucidated along with the pros and cons of each software.
Comparison between Associative Classification and Decision Tree for HCV Treatment Response Prediction
Combined therapy using Interferon and Ribavirin is the standard treatment in patients with chronic hepatitis C. However, the number of responders to this treatment is low, whereas its cost and side effects are high. Therefore, there is a clear need to predict patient’s response to the treatment based on clinical information to protect the patients from the bad drawbacks, Intolerable side effects and waste of money. Different machine learning techniques have been developed to fulfill this purpose. From these techniques are Associative Classification (AC) and Decision Tree (DT). The aim of this research is to compare the performance of these two techniques in the prediction of virological response to the standard treatment of HCV from clinical information. 200 patients treated with Interferon and Ribavirin; were analyzed using AC and DT. 150 cases had been used to train the classifiers and 50 cases had been used to test the classifiers. The experiment results showed that the two techniques had given acceptable results however the best accuracy for the AC reached 92% whereas for DT reached 80%.
Empirical Process Monitoring Via Chemometric Analysis of Partially Unbalanced Data
Real-time or in-line process monitoring frameworks are designed to give early warnings for a fault along with meaningful identification of its assignable causes. In artificial intelligence and machine learning fields of pattern recognition various promising approaches have been proposed such as kernel-based nonlinear machine learning techniques. This work presents a kernel-based empirical monitoring scheme for batch type production processes with small sample size problem of partially unbalanced data. Measurement data of normal operations are easy to collect whilst special events or faults data are difficult to collect. In such situations, noise filtering techniques can be helpful in enhancing process monitoring performance. Furthermore, preprocessing of raw process data is used to get rid of unwanted variation of data. The performance of the monitoring scheme was demonstrated using three-dimensional batch data. The results showed that the monitoring performance was improved significantly in terms of detection success rate of process fault.