International Science Index

59
10011127
A Neuroscience-Based Learning Technique: Framework and Application to STEM
Abstract:

Existing learning techniques such as problem-based learning, project-based learning, or case study learning are learning techniques that focus mainly on technical details, but give no specific guidelines on learner’s experience and emotional learning aspects such as arousal salience and valence, being emotional states important factors affecting engagement and retention. Some approaches involving emotion in educational settings, such as social and emotional learning, lack neuroscientific rigorousness and use of specific neurobiological mechanisms. On the other hand, neurobiology approaches lack educational applicability. And educational approaches mainly focus on cognitive aspects and disregard conditioning learning. First, authors start explaining the reasons why it is hard to learn thoughtfully, then they use the method of neurobiological mapping to track the main limbic system functions, such as the reward circuit, and its relations with perception, memories, motivations, sympathetic and parasympathetic reactions, and sensations, as well as the brain cortex. The authors conclude explaining the major finding: The mechanisms of nonconscious learning and the triggers that guarantee long-term memory potentiation. Afterward, the educational framework for practical application and the instructors’ guidelines are established. An implementation example in engineering education is given, namely, the study of tuned-mass dampers for earthquake oscillations attenuation in skyscrapers. This work represents an original learning technique based on nonconscious learning mechanisms to enhance long-term memories that complement existing cognitive learning methods.

Paper Detail
35
downloads
58
10010886
Online Graduate Students’ Perspective on Engagement in Active Learning in the United States
Abstract:
As of 2017, many researchers in educational journals are still wondering if students are effectively and efficiently engaged in active learning in the online learning environment. The goal of this qualitative single case study and narrative research is to explore if students are actively engaged in their online learning. Seven online students in the United States from LinkedIn and residencies were interviewed for this study. Eleven online learning techniques from research were used as a framework.  Data collection tools were used for the study that included a digital audiotape, observation sheet, interview protocol, transcription, and NVivo 12 Plus qualitative software.  Data analysis process, member checking, and key themes were used to reach saturation. About 85.7% of students preferred individual grading. About 71.4% of students valued professor’s interacting 2-3 times weekly, participating through posts and responses, having good internet access, and using email.  Also, about 57.1% said students log in 2-3 times weekly to daily, professor’s social presence helps, regular punctuality in work submission, and prefer assessments style of research, essay, and case study.  About 42.9% appreciated syllabus usefulness and professor’s expertise.
Paper Detail
99
downloads
57
10010842
An Automated Stock Investment System Using Machine Learning Techniques: An Application in Australia
Abstract:

A key issue in stock investment is how to select representative features for stock selection. The objective of this paper is to firstly determine whether an automated stock investment system, using machine learning techniques, may be used to identify a portfolio of growth stocks that are highly likely to provide returns better than the stock market index. The second objective is to identify the technical features that best characterize whether a stock’s price is likely to go up and to identify the most important factors and their contribution to predicting the likelihood of the stock price going up. Unsupervised machine learning techniques, such as cluster analysis, were applied to the stock data to identify a cluster of stocks that was likely to go up in price – portfolio 1. Next, the principal component analysis technique was used to select stocks that were rated high on component one and component two – portfolio 2. Thirdly, a supervised machine learning technique, the logistic regression method, was used to select stocks with a high probability of their price going up – portfolio 3. The predictive models were validated with metrics such as, sensitivity (recall), specificity and overall accuracy for all models. All accuracy measures were above 70%. All portfolios outperformed the market by more than eight times. The top three stocks were selected for each of the three stock portfolios and traded in the market for one month. After one month the return for each stock portfolio was computed and compared with the stock market index returns. The returns for all three stock portfolios was 23.87% for the principal component analysis stock portfolio, 11.65% for the logistic regression portfolio and 8.88% for the K-means cluster portfolio while the stock market performance was 0.38%. This study confirms that an automated stock investment system using machine learning techniques can identify top performing stock portfolios that outperform the stock market.

Paper Detail
200
downloads
56
10010605
Classification Based on Deep Neural Cellular Automata Model
Abstract:
Deep learning structure is a branch of machine learning science and greet achievement in research and applications. Cellular neural networks are regarded as array of nonlinear analog processors called cells connected in a way allowing parallel computations. The paper discusses how to use deep learning structure for representing neural cellular automata model. The proposed learning technique in cellular automata model will be examined from structure of deep learning. A deep automata neural cellular system modifies each neuron based on the behavior of the individual and its decision as a result of multi-level deep structure learning. The paper will present the architecture of the model and the results of simulation of approach are given. Results from the implementation enrich deep neural cellular automata system and shed a light on concept formulation of the model and the learning in it.
Paper Detail
181
downloads
55
10010270
Improved Feature Extraction Technique for Handling Occlusion in Automatic Facial Expression Recognition
Abstract:

The field of automatic facial expression analysis has been an active research area in the last two decades. Its vast applicability in various domains has drawn so much attention into developing techniques and dataset that mirror real life scenarios. Many techniques such as Local Binary Patterns and its variants (CLBP, LBP-TOP) and lately, deep learning techniques, have been used for facial expression recognition. However, the problem of occlusion has not been sufficiently handled, making their results not applicable in real life situations. This paper develops a simple, yet highly efficient method tagged Local Binary Pattern-Histogram of Gradient (LBP-HOG) with occlusion detection in face image, using a multi-class SVM for Action Unit and in turn expression recognition. Our method was evaluated on three publicly available datasets which are JAFFE, CK, SFEW. Experimental results showed that our approach performed considerably well when compared with state-of-the-art algorithms and gave insight to occlusion detection as a key step to handling expression in wild.

Paper Detail
273
downloads
54
10009384
Machine Learning Methods for Network Intrusion Detection
Abstract:

Network security engineers work to keep services available all the time by handling intruder attacks. Intrusion Detection System (IDS) is one of the obtainable mechanisms that is used to sense and classify any abnormal actions. Therefore, the IDS must be always up to date with the latest intruder attacks signatures to preserve confidentiality, integrity, and availability of the services. The speed of the IDS is a very important issue as well learning the new attacks. This research work illustrates how the Knowledge Discovery and Data Mining (or Knowledge Discovery in Databases) KDD dataset is very handy for testing and evaluating different Machine Learning Techniques. It mainly focuses on the KDD preprocess part in order to prepare a decent and fair experimental data set. The J48, MLP, and Bayes Network classifiers have been chosen for this study. It has been proven that the J48 classifier has achieved the highest accuracy rate for detecting and classifying all KDD dataset attacks, which are of type DOS, R2L, U2R, and PROBE.

Keywords:
Paper Detail
273
downloads
53
10008916
Off-Policy Q-learning Technique for Intrusion Response in Network Security
Abstract:
With the increasing dependency on our computer devices, we face the necessity of adequate, efficient and effective mechanisms, for protecting our network. There are two main problems that Intrusion Detection Systems (IDS) attempt to solve. 1) To detect the attack, by analyzing the incoming traffic and inspect the network (intrusion detection). 2) To produce a prompt response when the attack occurs (intrusion prevention). It is critical creating an Intrusion detection model that will detect a breach in the system on time and also challenging making it provide an automatic and with an acceptable delay response at every single stage of the monitoring process. We cannot afford to adopt security measures with a high exploiting computational power, and we are not able to accept a mechanism that will react with a delay. In this paper, we will propose an intrusion response mechanism that is based on artificial intelligence, and more precisely, reinforcement learning techniques (RLT). The RLT will help us to create a decision agent, who will control the process of interacting with the undetermined environment. The goal is to find an optimal policy, which will represent the intrusion response, therefore, to solve the Reinforcement learning problem, using a Q-learning approach. Our agent will produce an optimal immediate response, in the process of evaluating the network traffic.This Q-learning approach will establish the balance between exploration and exploitation and provide a unique, self-learning and strategic artificial intelligence response mechanism for IDS.
Paper Detail
505
downloads
52
10008539
Evaluating Machine Learning Techniques for Activity Classification in Smart Home Environments
Abstract:
With the widespread adoption of the Internet-connected devices, and with the prevalence of the Internet of Things (IoT) applications, there is an increased interest in machine learning techniques that can provide useful and interesting services in the smart home domain. The areas that machine learning techniques can help advance are varied and ever-evolving. Classifying smart home inhabitants’ Activities of Daily Living (ADLs), is one prominent example. The ability of machine learning technique to find meaningful spatio-temporal relations of high-dimensional data is an important requirement as well. This paper presents a comparative evaluation of state-of-the-art machine learning techniques to classify ADLs in the smart home domain. Forty-two synthetic datasets and two real-world datasets with multiple inhabitants are used to evaluate and compare the performance of the identified machine learning techniques. Our results show significant performance differences between the evaluated techniques. Such as AdaBoost, Cortical Learning Algorithm (CLA), Decision Trees, Hidden Markov Model (HMM), Multi-layer Perceptron (MLP), Structured Perceptron and Support Vector Machines (SVM). Overall, neural network based techniques have shown superiority over the other tested techniques.
Paper Detail
1123
downloads
51
10008482
An Intelligent Baby Care System Based on IoT and Deep Learning Techniques
Abstract:
Due to the heavy burden and pressure of caring for infants, an integrated automatic baby watching system based on IoT smart sensing and deep learning machine vision techniques is proposed in this paper. By monitoring infant body conditions such as heartbeat, breathing, body temperature, sleeping posture, as well as the surrounding conditions such as dangerous/sharp objects, light, noise, humidity and temperature, the proposed system can analyze and predict the obvious/potential dangerous conditions according to observed data and then adopt suitable actions in real time to protect the infant from harm. Thus, reducing the burden of the caregiver and improving safety efficiency of the caring work. The experimental results show that the proposed system works successfully for the infant care work and thus can be implemented in various life fields practically.
Paper Detail
954
downloads
50
10007029
Feature Selection and Predictive Modeling of Housing Data Using Random Forest
Abstract:

Predictive data analysis and modeling involving machine learning techniques become challenging in presence of too many explanatory variables or features. Presence of too many features in machine learning is known to not only cause algorithms to slow down, but they can also lead to decrease in model prediction accuracy. This study involves housing dataset with 79 quantitative and qualitative features that describe various aspects people consider while buying a new house. Boruta algorithm that supports feature selection using a wrapper approach build around random forest is used in this study. This feature selection process leads to 49 confirmed features which are then used for developing predictive random forest models. The study also explores five different data partitioning ratios and their impact on model accuracy are captured using coefficient of determination (r-square) and root mean square error (rsme).

Paper Detail
1174
downloads
49
10006955
The Effect of Cooperative Learning on Academic Achievement of Grade Nine Students in Mathematics: The Case of Mettu Secondary and Preparatory School
Abstract:
The aim of this study was to examine the effect of cooperative learning method on student’s academic achievement and on the achievement level over a usual method in teaching different topics of mathematics. The study also examines the perceptions of students towards cooperative learning. Cooperative learning is the instructional strategy in which pairs or small groups of students with different levels of ability work together to accomplish a shared goal. The aim of this cooperation is for students to maximize their own and each other learning, with members striving for joint benefit. The teacher’s role changes from wise on the wise to guide on the side. Cooperative learning due to its influential aspects is the most prevalent teaching-learning technique in the modern world. Therefore the study was conducted in order to examine the effect of cooperative learning on the academic achievement of grade 9 students in Mathematics in case of Mettu secondary school. Two sample sections are randomly selected by which one section served randomly as an experimental and the other as a comparison group. Data gathering instruments are achievement tests and questionnaires. A treatment of STAD method of cooperative learning was provided to the experimental group while the usual method is used in the comparison group. The experiment lasted for one semester. To determine the effect of cooperative learning on the student’s academic achievement, the significance of difference between the scores of groups at 0.05 levels was tested by applying t test. The effect size was calculated to see the strength of the treatment. The student’s perceptions about the method were tested by percentiles of the questionnaires. During data analysis, each group was divided into high and low achievers on basis of their previous Mathematics result. Data analysis revealed that both the experimental and comparison groups were almost equal in Mathematics at the beginning of the experiment. The experimental group out scored significantly than comparison group on posttest. Additionally, the comparison of mean posttest scores of high achievers indicates significant difference between the two groups. The same is true for low achiever students of both groups on posttest. Hence, the result of the study indicates the effectiveness of the method for Mathematics topics as compared to usual method of teaching.
Paper Detail
1366
downloads
48
10007235
Prediction-Based Midterm Operation Planning for Energy Management of Exhibition Hall
Abstract:

Large exhibition halls require a lot of energy to maintain comfortable atmosphere for the visitors viewing inside. One way of reducing the energy cost is to have thermal energy storage systems installed so that the thermal energy can be stored in the middle of night when the energy price is low and then used later when the price is high. To minimize the overall energy cost, however, we should be able to decide how much energy to save during which time period exactly. If we can foresee future energy load and the corresponding cost, we will be able to make such decisions reasonably. In this paper, we use machine learning technique to obtain models for predicting weather conditions and the number of visitors on hourly basis for the next day. Based on the energy load thus predicted, we build a cost-optimal daily operation plan for the thermal energy storage systems and cooling and heating facilities through simulation-based optimization.

Paper Detail
546
downloads
47
10005809
Hand Gesture Interpretation Using Sensing Glove Integrated with Machine Learning Algorithms
Abstract:
In this paper, we present a low cost design for a smart glove that can perform sign language recognition to assist the speech impaired people. Specifically, we have designed and developed an Assistive Hand Gesture Interpreter that recognizes hand movements relevant to the American Sign Language (ASL) and translates them into text for display on a Thin-Film-Transistor Liquid Crystal Display (TFT LCD) screen as well as synthetic speech. Linear Bayes Classifiers and Multilayer Neural Networks have been used to classify 11 feature vectors obtained from the sensors on the glove into one of the 27 ASL alphabets and a predefined gesture for space. Three types of features are used; bending using six bend sensors, orientation in three dimensions using accelerometers and contacts at vital points using contact sensors. To gauge the performance of the presented design, the training database was prepared using five volunteers. The accuracy of the current version on the prepared dataset was found to be up to 99.3% for target user. The solution combines electronics, e-textile technology, sensor technology, embedded system and machine learning techniques to build a low cost wearable glove that is scrupulous, elegant and portable.
Paper Detail
2219
downloads
46
10005939
Context Detection in Spreadsheets Based on Automatically Inferred Table Schema
Abstract:
Programming requires years of training. With natural language and end user development methods, programming could become available to everyone. It enables end users to program their own devices and extend the functionality of the existing system without any knowledge of programming languages. In this paper, we describe an Interactive Spreadsheet Processing Module (ISPM), a natural language interface to spreadsheets that allows users to address ranges within the spreadsheet based on inferred table schema. Using the ISPM, end users are able to search for values in the schema of the table and to address the data in spreadsheets implicitly. Furthermore, it enables them to select and sort the spreadsheet data by using natural language. ISPM uses a machine learning technique to automatically infer areas within a spreadsheet, including different kinds of headers and data ranges. Since ranges can be identified from natural language queries, the end users can query the data using natural language. During the evaluation 12 undergraduate students were asked to perform operations (sum, sort, group and select) using the system and also Excel without ISPM interface, and the time taken for task completion was compared across the two systems. Only for the selection task did users take less time in Excel (since they directly selected the cells using the mouse) than in ISPM, by using natural language for end user software engineering, to overcome the present bottleneck of professional developers.
Paper Detail
676
downloads
45
10005186
Mining User-Generated Contents to Detect Service Failures with Topic Model
Abstract:
Online user-generated contents (UGC) significantly change the way customers behave (e.g., shop, travel), and a pressing need to handle the overwhelmingly plethora amount of various UGC is one of the paramount issues for management. However, a current approach (e.g., sentiment analysis) is often ineffective for leveraging textual information to detect the problems or issues that a certain management suffers from. In this paper, we employ text mining of Latent Dirichlet Allocation (LDA) on a popular online review site dedicated to complaint from users. We find that the employed LDA efficiently detects customer complaints, and a further inspection with the visualization technique is effective to categorize the problems or issues. As such, management can identify the issues at stake and prioritize them accordingly in a timely manner given the limited amount of resources. The findings provide managerial insights into how analytics on social media can help maintain and improve their reputation management. Our interdisciplinary approach also highlights several insights by applying machine learning techniques in marketing research domain. On a broader technical note, this paper illustrates the details of how to implement LDA in R program from a beginning (data collection in R) to an end (LDA analysis in R) since the instruction is still largely undocumented. In this regard, it will help lower the boundary for interdisciplinary researcher to conduct related research.
Paper Detail
857
downloads
44
10004850
Case-Based Reasoning: A Hybrid Classification Model Improved with an Expert's Knowledge for High-Dimensional Problems
Abstract:

Data mining and classification of objects is the process of data analysis, using various machine learning techniques, which is used today in various fields of research. This paper presents a concept of hybrid classification model improved with the expert knowledge. The hybrid model in its algorithm has integrated several machine learning techniques (Information Gain, K-means, and Case-Based Reasoning) and the expert’s knowledge into one. The knowledge of experts is used to determine the importance of features. The paper presents the model algorithm and the results of the case study in which the emphasis was put on achieving the maximum classification accuracy without reducing the number of features.

Paper Detail
1015
downloads
43
10004012
An Empirical Evaluation of Performance of Machine Learning Techniques on Imbalanced Software Quality Data
Abstract:
The development of change prediction models can help the software practitioners in planning testing and inspection resources at early phases of software development. However, a major challenge faced during the training process of any classification model is the imbalanced nature of the software quality data. A data with very few minority outcome categories leads to inefficient learning process and a classification model developed from the imbalanced data generally does not predict these minority categories correctly. Thus, for a given dataset, a minority of classes may be change prone whereas a majority of classes may be non-change prone. This study explores various alternatives for adeptly handling the imbalanced software quality data using different sampling methods and effective MetaCost learners. The study also analyzes and justifies the use of different performance metrics while dealing with the imbalanced data. In order to empirically validate different alternatives, the study uses change data from three application packages of open-source Android data set and evaluates the performance of six different machine learning techniques. The results of the study indicate extensive improvement in the performance of the classification models when using resampling method and robust performance measures.
Paper Detail
1146
downloads
42
10003512
Anomaly Detection with ANN and SVM for Telemedicine Networks
Abstract:
In recent years, a wide variety of applications are developed with Support Vector Machines -SVM- methods and Artificial Neural Networks -ANN-. In general, these methods depend on intrusion knowledge databases such as KDD99, ISCX, and CAIDA among others. New classes of detectors are generated by machine learning techniques, trained and tested over network databases. Thereafter, detectors are employed to detect anomalies in network communication scenarios according to user’s connections behavior. The first detector based on training dataset is deployed in different real-world networks with mobile and non-mobile devices to analyze the performance and accuracy over static detection. The vulnerabilities are based on previous work in telemedicine apps that were developed on the research group. This paper presents the differences on detections results between some network scenarios by applying traditional detectors deployed with artificial neural networks and support vector machines.
Paper Detail
1423
downloads
41
10003268
A Supervised Learning Data Mining Approach for Object Recognition and Classification in High Resolution Satellite Data
Abstract:
Advances in spatial and spectral resolution of satellite images have led to tremendous growth in large image databases. The data we acquire through satellites, radars, and sensors consists of important geographical information that can be used for remote sensing applications such as region planning, disaster management. Spatial data classification and object recognition are important tasks for many applications. However, classifying objects and identifying them manually from images is a difficult task. Object recognition is often considered as a classification problem, this task can be performed using machine-learning techniques. Despite of many machine-learning algorithms, the classification is done using supervised classifiers such as Support Vector Machines (SVM) as the area of interest is known. We proposed a classification method, which considers neighboring pixels in a region for feature extraction and it evaluates classifications precisely according to neighboring classes for semantic interpretation of region of interest (ROI). A dataset has been created for training and testing purpose; we generated the attributes by considering pixel intensity values and mean values of reflectance. We demonstrated the benefits of using knowledge discovery and data-mining techniques, which can be on image data for accurate information extraction and classification from high spatial resolution remote sensing imagery.
Paper Detail
1625
downloads
40
10002606
Mobile Collaboration Learning Technique on Students in Developing Nations
Abstract:
New and more powerful communications technologies continue to emerge at a rapid pace and their uses in education are widespread and the impact remarkable in the developing societies. This study investigates Mobile Collaboration Learning Technique (MCLT) on learners’ outcome among students in tertiary institutions of developing nations (a case of Nigeria students). It examines the significance of retention achievement scores of students taught using mobile collaboration and conventional method. The sample consisted of 120 students using Stratified random sampling method. Five research questions and hypotheses were formulated, and tested at 0.05 level of significance. A student achievement test (SAT) was made of 40 items of multiple-choice objective type, developed and validated for data collection by professionals. The SAT was administered to students as pre-test and post-test. The data were analyzed using t-test statistic to test the hypotheses. The result indicated that students taught using MCLT performed significantly better than their counterparts using the conventional method of instruction. Also, there was no significant difference in the post-test performance scores of male and female students taught using MCLT. Based on the findings, the following submissions was made that: Mobile collaboration system be encouraged in the institutions to boost knowledge sharing among learners, workshop and training should be organized to train teachers on the use of this technique, schools and government should consistently align curriculum standard to trends of technological dictates and formulate policies and procedures towards responsible use of MCLT.
Paper Detail
1509
downloads
39
10001357
A Comparative Study of Malware Detection Techniques Using Machine Learning Methods
Abstract:
In the past few years, the amount of malicious software increased exponentially and, therefore, machine learning algorithms became instrumental in identifying clean and malware files through (semi)-automated classification. When working with very large datasets, the major challenge is to reach both a very high malware detection rate and a very low false positive rate. Another challenge is to minimize the time needed for the machine learning algorithm to do so. This paper presents a comparative study between different machine learning techniques such as linear classifiers, ensembles, decision trees or various hybrids thereof. The training dataset consists of approximately 2 million clean files and 200.000 infected files, which is a realistic quantitative mixture. The paper investigates the above mentioned methods with respect to both their performance (detection rate and false positive rate) and their practicability.
Paper Detail
2682
downloads
38
10000873
Feature-Based Summarizing and Ranking from Customer Reviews
Abstract:

Due to the rapid increase of Internet, web opinion sources dynamically emerge which is useful for both potential customers and product manufacturers for prediction and decision purposes. These are the user generated contents written in natural languages and are unstructured-free-texts scheme. Therefore, opinion mining techniques become popular to automatically process customer reviews for extracting product features and user opinions expressed over them. Since customer reviews may contain both opinionated and factual sentences, a supervised machine learning technique applies for subjectivity classification to improve the mining performance. In this paper, we dedicate our work is the task of opinion summarization. Therefore, product feature and opinion extraction is critical to opinion summarization, because its effectiveness significantly affects the identification of semantic relationships. The polarity and numeric score of all the features are determined by Senti-WordNet Lexicon. The problem of opinion summarization refers how to relate the opinion words with respect to a certain feature. Probabilistic based model of supervised learning will improve the result that is more flexible and effective.

Paper Detail
2411
downloads
37
10001596
Cross Project Software Fault Prediction at Design Phase
Abstract:
Software fault prediction models are created by using the source code, processed metrics from the same or previous version of code and related fault data. Some company do not store and keep track of all artifacts which are required for software fault prediction. To construct fault prediction model for such company, the training data from the other projects can be one potential solution. Earlier we predicted the fault the less cost it requires to correct. The training data consists of metrics data and related fault data at function/module level. This paper investigates fault predictions at early stage using the cross-project data focusing on the design metrics. In this study, empirical analysis is carried out to validate design metrics for cross project fault prediction. The machine learning techniques used for evaluation is Naïve Bayes. The design phase metrics of other projects can be used as initial guideline for the projects where no previous fault data is available. We analyze seven datasets from NASA Metrics Data Program which offer design as well as code metrics. Overall, the results of cross project is comparable to the within company data learning.
Paper Detail
2073
downloads
36
10000096
What the Future Holds for Social Media Data Analysis
Abstract:

The dramatic rise in the use of Social Media (SM) platforms such as Facebook and Twitter provide access to an unprecedented amount of user data. Users may post reviews on products and services they bought, write about their interests, share ideas or give their opinions and views on political issues. There is a growing interest in the analysis of SM data from organisations for detecting new trends, obtaining user opinions on their products and services or finding out about their online reputations. A recent research trend in SM analysis is making predictions based on sentiment analysis of SM. Often indicators of historic SM data are represented as time series and correlated with a variety of real world phenomena like the outcome of elections, the development of financial indicators, box office revenue and disease outbreaks. This paper examines the current state of research in the area of SM mining and predictive analysis and gives an overview of the analysis methods using opinion mining and machine learning techniques.

Paper Detail
3447
downloads
35
10000619
Imputation Technique for Feature Selection in Microarray Data Set
Abstract:

Analyzing DNA microarray data sets is a great challenge, which faces the bioinformaticians due to the complication of using statistical and machine learning techniques. The challenge will be doubled if the microarray data sets contain missing data, which happens regularly because these techniques cannot deal with missing data. One of the most important data analysis process on the microarray data set is feature selection. This process finds the most important genes that affect certain disease. In this paper, we introduce a technique for imputing the missing data in microarray data sets while performing feature selection.

Paper Detail
2070
downloads
34
9999281
A Comprehensive Review on Different Mixed Data Clustering Ensemble Methods
Abstract:

An extensive amount of work has been done in data clustering research under the unsupervised learning technique in Data Mining during the past two decades. Moreover, several approaches and methods have been emerged focusing on clustering diverse data types, features of cluster models and similarity rates of clusters. However, none of the single clustering algorithm exemplifies its best nature in extracting efficient clusters. Consequently, in order to rectify this issue, a new challenging technique called Cluster Ensemble method was bloomed. This new approach tends to be the alternative method for the cluster analysis problem. The main objective of the Cluster Ensemble is to aggregate the diverse clustering solutions in such a way to attain accuracy and also to improve the eminence the individual clustering algorithms. Due to the massive and rapid development of new methods in the globe of data mining, it is highly mandatory to scrutinize a vital analysis of existing techniques and the future novelty. This paper shows the comparative analysis of different cluster ensemble methods along with their methodologies and salient features. Henceforth this unambiguous analysis will be very useful for the society of clustering experts and also helps in deciding the most appropriate one to resolve the problem in hand.

Paper Detail
1704
downloads
33
9997172
A Review: Comparative Analysis of Different Categorical Data Clustering Ensemble Methods
Abstract:

Over the past epoch a rampant amount of work has been done in the data clustering research under the unsupervised learning technique in Data mining. Furthermore several algorithms and methods have been proposed focusing on clustering different data types, representation of cluster models, and accuracy rates of the clusters. However no single clustering algorithm proves to be the most efficient in providing best results. Accordingly in order to find the solution to this issue a new technique, called Cluster ensemble method was bloomed. This cluster ensemble is a good alternative approach for facing the cluster analysis problem. The main hope of the cluster ensemble is to merge different clustering solutions in such a way to achieve accuracy and to improve the quality of individual data clustering. Due to the substantial and unremitting development of new methods in the sphere of data mining and also the incessant interest in inventing new algorithms, makes obligatory to scrutinize a critical analysis of the existing techniques and the future novelty. This paper exposes the comparative study of different cluster ensemble methods along with their features, systematic working process and the average accuracy and error rates of each ensemble methods. Consequently this speculative and comprehensive analysis will be very useful for the community of clustering practitioners and also helps in deciding the most suitable one to rectify the problem in hand.

Paper Detail
2097
downloads
32
9997173
Analysis of Diverse Clustering Tools in Data Mining
Abstract:

Clustering in data mining is an unsupervised learning technique of aggregating the data objects into meaningful groups such that the intra cluster similarity of objects are maximized and inter cluster similarity of objects are minimized. Over the past decades several clustering tools were emerged in which clustering algorithms are inbuilt and are easier to use and extract the expected results. Data mining mainly deals with the huge databases that inflicts on cluster analysis and additional rigorous computational constraints. These challenges pave the way for the emergence of powerful expansive data mining clustering softwares. In this survey, a variety of clustering tools used in data mining are elucidated along with the pros and cons of each software.

Paper Detail
1754
downloads
31
17322
Comparison between Associative Classification and Decision Tree for HCV Treatment Response Prediction
Abstract:

Combined therapy using Interferon and Ribavirin is the standard treatment in patients with chronic hepatitis C. However, the number of responders to this treatment is low, whereas its cost and side effects are high. Therefore, there is a clear need to predict patient’s response to the treatment based on clinical information to protect the patients from the bad drawbacks, Intolerable side effects and waste of money. Different machine learning techniques have been developed to fulfill this purpose. From these techniques are Associative Classification (AC) and Decision Tree (DT). The aim of this research is to compare the performance of these two techniques in the prediction of virological response to the standard treatment of HCV from clinical information. 200 patients treated with Interferon and Ribavirin; were analyzed using AC and DT. 150 cases had been used to train the classifiers and 50 cases had been used to test the classifiers. The experiment results showed that the two techniques had given acceptable results however the best accuracy for the AC reached 92% whereas for DT reached 80%.

Paper Detail
1557
downloads
30
17386
Empirical Process Monitoring Via Chemometric Analysis of Partially Unbalanced Data
Authors:
Abstract:

Real-time or in-line process monitoring frameworks are designed to give early warnings for a fault along with meaningful identification of its assignable causes. In artificial intelligence and machine learning fields of pattern recognition various promising approaches have been proposed such as kernel-based nonlinear machine learning techniques. This work presents a kernel-based empirical monitoring scheme for batch type production processes with small sample size problem of partially unbalanced data. Measurement data of normal operations are easy to collect whilst special events or faults data are difficult to collect. In such situations, noise filtering techniques can be helpful in enhancing process monitoring performance. Furthermore, preprocessing of raw process data is used to get rid of unwanted variation of data. The performance of the monitoring scheme was demonstrated using three-dimensional batch data. The results showed that the monitoring performance was improved significantly in terms of detection success rate of process fault.

Paper Detail
1084
downloads