International Science Index
Machine Learning Techniques in Bank Credit Analysis
The aim of this paper is to compare classifier algorithms for credit risk assessment by applying different Machine Learning techniques. Using records from a Brazilian financial institution, this study uses a database of 5,432 companies that are clients of the bank, of which 2,600 are classified as non-defaulters, 1,551 as defaulters and 1,281 as temporary defaulters, meaning that the clients are overdue on their payments for up to 180 days. For each case, a total of 15 attributes were considered for a one-against-all assessment using four different techniques: Artificial Neural Networks Multilayer Perceptron (ANN-MLP), Artificial Neural Networks Radial Basis Functions (ANN-RBF), Logistic Regression (LR) and Support Vector Machines (SVM). For each method, different parameters were analyzed so that the best configuration of each technique could be compared. Initially the data were coded in thermometer code (for numerical attributes) or dummy coding (for nominal attributes). The methods were then evaluated for each parameter setting, and the best result of each technique was compared in terms of accuracy, false positives, false negatives, true positives and true negatives. This comparison showed that the best method, in terms of accuracy, was ANN-RBF (79.20% for non-defaulter classification, 97.74% for defaulters and 75.37% for temporary defaulters). However, the best accuracy does not always indicate the best technique. For instance, in the classification of temporary defaulters, ANN-RBF was surpassed in terms of false positives by SVM, which had the lowest rate (0.07%) of false positive classifications. These details are discussed in light of the results, and the conclusion gives an overview of the findings.
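As a rough illustration of the two encodings named in the abstract above, the sketch below shows one common form of thermometer coding for a numerical attribute and dummy (one-hot) coding for a nominal attribute. The thresholds, category names and function names are hypothetical; the abstract does not state how the 15 attributes were actually binned.

```python
def thermometer_encode(value, thresholds):
    """Return one bit per threshold: 1 if value >= threshold, else 0.

    With ascending thresholds the ones form a contiguous prefix,
    like mercury rising in a thermometer.
    """
    return [1 if value >= t else 0 for t in thresholds]


def dummy_encode(value, categories):
    """Return a one-hot vector marking which category `value` belongs to."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == c else 0 for c in categories]


# Hypothetical attributes, chosen only for illustration.
REVENUE_THRESHOLDS = [10, 100, 1000]
SECTORS = ["industry", "retail", "services"]

revenue_bits = thermometer_encode(250, REVENUE_THRESHOLDS)  # numerical attribute
sector_bits = dummy_encode("retail", SECTORS)               # nominal attribute
```

Thermometer coding preserves the ordering of a numerical attribute (a larger value never has fewer ones), whereas dummy coding deliberately imposes no order on nominal categories.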
Multiclass Support Vector Machines with Simultaneous Multi-Factors Optimization for Corporate Credit Ratings
Corporate credit rating prediction is one of the most important topics studied by researchers in the last decade. Over that period, researchers have been pushing to improve the accuracy of corporate credit rating prediction models by applying several data-driven tools, including statistical and artificial intelligence methods. Among them, the multiclass support vector machine (MSVM) has been widely applied due to its good predictive performance. However, the heuristics involved, for example in choosing the parameters of a kernel function and an appropriate feature and instance subset, have become the main criticism of MSVM, as they dictate the MSVM architectural variables. This study presents a hybrid MSVM model that is intended to optimize all of these design factors: feature selection, instance selection, and kernel parameters. Our model adopts a genetic algorithm (GA) to simultaneously optimize multiple heterogeneous design factors of MSVM.
Anomaly Detection with ANN and SVM for Telemedicine Networks
In recent years, a wide variety of applications have been developed with Support Vector Machine (SVM) methods and Artificial Neural Networks (ANN). In general, these methods depend on intrusion knowledge databases such as KDD99, ISCX, and CAIDA, among others. New classes of detectors are generated by machine learning techniques, trained and tested over network databases. Thereafter, the detectors are employed to detect anomalies in network communication scenarios according to the behavior of users' connections. The first detector, based on the training dataset, is deployed in different real-world networks with mobile and non-mobile devices to analyze its performance and accuracy over static detection. The vulnerabilities are based on previous work on telemedicine apps developed in the research group. This paper presents the differences in detection results between several network scenarios when applying traditional detectors built with artificial neural networks and support vector machines.
Methods of Geodesic Distance in Two-Dimensional Face Recognition
In this paper, we present a comparative study of three
methods for 2D face recognition: Iso-Geodesic Curves
(IGC), Geodesic Distance (GD) and Geodesic-Intensity Histogram
(GIH). These approaches are based on computing the geodesic
distance between points of the facial surface and between facial curves.
In this study we represent the gray-level image as a 2D surface in
a 3D space, with the third coordinate proportional to the intensity
values of the pixels. In the classification step, we use Neural Networks
(NN), K-Nearest Neighbor (KNN) and Support Vector Machines
(SVM). The images used in our experiments are from two well-known
databases of face images, ORL and YaleB. The ORL database was
used to evaluate the performance of the methods under conditions where
the pose and sample size are varied, and the YaleB database was used
to examine the performance of the systems when the facial
expressions and lighting are varied.
DWT Based Image Steganalysis
Steganalysis has become a challenging and attractive research interest with the development of information hiding techniques. It is the procedure of detecting hidden information in a stego object created by a known steganographic algorithm. In this paper, a novel feature-based image steganalysis technique is proposed. Various statistical moments are used along with some similarity metric. The proposed steganalysis technique is based on transformation in four wavelet domains: Haar, Daubechies, Symlets and Biorthogonal. Each domain is subjected to various classifiers, namely K-nearest-neighbor, K* classifier, locally weighted learning, naive Bayes classifier, neural networks, decision trees and support vector machines. The experiments are performed on a large set of pictures that are freely available in an image database. The system also predicts the different message length definitions.
Support Vector Machines Approach for Detecting the Mean Shifts in Hotelling's T2 Control Chart with Sensitizing Rules
In many industries, control charts are one of the most
frequently used tools for quality management. Hotelling's T2 is
widely used in multivariate control charts. However, it is less effective when
detecting small or medium process shifts. The use of supplementary
sensitizing rules can improve detection performance. This study
applies sensitizing rules to the Hotelling's T2 control chart to improve the
performance of detection. A support vector machine (SVM) classifier is used
to identify the characteristic or group of characteristics that are
responsible for the signal and to classify the magnitude of the mean
shifts. The experimental results demonstrate that the support vector
machine (SVM) classifier can effectively identify the characteristic
or group of characteristics that caused the process mean shifts and the
magnitude of the shifts.
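For readers unfamiliar with the statistic, the following minimal sketch computes Hotelling's T2 for a single two-variable observation, T2 = (x - m)' S^-1 (x - m), assuming the in-control mean m and covariance S are known. The function name and the example numbers are hypothetical, and the sensitizing rules and SVM stage described in the abstract are not shown.

```python
def hotelling_t2(x, mean, cov):
    """Return T^2 = (x - mean)^T cov^{-1} (x - mean) for two variables."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    dx = x[0] - mean[0]
    dy = x[1] - mean[1]
    # The inverse of [[a, b], [c, d]] is (1 / det) * [[d, -b], [-c, a]].
    return (dx * (d * dx - b * dy) + dy * (-c * dx + a * dy)) / det


# Hypothetical in-control parameters and one new observation.
in_control_mean = (0.0, 0.0)
in_control_cov = ((1.0, 0.5), (0.5, 1.0))
t2 = hotelling_t2((1.0, 1.0), in_control_mean, in_control_cov)
```

A chart would signal when T2 exceeds a chosen control limit. Because T2 collapses all variables into one number, a signal alone does not say which variable shifted, which is the gap the SVM classifier in the abstract is meant to fill.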
Intrusion Detection Using a New Particle Swarm Method and Support Vector Machines
Intrusion detection is a mechanism used to protect a
system and analyse and predict the behaviours of system users. An
ideal intrusion detection system is hard to achieve due to
nonlinearity, and irrelevant or redundant features. This study
introduces a new anomaly-based intrusion detection model. The
suggested model is based on particle swarm optimisation and
nonlinear, multi-class and multi-kernel support vector machines.
Particle swarm optimisation is used for feature selection by applying
a new formula to update the position and the velocity of a particle;
the support vector machine is used as a classifier. The proposed
model is tested and compared with the other methods using the KDD
CUP 1999 dataset. The results indicate that this new method achieves
better accuracy rates than previous methods.
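The abstract does not give the new update formula, so the sketch below shows only the textbook particle swarm update that such variants typically modify; it is not the authors' method. The parameter names and default values (inertia, c1, c2) are assumptions, and the random factors r1 and r2 are passed in explicitly so the step is reproducible.

```python
def pso_step(position, velocity, personal_best, global_best,
             r1, r2, inertia=0.7, c1=1.5, c2=1.5):
    """Apply one standard PSO update and return (new_position, new_velocity).

    r1 and r2 are random factors in [0, 1], normally drawn fresh each step.
    """
    new_velocity = [
        inertia * v + c1 * r1 * (pb - x) + c2 * r2 * (gb - x)
        for x, v, pb, gb in zip(position, velocity, personal_best, global_best)
    ]
    # The position moves by the freshly updated velocity.
    new_position = [x + v for x, v in zip(position, new_velocity)]
    return new_position, new_velocity


# One-dimensional example with hand-picked values.
new_pos, new_vel = pso_step(
    position=[0.0], velocity=[1.0], personal_best=[2.0], global_best=[4.0],
    r1=0.5, r2=0.5, inertia=0.5, c1=1.0, c2=1.0,
)
```

For feature selection, each dimension of the position would usually be mapped to a keep-or-drop decision for one feature; that mapping is left out here because the abstract does not describe it.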
Analysis of Palm Perspiration Effect with SVM for Diabetes in People
In this research, we attempted to identify the diabetes condition of people (healthy, prediabetic and diabetic) from noninvasive palm perspiration measurements. Data clusters gathered from 200 subjects were used (1. Individual Attributes Cluster and 2. Palm Perspiration Attributes Cluster). To decrease the dimensions of these data clusters, Principal Component Analysis was used. The data clusters prepared in this way were classified with Support Vector Machines. The highest classification success rates were 82% for glucose parameters and 84% for HbA1c parameters.
Clinical Decision Support for Disease Classification based on the Tests Association
To date, researchers have developed various
tools and methodologies for effective clinical decision-making.
Among those decisions, the diagnosis of chest pain diseases has been one of the
important issues, especially in an emergency department. To
improve the ability of physicians in diagnosis, many researchers have
developed diagnosis intelligence by using machine learning and data
mining. However, most of the conventional methodologies have been
generally based on a single classifier for disease classification and
prediction, which shows moderate performance. This study utilizes an
ensemble strategy to combine multiple different classifiers to help
physicians diagnose chest pain diseases more accurately than ever.
Specifically the ensemble strategy is applied by using the integration
of decision trees, neural networks, and support vector machines. The
ensemble models are applied to real-world emergency data. This study
shows that the performance of the ensemble models is superior to that of each
single classifier.
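The abstract does not say how the three classifiers are integrated. One common choice is majority voting, sketched below under that assumption; the labels and the function name are hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by the most classifiers.

    Ties resolve to the tied label that appears first in `predictions`.
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    counts = Counter(predictions)
    top = max(counts.values())
    for label in predictions:
        if counts[label] == top:
            return label


# Hypothetical outputs of a decision tree, a neural network and an SVM
# for a single patient.
votes = ["cardiac", "non-cardiac", "cardiac"]
ensemble_label = majority_vote(votes)
```

With an odd number of base classifiers and two classes, a tie cannot occur, which is one practical reason three-member ensembles are popular.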
A Comparison of Different Soft Computing Models for Credit Scoring
It has become crucial over the years for nations to
improve their credit scoring methods and techniques in light of the
increasing volatility of the global economy. Statistical methods or
tools have been the favoured means for this; however artificial
intelligence or soft computing based techniques are becoming
increasingly preferred due to their proficient and precise nature and
relative simplicity. This work presents a comparison between Support
Vector Machines and Artificial Neural Networks, two popular soft
computing models, when applied to credit scoring. Among the
different criteria that can be used for comparison, accuracy,
computational complexity and processing time are the
criteria selected to evaluate both models. Furthermore, the German credit
scoring dataset, which is a real-world dataset, is used to train and test
both developed models. Experimental results obtained from our study
suggest that although both soft computing models could be used with
a high degree of accuracy, Artificial Neural Networks deliver better
results than Support Vector Machines.
Identification of Printed Punjabi Words and English Numerals Using Gabor Features
Script identification is one of the challenging steps in the development of optical character recognition systems for bilingual or multilingual documents. In this paper, an attempt is made to identify English numerals at word level in Punjabi documents by using Gabor features. A support vector machine (SVM) classifier with five-fold cross-validation is used to classify the word images. The results obtained are quite encouraging. Average accuracy with the RBF, polynomial and linear kernel functions is greater than 99%.
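To make the evaluation protocol concrete, here is a minimal sketch of how five-fold cross-validation partitions a dataset: each sample is tested exactly once and used for training in the other four folds. The function name is hypothetical, and the folds are contiguous for simplicity (in practice the data would normally be shuffled first).

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# Twelve hypothetical word images split into five folds.
folds = list(k_fold_indices(12, k=5))
```

The reported accuracy is then the average of the five test-fold accuracies, one per trained classifier.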
Combined Feature Based Hyperspectral Image Classification Technique Using Support Vector Machines
A spatial classification technique incorporating a state-of-the-art feature extraction algorithm is proposed in this paper for classifying the heterogeneous classes present in hyperspectral images. The classification accuracy can be improved if and only if both the feature extraction and the classifier selection are appropriate. As the classes in hyperspectral images are assumed to have different textures, textural classification is adopted. Run Length feature extraction is used along with Principal Components and Independent Components. A hyperspectral image of the Indiana site taken by AVIRIS is used for the experiment. Among the original 220 bands, a subset of 120 bands is selected. The Gray Level Run Length Matrix (GLRLM) is calculated for forty of the selected bands. From the GLRLMs, the Run Length features for individual pixels are calculated. The Principal Components are calculated for another forty bands, and Independent Components are calculated for the remaining forty bands. As Principal and Independent Components have the ability to represent the textural content of pixels, they are treated as features. The summation of Run Length features, Principal Components, and Independent Components forms the Combined Features, which are used for classification. An SVM with a Binary Hierarchical Tree is used to classify the hyperspectral image. Results are validated against ground truth and accuracies are calculated.
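As a small illustration of the Gray Level Run Length Matrix mentioned above, the sketch below counts horizontal runs in a tiny quantized image: entry [g][r - 1] is the number of runs of gray level g with length r. The function name and the example image are hypothetical, and only the horizontal direction is handled; the abstract does not say which directions or derived features were used.

```python
def glrlm_horizontal(image, levels):
    """Return the horizontal gray level run length matrix of `image`.

    `image` is a list of rows of integer gray levels in range(levels).
    """
    max_run = max(len(row) for row in image)
    matrix = [[0] * max_run for _ in range(levels)]
    for row in image:
        if not row:
            continue
        run_value, run_length = row[0], 1
        for pixel in row[1:]:
            if pixel == run_value:
                run_length += 1
            else:
                # The run just ended: record it and start a new one.
                matrix[run_value][run_length - 1] += 1
                run_value, run_length = pixel, 1
        matrix[run_value][run_length - 1] += 1  # record the last run in the row
    return matrix


# Hypothetical 3 x 4 image with three gray levels.
image = [
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [2, 2, 2, 2],
]
runs = glrlm_horizontal(image, levels=3)
```

Coarse textures produce long runs (mass toward the right of the matrix) and fine textures produce short runs, which is why statistics of this matrix serve as texture features.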
Fusion Classifier for Open-Set Face Recognition with Pose Variations
A fusion classifier composed of two modules, one built on a hidden Markov model (HMM) and the other on a support vector machine (SVM), is proposed to recognize faces with pose variations in open-set recognition settings. The HMM module captures the evolution of facial features across a subject's face using the subject's facial images only, without reference to the faces of others. Because it captures this evolutionary process of facial features, the HMM module retains a certain robustness against pose variations, yielding low false rejection rates (FRR) when recognizing faces across poses. This comes, however, at the price of poor false acceptance rates (FAR) when recognizing other faces, because the module is built upon within-class samples only. The SVM module in the proposed model follows a special design that substantially reduces the FAR and further lowers the FRR. The proposed fusion classifier has been evaluated using the CMU PIE database and proven effective for open-set face recognition with pose variations. Experiments have also shown that it outperforms face classifiers built on HMM or SVM alone.
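The two error rates discussed above can be computed from match scores as in the following sketch. It assumes that a higher score means a better match and that a claim is accepted when the score reaches a threshold; the scores, threshold and function name are hypothetical.

```python
def far_frr(scores, is_genuine, threshold):
    """Return (FAR, FRR) for the given scores and acceptance threshold.

    FAR = accepted impostor attempts / all impostor attempts
    FRR = rejected genuine attempts / all genuine attempts
    """
    genuine_total = sum(1 for g in is_genuine if g)
    impostor_total = len(is_genuine) - genuine_total
    if genuine_total == 0 or impostor_total == 0:
        raise ValueError("need at least one genuine and one impostor attempt")
    false_accepts = sum(
        1 for s, g in zip(scores, is_genuine) if s >= threshold and not g)
    false_rejects = sum(
        1 for s, g in zip(scores, is_genuine) if s < threshold and g)
    return false_accepts / impostor_total, false_rejects / genuine_total


# Two genuine attempts followed by two impostor attempts.
scores = [0.9, 0.4, 0.7, 0.2]
is_genuine = [True, True, False, False]
far, frr = far_frr(scores, is_genuine, threshold=0.5)
```

Raising the threshold lowers the FAR and raises the FRR, so a single classifier trades one against the other; the fusion design in the abstract aims to lower both.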
A Bayesian Kernel for the Prediction of Protein-Protein Interactions
Understanding protein functions is a major goal in
the post-genomic era. Proteins usually work in the context of other
proteins and rarely function alone. Therefore, it is highly relevant to
study the interaction partners of a protein in order to understand its
function. Machine learning techniques have been widely applied to
predict protein-protein interactions. Kernel functions play an
important role for a successful machine learning technique. Choosing
the appropriate kernel function can lead to a better accuracy in a
binary classifier such as the support vector machines. In this paper,
we describe a Bayesian kernel for the support vector machine to
predict protein-protein interactions. The use of Bayesian kernel can
improve the classifier performance by incorporating the probability
characteristic of the available experimental protein-protein
interactions data that were compiled from different sources. In
addition, the probabilistic output from the Bayesian kernel can assist
biologists to conduct more research on the highly predicted
interactions. The results show that the accuracy of the classifier has
been improved using the Bayesian kernel compared to the standard
SVM kernels. These results imply that protein-protein interaction can
be predicted using Bayesian kernel with better accuracy compared to
the standard SVM kernels.
Support Vector Machines For Understanding Lane Color and Sidewalks
Understanding road features such as lanes, the color
of lanes, and sidewalks in a live video captured from a moving
vehicle is essential to build video-based navigation systems. In this
paper, we present a novel idea to understand the road features using
support vector machines. Various feature vectors including color
components of road markings and the difference between two
regions, i.e., chosen AOIs, are fed into an SVM, which decides the
colors of lanes and sidewalks robustly. Experimental results are
provided to show the robustness of the proposed idea.
Glass Bottle Inspector Based on Machine Vision
This text studies an intelligent glass bottle inspector
based on machine vision as a replacement for manual inspection. The system
structure is illustrated in detail in this paper. The text presents a
method based on the watershed transform to segment the
possible defective regions and extract features of the bottle wall by rules.
Then the wavelet transform is used to extract features of the bottle finish
from images. After the features are extracted, a fuzzy support vector
machine ensemble is put forward as the classifier. To ensure that
the fuzzy support vector machines have good classification ability,
a GA-based ensemble method is used to combine the several
fuzzy support vector machines. The experiments demonstrate that
when this inspector is used to inspect glass bottles, the accuracy rate may
reach above 97.5%.
Artificial Neural Networks and Multi-Class Support Vector Machines for Classifying Magnetic Measurements in Tokamak Reactors
This paper is mainly concerned with the application of
a novel technique of data interpretation for classifying measurements
of plasma columns in Tokamak reactors for nuclear fusion
applications. The proposed method exploits several concepts derived
from soft computing theory. In particular, Artificial Neural Networks
and Multi-Class Support Vector Machines have been exploited to
classify magnetic variables useful to determine shape and position of
the plasma with a reduced computational complexity. The proposed
technique is used to analyze simulated databases of plasma equilibria
based on ITER geometry configuration. As well as demonstrating the
successful recovery of scalar equilibrium parameters, we show that
the technique can yield practical advantages compared with earlier
Feature Subset Selection approach based on Maximizing Margin of Support Vector Classifier
Identification of cancer genes that might anticipate
the clinical behaviors of different types of cancer is
challenging due to the huge number of genes and the small number of
patient samples. A new method is proposed based on
supervised classification learning, such as support vector machines
(SVMs). A new solution is described by introducing the
Maximized Margin (MM) into the subset criterion, which makes it possible to
approach the lowest generalization error rate. In class prediction
problems, gene selection is essential to improve accuracy and to
identify genes for cancer disease. The performance of the new
method was evaluated in experiments with real-world data. It can give
better accuracy for classification.
Using Support Vector Machine for Prediction Dynamic Voltage Collapse in an Actual Power System
This paper presents dynamic voltage collapse prediction on an actual power system using support vector machines. Dynamic voltage collapse prediction is first determined based on the PTSI calculated from information in dynamic simulation output. Simulations were carried out on a practical 87 bus test system by considering load increase as the contingency. The data collected from the time domain simulation is then used as input to the SVM in which support vector regression is used as a predictor to determine the dynamic voltage collapse indices of the power system. To reduce training time and improve accuracy of the SVM, the Kernel function type and Kernel parameter are considered. To verify the effectiveness of the proposed SVM method, its performance is compared with the multi layer perceptron neural network (MLPNN). Studies show that the SVM gives faster and more accurate results for dynamic voltage collapse prediction compared with the MLPNN.
Shift Invariant Support Vector Machines Face Recognition System
In this paper, we present a new method for
incorporating global shift invariance in support vector machines.
Unlike other approaches which incorporate a feature extraction stage,
we first scale the image and then classify it by using the modified
support vector machines classifier. Shift invariance is achieved by
replacing dot products between patterns used by the SVM classifier
with the maximum cross-correlation value between them. Unlike the
normal approach, in which the patterns are treated as vectors, in our
approach the patterns are treated as matrices (or images). Cross-correlation
is computed by using computationally efficient
techniques such as the fast Fourier transform. The method has been
tested on the ORL face database. The tests indicate that this method
can improve the recognition rate of an SVM classifier.
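The core idea above, replacing the dot product between two patterns with their maximum cross-correlation, can be sketched as follows. This illustration assumes circular (wrap-around) shifts and uses a brute-force search over all shifts for clarity, whereas the abstract describes computing the cross-correlation efficiently with the fast Fourier transform. The function names and the 2 x 2 patterns are hypothetical.

```python
def dot_product(a, b):
    """Plain dot product of two equally sized matrices."""
    return sum(x * y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def max_cross_correlation(a, b):
    """Return the largest dot product between `a` and any circular shift of `b`."""
    rows, cols = len(a), len(a[0])
    best = float("-inf")
    for dr in range(rows):
        for dc in range(cols):
            total = sum(
                a[r][c] * b[(r + dr) % rows][(c + dc) % cols]
                for r in range(rows)
                for c in range(cols)
            )
            best = max(best, total)
    return best


# The same single bright pixel, placed in opposite corners.
pattern = [[1, 0], [0, 0]]
shifted = [[0, 0], [0, 1]]
plain = dot_product(pattern, shifted)
invariant = max_cross_correlation(pattern, shifted)
```

The plain dot product treats the shifted pattern as unrelated to the original, while the maximum cross-correlation scores it the same as an exact match, which is the shift invariance the method relies on.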
Ensembling Classifiers – An Application to Image Data Classification from Cherenkov Telescope Experiment
Ensemble learning algorithms such as AdaBoost and
Bagging have been in active research and shown improvements in
classification results for several benchmarking data sets with mainly
decision trees as their base classifiers. In this paper we experiment with
applying these meta-learning techniques to classifiers such as random
forests, neural networks and support vector machines. The data sets
are from MAGIC, a Cherenkov telescope experiment. The task is to
separate gamma signals from the overwhelming background of hadron and muon
signals, which represents a rare-class classification problem. We compare
the individual classifiers with their ensemble counterparts and
discuss the results. WEKA, a wonderful tool for machine learning, has
been used for the experiments.
Modeling of Reinforcement in Concrete Beams Using Machine Learning Tools
The paper discusses the results obtained in predicting
reinforcement in singly reinforced beams using Neural Nets (NN),
Support Vector Machines (SVMs) and Tree Based Models. A major
advantage of SVMs over NN is that they minimize a bound on the
generalization error of the model rather than minimizing a bound on the
mean square error over the data set, as done in NN. The Tree Based
approach divides the problem into a small number of subproblems to
reach a conclusion. A set of data was created for different
parameters of the beam to calculate the reinforcement using the limit state
method, for creation of models and validation. The results from this
study suggest a remarkably good performance of the tree based and
SVM models. Further, this study found that these two techniques
work well and even better than Neural Network methods. A
comparison of predicted values with actual values suggests a very
good correlation coefficient with all four techniques.
A Hybrid GMM/SVM System for Text Independent Speaker Identification
This paper proposes a novel approach that combines statistical models and support vector machines. A hybrid scheme that incorporates the advantages of both the generative and discriminant model paradigms is described and evaluated. Support vector machines (SVMs) are trained to divide the whole speaker space into small subsets of speakers within a hierarchical tree structure. During testing, a speech token is assigned to its corresponding group, and evaluation using Gaussian mixture models (GMMs) is then performed. Experimental results show that the proposed method can significantly improve the performance of the text independent speaker identification task. We report improvements of up to 50% reduction in identification error rate compared to the baseline statistical model.
An Exact Solution to Support Vector Mixture
This paper presents a new version of the SVM mixture algorithm initially proposed by Kwok for classification and regression problems. For both cases, a slight modification of the mixture model leads to a standard SVM training problem, to the existence of an exact solution and allows the direct use of well known decomposition and working set selection algorithms. Only the regression case is considered in this paper but classification has been addressed in a very similar way. This method has been successfully applied to engine pollutants emission modeling.
Resolving Dependency Ambiguity of Subordinate Clauses using Support Vector Machines
In this paper, we propose a method of resolving dependency ambiguities of Korean subordinate clauses based on Support Vector Machines (SVMs). Dependency analysis of clauses is well known to be one of the most difficult tasks in parsing sentences, especially in Korean. In order to solve this problem, we assume that the dependency relation of Korean subordinate clauses is the dependency relation among the verb phrase, verb and endings in the clauses. As a result, this problem is represented as a binary classification task. In order to apply SVMs to this problem, we selected two kinds of features: static and dynamic features. The experimental results on the STEP2000 corpus show that our system achieves an accuracy of 73.5%.