18

10010760

A Recognition Method of Ancient Yi Script Based on Deep Learning

The Yi are an ethnic group living mainly in mainland China, with spoken and written language systems developed over thousands of years. Ancient Yi is one of the world's six ancient languages; it records the history of the Yi people and provides documents valuable for research into human civilization. Recognizing ancient Yi characters helps convert these documents into electronic form, which makes them easier to store and disseminate. Because of historical and regional limitations, research on recognizing these ancient characters is still inadequate. In this work, deep learning is therefore applied to their recognition. Five models were developed on the basis of a four-layer convolutional neural network (CNN). Alpha-Beta divergence was used as a penalty term to re-encode the output neurons of the five models, and two fully connected layers then compressed the features. Finally, at the softmax layer, the orthographic features of the ancient Yi characters were re-evaluated, their probability distributions were obtained, and each character was recognized as the class with the highest probability. Tests show that the method achieves higher precision than the traditional CNN model for recognizing handwritten ancient Yi characters.
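The softmax output and the Alpha-Beta (AB) divergence penalty can be sketched in NumPy. This is a minimal illustration assuming the standard AB-divergence definition, D_AB(p‖q) = -(1/(αβ)) Σ (p^α q^β - α/(α+β) p^(α+β) - β/(α+β) q^(α+β)), restricted here to α, β > 0. The abstract does not say which distribution the output neurons are compared against or how the penalty is weighted, so `reference` and `lam` are hypothetical parameters, and `penalized_loss` is not the authors' training objective.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a 1-D vector of logits."""
    z = np.asarray(logits, dtype=float)
    e = np.exp(z - z.max())
    return e / e.sum()

def ab_divergence(p, q, alpha, beta):
    """Alpha-Beta divergence D_AB(p || q); assumes alpha > 0 and beta > 0."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    s = alpha + beta
    terms = p**alpha * q**beta - (alpha / s) * p**s - (beta / s) * q**s
    return float(-terms.sum() / (alpha * beta))

def penalized_loss(logits, target, reference, alpha=0.5, beta=0.5, lam=0.1):
    """Cross-entropy plus a weighted AB-divergence penalty.

    `reference` and `lam` are hypothetical: the abstract does not state which
    distribution the output neurons are compared with, or the penalty weight.
    """
    p = softmax(logits)
    return float(-np.log(p[target]) + lam * ab_divergence(p, reference, alpha, beta))

def recognize(logits):
    """Return the index of the character class with the highest probability."""
    return int(np.argmax(softmax(logits)))
```

With α = β = 0.5 the divergence reduces to 2 Σ (√p - √q)², a scaled squared Hellinger distance, which is one reason this setting is a convenient default for a sketch.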

17

10010045

Determination of the Best Fit Probability Distribution for Annual Rainfall in Karkheh River at Iran

This study was designed to find the best-fit probability distribution of annual rainfall, based on a 50-year sample (1966-2015) from the Karkheh River basin, Iran, using six probability distributions: Normal, 2-Parameter Log Normal, 3-Parameter Log Normal, Pearson Type 3, Log Pearson Type 3 and Gumbel. The best-fit distribution was selected with the Stormwater Management and Design Aid (SMADA) software, based on the residual sum of squares (R.S.S) between observed and estimated values. Based on the R.S.S values of the fit tests, the Log Pearson Type 3 distribution, followed by the Pearson Type 3 distribution, gave the best fit at the Jelogir Majin and Pole Zal rainfall gauging stations. The annual values of expected rainfall were calculated using the best-fit distributions; hydrologists and design engineers can use them in future research in the study region and in other regions of the world.
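Selecting a distribution by R.S.S can be sketched as follows: sort the observations, assign each a plotting position, compute the quantile each candidate distribution predicts at that position, and sum the squared differences. This is a minimal sketch, not SMADA's procedure: it assumes the Weibull plotting position i/(n+1), method-of-moments parameters, and only two of the six candidates (Normal and Gumbel), since both have closed-form quantiles in the Python standard library. The sample values in the usage line are purely illustrative.

```python
import math
from statistics import NormalDist, mean, stdev

EULER_GAMMA = 0.5772156649

def weibull_plotting_positions(n):
    """Non-exceedance probability i/(n+1) for ranks i = 1..n (assumed formula)."""
    return [i / (n + 1) for i in range(1, n + 1)]

def normal_quantiles(data, probs):
    dist = NormalDist(mean(data), stdev(data))
    return [dist.inv_cdf(p) for p in probs]

def gumbel_quantiles(data, probs):
    # Method-of-moments Gumbel parameters.
    scale = math.sqrt(6) * stdev(data) / math.pi
    loc = mean(data) - EULER_GAMMA * scale
    return [loc - scale * math.log(-math.log(p)) for p in probs]

def rss(observed, estimated):
    """Residual sum of squares between observed and estimated values."""
    return sum((o - e) ** 2 for o, e in zip(observed, estimated))

def best_fit(data):
    obs = sorted(data)
    probs = weibull_plotting_positions(len(obs))
    scores = {
        "Normal": rss(obs, normal_quantiles(obs, probs)),
        "Gumbel": rss(obs, gumbel_quantiles(obs, probs)),
    }
    return min(scores, key=scores.get), scores

# Illustrative annual rainfall totals (mm), not data from the study.
name, scores = best_fit([310.0, 420.5, 385.2, 290.7, 455.1, 398.3, 350.9, 372.4, 330.0, 410.8])
```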

16

10009651

Evaluation of Best-Fit Probability Distribution for Prediction of Extreme Hydrologic Phenomena

Probability distributions are among the best methods for forecasting extreme hydrologic phenomena such as rainfall and flood flows. In this research, to determine a suitable probability distribution for estimating annual extreme rainfall and flood flow (discharge) series with different return periods, 40 years of precipitation records and 58 years of discharge records were collected from the Karkheh River, Iran. After homogeneity and adequacy tests, the data were analyzed with the Stormwater Management and Design Aid (SMADA) software using the residual sum of squares (R.S.S). For peak discharge, the best distribution was Log Pearson Type III, with R.S.S values of 145.91 and 13.67, and for maximum discharge it was again Log Pearson Type III, with R.S.S values of 141.08 and 8.95, at the Jelogir Majin and Pole Zal stations, respectively. The best distribution for maximum precipitation at the Jelogir Majin and Pole Zal stations was Log Pearson Type III, with R.S.S values of 1.74 and 1.90, followed by Pearson Type III, with R.S.S values of 1.53 and 1.69. Overall, the Log Pearson Type III distribution is acceptable for representing the statistics of extreme hydrologic phenomena in the Karkheh River, Iran, with the Pearson Type III distribution as a potential alternative.
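Once Log Pearson Type III is chosen, a design value for a return period T is usually computed on the log10 of the data as mean + K_T · std, where the frequency factor K_T depends on the skew of the logs. The sketch below assumes the common Wilson-Hilferty approximation for K_T and the usual bias-corrected sample skew; it is an illustration of the technique, not the SMADA computation, and the flow values are illustrative.

```python
import math
from statistics import NormalDist, mean, stdev

def sample_skew(y):
    """Bias-corrected sample skewness coefficient."""
    n, m, s = len(y), mean(y), stdev(y)
    return n * sum((v - m) ** 3 for v in y) / ((n - 1) * (n - 2) * s ** 3)

def frequency_factor(g, p):
    """Pearson III frequency factor via the Wilson-Hilferty approximation."""
    z = NormalDist().inv_cdf(p)  # standard normal quantile
    if abs(g) < 1e-9:
        return z  # zero skew: Pearson III reduces to the normal case
    k = g / 6.0
    return (2.0 / g) * ((1.0 + k * z - k * k) ** 3 - 1.0)

def lp3_quantile(data, return_period):
    """LP3 estimate of the annual maximum with the given return period (years)."""
    y = [math.log10(x) for x in data]  # requires strictly positive data
    p = 1.0 - 1.0 / return_period      # annual non-exceedance probability
    return 10 ** (mean(y) + frequency_factor(sample_skew(y), p) * stdev(y))

# Illustrative annual peak discharges (m^3/s), not data from the study.
flows = [820.0, 1130.0, 640.0, 1500.0, 980.0, 720.0, 1260.0, 890.0, 1050.0, 1710.0]
q10, q100 = lp3_quantile(flows, 10), lp3_quantile(flows, 100)
```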

15

10009311

Distances over Incomplete Diabetes and Breast Cancer Data Based on Bhattacharyya Distance

Missing values in real-world datasets are a common
problem. Many algorithms have been developed to deal with this
problem; most of them replace the missing values with a fixed
value computed from the observed values. In
our work, we used a distance function based on the Bhattacharyya
distance to measure the distance between objects with missing
values. The Bhattacharyya distance measures the similarity of
two probability distributions. The proposed distance distinguishes
between known and unknown values: the distance between
two known values is the Mahalanobis distance, whereas, when
one of them is missing, the distance is computed from the
distribution of the known values for the coordinate that contains
the missing value. This method was integrated with Wikaya, a
digital health company developing a platform that helps to improve
prevention of chronic diseases such as diabetes and cancer. For
Wikaya’s recommendation system to work, the distances between users
need to be measured. Since the collected data contain missing values,
a distance function is needed that measures distances between
incomplete user profiles. To evaluate how accurately the proposed
distance function reflects the actual similarity between
objects when some of them contain missing values, we integrated it
within the framework of the k nearest neighbors (kNN) classifier, since
its computation is based only on the similarity between objects. To
validate this, we ran the algorithm over the diabetes and breast cancer
datasets, standard benchmark datasets from the UCI repository. Our
experiments show that the kNN classifier using our proposed distance
function outperforms kNN using other existing methods.
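The idea of a distance that treats known and missing coordinates differently, plugged into kNN, can be sketched in plain Python. This is a simplified illustration, not the authors' Bhattacharyya-based formula: it assumes a diagonal covariance (so the Mahalanobis distance reduces to a per-coordinate standardized distance) and replaces each squared difference that involves a missing value by its expected value under the observed distribution of that coordinate, E[(x - Y)²] = (x - μ)² + σ², or 2σ² when both values are missing (treating them as independent draws). Missing values are represented as `None`.

```python
import math
from collections import Counter

def column_stats(rows):
    """Mean and variance of the observed (non-None) values in each column.

    Assumes every column has at least one observed value.
    """
    stats = []
    for j in range(len(rows[0])):
        vals = [r[j] for r in rows if r[j] is not None]
        mu = sum(vals) / len(vals)
        var = sum((v - mu) ** 2 for v in vals) / len(vals)
        stats.append((mu, var if var > 0 else 1.0))
    return stats

def missing_aware_distance(a, b, stats):
    """Standardized distance; missing coordinates use expected squared gaps."""
    total = 0.0
    for x, y, (mu, var) in zip(a, b, stats):
        if x is not None and y is not None:
            sq = (x - y) ** 2
        elif x is None and y is None:
            sq = 2.0 * var  # both unknown: E[(X - Y)^2] for independent draws
        else:
            known = x if x is not None else y
            sq = (known - mu) ** 2 + var  # E[(known - Y)^2] under the column law
        total += sq / var
    return math.sqrt(total)

def knn_predict(train, labels, query, k, stats):
    """Majority label among the k training rows closest to the query."""
    order = sorted(range(len(train)),
                   key=lambda i: missing_aware_distance(train[i], query, stats))
    return Counter(labels[i] for i in order[:k]).most_common(1)[0][0]

train = [[1.0, 2.0], [1.2, None], [5.0, 6.0], [5.5, 6.2]]
labels = ["a", "a", "b", "b"]
stats = column_stats(train)
```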

14

10001269

Data-driven Multiscale Tsallis Complexity: Application to EEG Analysis

This work proposes data-driven, multiscale quantitative measures that reveal the underlying complexity of the electroencephalogram (EEG), applied to a rodent model of hypoxic-ischemic brain injury and recovery. Because real EEG recordings are nonlinear and non-stationary across different frequencies or scales, an approach better suited than conventional single-scale tools is needed to analyze EEG data. Here, we present a new framework of complexity measures that accounts for changing dynamics over multiple oscillatory scales. The proposed multiscale complexity is obtained by calculating the entropies of the probability distributions of the intrinsic mode functions extracted from the EEG by empirical mode decomposition (EMD). To quantify EEG recordings from a rat model of hypoxic-ischemic brain injury following cardiac arrest, the multiscale version of Tsallis entropy is examined. To validate the proposed complexity measure, actual EEG recordings from rats (n=9) that experienced 7 min of cardiac arrest followed by resuscitation were analyzed. Experimental results demonstrate that the multiscale Tsallis entropy leads to better discrimination of injury levels and improved correlation with the neurological deficit evaluation 72 hours after cardiac arrest, suggesting that it is an effective prognostic metric.
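The entropy step can be sketched with NumPy: estimate a probability distribution for each intrinsic mode function from a histogram and compute its Tsallis entropy, S_q = (1 - Σ p_i^q)/(q - 1), which tends to the Shannon entropy as q → 1. The sketch assumes the IMFs have already been extracted by some EMD implementation (not shown), and the choices `q=2.0` and `bins=32` are arbitrary assumptions rather than the paper's settings.

```python
import numpy as np

def tsallis_entropy(p, q):
    """Tsallis entropy of a discrete distribution; Shannon entropy when q == 1."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    if np.isclose(q, 1.0):
        return float(-np.sum(p * np.log(p)))
    return float((1.0 - np.sum(p ** q)) / (q - 1.0))

def multiscale_tsallis(components, q=2.0, bins=32):
    """One Tsallis entropy per component (e.g. per IMF from an EMD)."""
    values = []
    for c in components:
        counts, _ = np.histogram(c, bins=bins)  # empirical amplitude distribution
        values.append(tsallis_entropy(counts / counts.sum(), q))
    return values

# Synthetic stand-ins for IMFs: a slow and a fast oscillation plus a flat residue.
t = np.linspace(0.0, 1.0, 1000)
imfs = [np.sin(2 * np.pi * 5 * t), np.sin(2 * np.pi * 40 * t), np.zeros_like(t)]
profile = multiscale_tsallis(imfs)
```

The resulting vector (one entropy per scale) is the kind of multiscale profile that can then be compared across injury levels.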

13

10000260

A Simplified Distribution for Nonlinear Seas

The exact theoretical expression for the probability distribution of nonlinear sea-surface elevations derived from the second-order narrowband model has a cumbersome form that requires numerical computation, which makes it ill-suited to theoretical or practical applications. Here, the same narrowband model is reexamined to develop a simpler closed-form approximation suitable for theoretical and practical applications. The salient features of the approximate form are explored, and its relative validity is verified by comparison with other readily available approximations and with oceanic data.
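A Monte Carlo sample from the second-order narrowband model gives an empirical reference against which any approximate distribution can be checked. This sketch assumes the common Stokes-type form η = a cos χ + (k a²/2) cos 2χ, with a Rayleigh-distributed amplitude a (scale σ), a uniform phase χ and carrier wavenumber k; the abstract does not state its parameterization. For this form the skewness works out to 3kσ / (1 + k²σ²)^(3/2), i.e. about 3kσ for small steepness.

```python
import numpy as np

def second_order_narrowband(n, sigma, k, seed=0):
    """Sample surface elevations eta = a cos(chi) + (k a^2 / 2) cos(2 chi)."""
    rng = np.random.default_rng(seed)
    a = rng.rayleigh(scale=sigma, size=n)          # linear wave amplitude
    chi = rng.uniform(0.0, 2.0 * np.pi, size=n)    # random phase
    return a * np.cos(chi) + 0.5 * k * a**2 * np.cos(2.0 * chi)

def sample_skewness(x):
    """Third standardized moment of a sample."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return float(np.mean(d**3) / np.mean(d**2) ** 1.5)

eta_nonlinear = second_order_narrowband(200_000, sigma=1.0, k=0.1)
eta_linear = second_order_narrowband(200_000, sigma=1.0, k=0.0)  # Gaussian sea
```

A histogram of `eta_nonlinear` shows the higher crests and shallower troughs that the second-order correction introduces relative to the Gaussian `eta_linear`.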

12

9999919

Identification of Outliers in Flood Frequency Analysis: Comparison of Original and Multiple Grubbs-Beck Test

At-site flood frequency analysis is used to estimate flood quantiles when the at-site record is reasonably long. In Australia, the FLIKE software has been introduced for at-site flood frequency analysis. The advantage of FLIKE is that, for a given application, the user can compare a number of the most commonly adopted probability distributions and parameter estimation methods relatively quickly through a Windows interface. The new version of FLIKE incorporates the multiple Grubbs and Beck test, which can identify multiple potentially influential low flows. This paper presents a case study of six catchments in eastern Australia that compares two outlier identification tests (the original Grubbs and Beck test and the multiple Grubbs and Beck test) and two commonly applied probability distributions (Generalized Extreme Value (GEV) and Log Pearson type 3 (LP3)) using FLIKE. The multiple Grubbs and Beck test used with the LP3 distribution was found to provide more accurate flood quantile estimates than the LP3 distribution used with the original Grubbs and Beck test. Between these two methods, the differences in flood quantile estimates were up to 61% for the six study catchments. The GEV distribution (with L moments) and the LP3 distribution with the multiple Grubbs and Beck test were also found to give quite similar results in most cases; however, a difference of up to 38% was noted in the flood quantile for an annual exceedance probability (AEP) of 1 in 100 for one catchment. This finding needs to be confirmed with a greater number of stations across other Australian states.
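The original Grubbs and Beck test screens for low outliers in log space: a flow is flagged if its log10 falls below mean - K_N · std of the log flows. The sketch below assumes the Bulletin 17B approximation of the one-sided 10% critical value, K_N = -0.9043 + 3.345 √(log10 N) - 0.4046 log10 N, which is intended for record lengths of roughly 10 to 150 years. The multiple Grubbs and Beck test used in the new FLIKE is more involved and is not reproduced; the flows are illustrative.

```python
import math
from statistics import mean, stdev

def grubbs_beck_kn(n):
    """Approximate 10% one-sided Grubbs-Beck critical value for n years."""
    log_n = math.log10(n)
    return -0.9043 + 3.345 * math.sqrt(log_n) - 0.4046 * log_n

def low_outliers(flows):
    """Return (flagged low outliers, threshold) for the single-outlier test."""
    logs = [math.log10(q) for q in flows]  # requires strictly positive flows
    threshold = 10 ** (mean(logs) - grubbs_beck_kn(len(flows)) * stdev(logs))
    return sorted(q for q in flows if q < threshold), threshold

# Illustrative annual maximum flows (m^3/s) with one suspiciously small value.
flagged, threshold = low_outliers([100, 120, 90, 110, 130, 95, 105, 115, 125, 0.5])
```

Because the mean and standard deviation are computed with the outlier included, a single pass can miss several low flows that mask one another; addressing that masking is the motivation for the multiple test.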

11

9998542

Determining the Best Fitting Distributions for Minimum Flows of Streams in Gediz Basin

Today, the demand for water sources is increasing swiftly due to population growth. At the same time, some regions are known to face water shortages and drought because of global warming and climate change. In this context, the evaluation and analysis of hydrological data, such as observed trends and drought and flood prediction from short-term flows, is of great importance. Selecting the most accurate probability distribution is important for describing low-flow statistics in studies related to drought analysis. As with many basins in Turkey, the Gediz River basin will be considerably affected by drought, which will reduce the amount of water available for use. The aim of this study is to derive appropriate probability distributions for frequency analysis of annual minimum flows at 6 gauging stations in the Gediz Basin. After applying 10 different probability distributions, six different parameter estimation methods and 3 goodness-of-fit tests, the Pearson 3 and generalized extreme value distributions were found to give optimal results.
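Before any candidate distribution is fitted, the daily records have to be reduced to an annual minimum series with empirical non-exceedance probabilities. This is a minimal sketch assuming records arrive as (year, daily flow) pairs and using the Weibull plotting position m/(n+1); the study's own preprocessing (for example, whether minima are taken over daily or multi-day averages) is not stated in the abstract.

```python
def annual_minima(records):
    """Reduce (year, daily_flow) pairs to {year: minimum flow}, sorted by year."""
    minima = {}
    for year, q in records:
        if year not in minima or q < minima[year]:
            minima[year] = q
    return dict(sorted(minima.items()))

def empirical_nonexceedance(minima):
    """Pair each annual minimum with its Weibull plotting position m/(n+1)."""
    values = sorted(minima)
    n = len(values)
    return [(v, m / (n + 1)) for m, v in enumerate(values, start=1)]
```

For low flows the smallest value receives the smallest non-exceedance probability, so the empirical return period of a drought flow is 1/F rather than 1/(1 - F) as for floods.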

10

9997345

Entropic Measures of a Probability Sample Space and Exponential Type (α, β) Entropy

Entropy is a key measure in studies related to information theory and its many applications. Campbell was the first to recognize that the exponential of Shannon’s entropy is exactly the size of the sample space when the distribution is uniform. The idea here is to study, for a general probability distribution, the exponentials of Shannon’s entropy and of other entropy generalizations that involve the logarithmic function. In this paper, we introduce a measure of a sample space, called the ‘entropic measure of a sample space’, with respect to the underlying distribution. It is shown in both the discrete and continuous cases that this new measure depends on the parameters of the distribution on the sample space: the same sample space has different ‘entropic measures’ depending on the distributions defined on it. Campbell’s idea was noted to apply also to Rényi’s parametric entropy of a given order. Since parameters provide suitable choices and extended applications, the paper also studies parametric entropic measures of sample spaces. Exponential entropies related to Shannon’s entropy and to those generalizations that involve logarithmic functions, i.e. that are additive, have been studied for wider understanding and applications. We propose and study exponential entropies corresponding to the non-additive entropies of type (α, β), which include the Havrda and Charvát entropy as a special case.
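Campbell's observation can be checked numerically: exp(H) equals the number of outcomes for a uniform distribution and is smaller otherwise, and the same holds for the exponential of Rényi's entropy of order α, exp(H_α) = (Σ p_i^α)^(1/(1-α)). This short sketch covers only these two additive cases; the paper's (α, β) exponential entropies are not reproduced.

```python
import math

def exp_shannon(p):
    """exp of Shannon entropy (natural log): the 'effective size' of the space."""
    return math.exp(-sum(x * math.log(x) for x in p if x > 0))

def exp_renyi(p, alpha):
    """exp of Renyi entropy of order alpha: (sum p_i^alpha) ** (1 / (1 - alpha))."""
    if alpha == 1:
        return exp_shannon(p)  # Renyi entropy tends to Shannon as alpha -> 1
    return sum(x ** alpha for x in p if x > 0) ** (1.0 / (1.0 - alpha))
```

The same four-point sample space thus has measure 4 under the uniform distribution but a smaller measure under a skewed one, which is the dependence on the underlying distribution that the abstract describes.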

9

17108

A Distance Function for Data with Missing Values and Its Application

Missing values in data are common in real-world applications. Since the performance of many data mining algorithms depends critically on having a good metric over the input space, in this paper we define a distance function for unlabeled datasets with missing values. We use the Bhattacharyya distance, which measures the similarity of two probability distributions, to define our new distance function. According to this distance, the distance between two points without missing attribute values is simply the Mahalanobis distance. When, on the other hand, one of the coordinates has a missing value, the distance is computed according to the distribution of the missing coordinate. Our distance is general and can be used as part of any algorithm that computes the distance between data points. Because the performance of the k nearest neighbor (kNN) classifier depends strongly on the chosen distance measure, we opted for it to evaluate how accurately our distance reflects object similarity. We experimented on standard numerical datasets from different fields in the UCI repository. On these datasets we simulated missing values and compared the performance of the kNN classifier using our distance with three other basic methods. Our experiments show that kNN using our distance function outperforms kNN using the other methods. Moreover, the runtime of our method is only slightly higher than that of the other methods.
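The evaluation protocol of simulating missing values and comparing against a basic baseline can be sketched in plain Python. The abstract does not say how values were removed or which three basic methods were used, so two assumptions are made here: values are deleted completely at random (MCAR) with a fixed rate, and column-mean imputation stands in as one example of a basic baseline.

```python
import random

def simulate_mcar(rows, rate, seed=0):
    """Replace each value by None independently with probability `rate`."""
    rng = random.Random(seed)
    return [[None if rng.random() < rate else v for v in row] for row in rows]

def mean_impute(rows):
    """Baseline: fill each None with the mean of the observed values in its column."""
    means = []
    for j in range(len(rows[0])):
        vals = [r[j] for r in rows if r[j] is not None]
        means.append(sum(vals) / len(vals) if vals else 0.0)
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in rows]
```

Running kNN on `mean_impute(simulate_mcar(rows, rate))` and, separately, kNN with a missing-aware distance on `simulate_mcar(rows, rate)` gives the kind of side-by-side comparison the abstract reports.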

8

10514

Developing Forecasting Tool for Humanitarian Relief Organizations in Emergency Logistics Planning

Although time series data on natural disasters are available for the last 110 years, no forecasting tool is available to humanitarian relief organizations for determining forecasts for emergency logistics planning. This study develops a forecasting tool based on identifying the probability distributions of these data. The parameter estimates are then used to calculate natural disaster forecasts. Further, determining aggregate forecasts leads to efficient pre-disaster planning. Based on the research findings, relief agencies can optimize the allocation of various resources in emergency logistics planning.
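How fitted parameters turn into forecasts can be illustrated with a deliberately simple model. Assuming, hypothetically, that yearly disaster counts follow a Poisson distribution (the abstract does not name the distributions identified), the maximum-likelihood rate is the mean annual count, and it yields both the probability of at least k events next year and an aggregate forecast over a planning horizon.

```python
import math

def poisson_rate(annual_counts):
    """Maximum-likelihood Poisson rate: the mean number of events per year."""
    return sum(annual_counts) / len(annual_counts)

def prob_at_least(k, rate):
    """P(N >= k) for N ~ Poisson(rate)."""
    return 1.0 - sum(math.exp(-rate) * rate**i / math.factorial(i) for i in range(k))

def aggregate_forecast(rate, years):
    """Expected number of events over a multi-year planning horizon."""
    return rate * years
```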

7

13422

Time-Domain Stator Current Condition Monitoring: Analyzing Point Failures Detection by Kolmogorov-Smirnov (K-S) Test

This paper deals with condition monitoring of electric switch machines for railway points. A point machine, a complex electro-mechanical device, switches the track between two alternative routes. There has been increasing interest in railway safety and in the optimal management of maintenance of railway equipment, e.g. point machines, in order to enhance railway service quality and reduce system failures. This paper explores the use of the Kolmogorov-Smirnov (K-S) test to detect point failures external to the machine (slide chairs, fixings, stretchers, etc.) while the point machine itself (inside the machine) is in proper condition. Time-domain stator current signatures of normal (healthy) and faulty points are acquired by 3 Hall effect sensors and analyzed by the K-S test. Three types of failure were created for the test: a hard stone and a soft stone placed as obstacles between the stock rail and the switch blades, and slide-chair friction. The results for these three faults show that the K-S test can also be applied to detect other point failures whose current signatures deviate parametrically from the healthy current signature. As an analysis technique, the K-S test assumes that each defect has a specific probability distribution. Empirical cumulative distribution functions (ECDF) are used to differentiate these probability distributions. The test is based on the null hypothesis that the ECDF of the target distribution is statistically similar to the ECDF of the reference distribution. Therefore, by comparing a given current signature (target signal) from an unknown switch state with a number of template signatures (reference signals) from known switch states, it is possible to identify the most likely state of the point machine under analysis.
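The template comparison described above can be sketched with the two-sample K-S statistic, the largest vertical gap between two ECDFs, assigning the unknown signature to the known state with the smallest statistic. This is a simplification: it ranks states by the statistic rather than running a full hypothesis test, the state names and sample values are hypothetical, and real signatures would be vectors of current samples. For p-values, SciPy's `scipy.stats.ks_2samp` computes the same statistic.

```python
from bisect import bisect_right

def ecdf(sample):
    """Return the empirical CDF F(x) = #{values <= x} / n of a sample."""
    values = sorted(sample)
    n = len(values)
    return lambda x: bisect_right(values, x) / n

def ks_statistic(a, b):
    """Two-sample K-S statistic: the largest gap between the two ECDFs.

    Both ECDFs are step functions that only jump at sample points, so
    evaluating at every pooled point is enough to find the supremum.
    """
    fa, fb = ecdf(a), ecdf(b)
    return max(abs(fa(x) - fb(x)) for x in list(a) + list(b))

def most_likely_state(target, templates):
    """Pick the template (known switch state) whose ECDF is closest to target."""
    scores = {state: ks_statistic(target, ref) for state, ref in templates.items()}
    return min(scores, key=scores.get), scores

# Hypothetical reference signatures for two known switch states.
templates = {"healthy": [1.0, 1.1, 0.9, 1.05], "obstacle": [2.0, 2.1, 1.9, 2.05]}
state, scores = most_likely_state([1.02, 0.95, 1.08], templates)
```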

6

11636

Probability Distribution of Rainfall Depth at Hourly Time-Scale

Rainfall data at fine resolution and knowledge of their
characteristics play a major role in the efficient design and operation
of agricultural, telecommunication, runoff and erosion control as well
as water quality control systems. This paper aims to study the
statistical distribution of hourly rainfall depth for 12 representative
stations spread across Peninsular Malaysia. Hourly rainfall data spanning
10 to 22 years were collected and their statistical characteristics
were estimated. Three probability distributions, namely the Generalized
Pareto, Exponential and Gamma distributions, were proposed to
model the hourly rainfall depth, and three goodness-of-fit tests,
namely the Kolmogorov-Smirnov, Anderson-Darling and Chi-Squared
tests, were used to evaluate their fit. Results indicate that the east
coast of the Peninsula receives a higher depth of rainfall than
the west coast. However, the rainfall frequency is found to be
irregular. The goodness-of-fit tests also show that all
three models fit the rainfall data at the 1% level of significance.
However, the Generalized Pareto distribution fits better than the Exponential and
Gamma distributions and is therefore recommended as the best fit.
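Comparing candidate fits with the Kolmogorov-Smirnov statistic can be sketched in plain Python for the two candidates whose CDFs have closed forms. Assumptions: the Exponential rate is fitted by maximum likelihood, the Generalized Pareto (location 0) by the method of moments, ξ = ½(1 - m²/v) and σ = ½ m (m²/v + 1); the Gamma candidate is omitted because its CDF needs the incomplete gamma function (e.g. `scipy.stats.gamma`). Because parameters are estimated from the same data, standard K-S p-values would be optimistic, so the sketch only ranks the statistics. The depths are illustrative.

```python
import math

def ks_one_sample(data, cdf):
    """One-sample K-S statistic D = sup |F_n(x) - F(x)|."""
    xs = sorted(data)
    n = len(xs)
    return max(max(i / n - cdf(x), cdf(x) - (i - 1) / n)
               for i, x in enumerate(xs, start=1))

def fit_exponential(data):
    rate = 1.0 / (sum(data) / len(data))  # maximum-likelihood rate
    return lambda x: 1.0 - math.exp(-rate * x)

def fit_gpd_moments(data):
    """Generalized Pareto (location 0) fitted by the method of moments."""
    n = len(data)
    m = sum(data) / n
    v = sum((x - m) ** 2 for x in data) / (n - 1)
    xi = 0.5 * (1.0 - m * m / v)
    sigma = 0.5 * m * (m * m / v + 1.0)
    def cdf(x):
        if abs(xi) < 1e-12:
            return 1.0 - math.exp(-x / sigma)
        base = 1.0 + xi * x / sigma
        if base <= 0.0:
            return 1.0  # beyond the upper endpoint when xi < 0
        return 1.0 - base ** (-1.0 / xi)
    return cdf

def rank_candidates(depths):
    fits = {"Exponential": fit_exponential(depths),
            "Generalized Pareto": fit_gpd_moments(depths)}
    scores = {name: ks_one_sample(depths, cdf) for name, cdf in fits.items()}
    return sorted(scores.items(), key=lambda kv: kv[1])

# Illustrative hourly rainfall depths (mm), not data from the study.
ranking = rank_candidates([0.5, 1.2, 0.3, 2.5, 0.8, 1.7, 0.1, 3.2, 0.9, 1.1])
```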

5

2804

The Possibility-Probability Relationship for Bloodstream Concentrations of Physiologically Active Substances

If a possibility distribution and a probability distribution
describe values x of one and the same system or process
x(t), can they be related to each other? Although in general the possibility
and probability distributions might not be connected at all, we
can assume that in some particular cases there is an association linking
them.
In this paper, we consider distributions of bloodstream
concentrations of physiologically active substances and propose that
the probability of observing a concentration x of a substance X can be
produced from the possibility of the event X = x.
The proposed assumptions and the resulting theoretical distributions
are tested against data obtained from various panel studies of the
bloodstream concentrations of different physiologically active
substances in both patients and healthy adults.

4

674

An Extension of the Krätzel Function and Associated Inverse Gaussian Probability Distribution Occurring in Reliability Theory

In view of their importance and usefulness in reliability theory and probability distributions, several generalizations of the inverse Gaussian distribution and the Krätzel function have been investigated in recent years. This has motivated the authors to introduce and study a new generalization of the inverse Gaussian distribution and the Krätzel function, associated with a product of a Bessel function of the third kind K_Q(z) and a Fox-Wright generalized hypergeometric function introduced in this paper. The introduced function turns out to be a unified gamma-type function. Its incomplete forms are also discussed. Several properties of this gamma-type function are obtained. By means of this generalized function, we introduce a generalization of the inverse Gaussian distribution, which is useful in reliability analysis, diffusion processes and radio techniques. The inverse Gaussian distribution thus introduced also provides a generalization of the Krätzel function. Some basic statistical functions associated with this probability density function, such as the moments, the Mellin transform, the moment generating function, the hazard rate function and the mean residual life function, are also obtained.

Keywords: Fox-Wright function, inverse Gaussian distribution, Krätzel function, Bessel function of the third kind.
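As background for the generalization, the classical Krätzel function and its standard link to the Bessel function of the third kind K_ν can be written out. The last two lines are a worked check showing that normalizing the ordinary inverse Gaussian density (mean μ, shape λ) reduces to a Krätzel integral with ρ = 1 and ν = -1/2, using K_{1/2}(z) = √(π/(2z)) e^{-z}. These are standard identities, not the paper's new function, whose exact form the abstract does not give.

```latex
\begin{aligned}
Z_\rho^{\nu}(x) &= \int_0^\infty t^{\nu-1}\exp\!\left(-t^{\rho}-\frac{x}{t}\right)dt, \qquad x>0,\\
Z_1^{\nu}(x) &= 2\,x^{\nu/2}\,K_\nu\!\left(2\sqrt{x}\right),\\
f(x;\mu,\lambda) &= \sqrt{\frac{\lambda}{2\pi x^{3}}}\exp\!\left(-\frac{\lambda(x-\mu)^2}{2\mu^2 x}\right)
 = \sqrt{\frac{\lambda}{2\pi}}\,e^{\lambda/\mu}\,x^{-3/2}\exp\!\left(-ax-\frac{b}{x}\right),\\
\int_0^\infty x^{-3/2}e^{-ax-b/x}\,dx &= a^{1/2}\,Z_1^{-1/2}(ab) = \sqrt{\frac{2\pi}{\lambda}}\,e^{-\lambda/\mu},
\qquad a=\frac{\lambda}{2\mu^2},\quad b=\frac{\lambda}{2}.
\end{aligned}
```

Multiplying the last integral by the prefactor √(λ/(2π)) e^{λ/μ} gives exactly 1, so the density integrates to one.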

3

3083

Estimation of Time-Varying Linear Regression with Unknown Time-Volatility via Continuous Generalization of the Akaike Information Criterion

The problem of estimating time-varying regression
inevitably involves choosing the appropriate
level of model volatility, ranging from full stationarity of the instantaneous
regression models to their complete independence of each other. In the
stationary case, the number of regression coefficients to be estimated
equals the number of regressors, whereas the absence of any smoothness
assumptions multiplies the dimension of the unknown vector by
the length of the time series. The Akaike Information Criterion
is a commonly adopted means of fitting a model to a given
data set within a succession of nested parametric model classes,
but its crucial restriction is that the classes are rigidly defined by
the growing integer-valued dimension of the unknown vector. To
make the Kullback information maximization principle underlying the
classical AIC applicable to the problem of time-varying regression
estimation, we extend it to a wider class of data models in which
the dimension of the parameter is fixed, but the freedom of its values
is softly constrained by a family of continuously nested a priori
probability distributions.
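The classical criterion that the paper generalizes can be illustrated on nested least-squares models. This is a minimal NumPy sketch, assuming Gaussian errors so that AIC = N ln(RSS/N) + 2k up to an additive constant, with model classes given by the first k columns of a design matrix. That integer-valued k is exactly the rigid dimension the abstract identifies as AIC's restriction; the continuous generalization itself is not reproduced here.

```python
import numpy as np

def gaussian_aic(rss, n_obs, n_params):
    """AIC for a least-squares fit with Gaussian errors (constant dropped)."""
    return n_obs * np.log(rss / n_obs) + 2 * n_params

def select_by_aic(X, y, candidate_sizes):
    """Fit nested OLS models on the first k columns of X; return the best k."""
    scores = {}
    for k in candidate_sizes:
        coef, *_ = np.linalg.lstsq(X[:, :k], y, rcond=None)
        rss = float(np.sum((y - X[:, :k] @ coef) ** 2))
        scores[k] = gaussian_aic(rss, len(y), k)
    return min(scores, key=scores.get), scores

# Synthetic data: a linear trend plus a small deterministic wiggle.
t = np.linspace(0.0, 1.0, 50)
X = np.column_stack([np.ones_like(t), t, t**2])
y = 1.0 + 2.0 * t + 0.01 * np.sin(37.0 * t)
best, scores = select_by_aic(X, y, [1, 2, 3])
```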

2

5664

A Formal Approach for Proof Constructions in Cryptography

In this article we explore the application of a formal
proof system to verification problems in cryptography. Cryptographic
properties concerning the correctness or security of some cryptographic
algorithms are of great interest. Besides some basic lemmata, we
explore an implementation of a complex function that is used in
cryptography. More precisely, we describe formal properties of this
implementation that we prove by computer. We describe formalized
probability distributions (σ-algebras, probability spaces and conditional
probabilities). These are given in the formal language of the
formal proof system Isabelle/HOL. Moreover, we prove Bayes' formula
by computer. In addition, we describe an application of the presented
formalized probability distributions to cryptography. Furthermore,
this article shows that computer proofs of complex cryptographic
functions are possible by presenting an implementation of the
Miller-Rabin primality test that admits formal verification. Our achievements
are a step towards computer verification of cryptographic primitives.
They provide a basis for computer verification in cryptography.
Computer verification can be applied to further problems in cryptographic
research, if the corresponding basic mathematical knowledge
is available in a database.
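For readers unfamiliar with the algorithm whose implementation is verified, here is a plain Python sketch of the Miller-Rabin test. It is not the Isabelle/HOL implementation from the article and carries no formal proof; using the first twelve primes as witnesses makes it deterministic for all n < 2^64, a known bound for this witness set. The function name `is_prime_u64` is our own.

```python
def is_prime_u64(n: int) -> bool:
    """Deterministic Miller-Rabin primality test for 0 <= n < 2**64."""
    if n < 2:
        return False
    witnesses = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)
    for p in witnesses:  # trial division by the witnesses themselves
        if n % p == 0:
            return n == p
    # Write n - 1 = d * 2**s with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for a in witnesses:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue  # a is not a witness of compositeness
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # a proves n composite
    return True
```

For example, 3215031751 passes the test for bases 2, 3, 5 and 7 but is rejected by base 11, which is why several witnesses are needed.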

1

2218

Computer Verification in Cryptography

In this paper we explore the application of a formal proof system to verification problems in cryptography. Cryptographic properties concerning the correctness or security of some cryptographic algorithms are of great interest. Besides some basic lemmata, we explore an implementation of a complex function that is used in cryptography. More precisely, we describe formal properties of this implementation that we prove by computer. We describe formalized probability distributions (σ-algebras, probability spaces and conditional probabilities). These are given in the formal language of the formal proof system Isabelle/HOL. Moreover, we prove Bayes' formula by computer. In addition, we describe an application of the presented formalized probability distributions to cryptography. Furthermore, this paper shows that computer proofs of complex cryptographic functions are possible by presenting an implementation of the Miller-Rabin primality test that admits formal verification. Our achievements are a step towards computer verification of cryptographic primitives. They provide a basis for computer verification in cryptography. Computer verification can be applied to further problems in cryptographic research, if the corresponding basic mathematical knowledge is available in a database.
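How a finite probability space and Bayes' formula apply to cryptography can be illustrated outside Isabelle/HOL. The following is a hypothetical example, not the paper's formalization. It builds the joint distribution of message and ciphertext for a one-time pad (ciphertext = message XOR a uniform, independent key) and uses Bayes' formula to check that the posterior of every message equals its prior, i.e. Shannon's perfect secrecy. Exact `Fraction` arithmetic avoids rounding, closer in spirit to a proof than to a simulation.

```python
from fractions import Fraction
from itertools import product

def joint_distribution(message_prior, n_bits):
    """Joint law of (message, ciphertext) for a one-time pad c = m XOR k.

    The key is uniform on n_bits-bit strings and independent of the message;
    messages are assumed to lie in range(2 ** n_bits).
    """
    p_key = Fraction(1, 2 ** n_bits)
    joint = {}
    for m, k in product(message_prior, range(2 ** n_bits)):
        c = m ^ k
        joint[(m, c)] = joint.get((m, c), Fraction(0)) + message_prior[m] * p_key
    return joint

def prob(joint, event):
    """Probability of an event given as a predicate on (message, ciphertext)."""
    return sum((p for (m, c), p in joint.items() if event(m, c)), Fraction(0))

def posterior_via_bayes(joint, m, c):
    """P(M=m | C=c) = P(C=c | M=m) * P(M=m) / P(C=c)."""
    p_m = prob(joint, lambda mm, cc: mm == m)
    p_c = prob(joint, lambda mm, cc: cc == c)
    p_c_given_m = prob(joint, lambda mm, cc: mm == m and cc == c) / p_m
    return p_c_given_m * p_m / p_c

prior = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 8), 3: Fraction(1, 8)}
joint = joint_distribution(prior, 2)
```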