A key issue in stock investment is how to select representative features for stock selection. The first objective of this paper is to determine whether an automated stock investment system using machine learning techniques can identify a portfolio of growth stocks that is highly likely to provide returns better than the stock market index. The second objective is to identify the technical features that best characterize whether a stock’s price is likely to go up, and to quantify the contribution of the most important factors to predicting that likelihood. First, an unsupervised machine learning technique, K-means cluster analysis, was applied to the stock data to identify a cluster of stocks that was likely to go up in price (portfolio 1). Next, principal component analysis was used to select stocks that rated high on component one and component two (portfolio 2). Thirdly, a supervised machine learning technique, logistic regression, was used to select stocks with a high probability of their price going up (portfolio 3). The predictive models were validated with metrics such as sensitivity (recall), specificity and overall accuracy. All accuracy measures were above 70%. The top three stocks were selected for each of the three stock portfolios and traded in the market for one month. After one month, the return for each stock portfolio was computed and compared with the stock market index return. The returns were 23.87% for the principal component analysis portfolio, 11.65% for the logistic regression portfolio and 8.88% for the K-means cluster portfolio, while the stock market index returned 0.38%. All portfolios thus outperformed the market by more than eight times. This study confirms that an automated stock investment system using machine learning techniques can identify top performing stock portfolios that outperform the stock market.
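The validation metrics named in the abstract above (sensitivity, specificity and overall accuracy) can be illustrated with a minimal sketch. The labels and predictions below are hypothetical and are not data from the study; the function name `classification_metrics` is also an assumption introduced for illustration.

```python
def classification_metrics(y_true, y_pred):
    """Sensitivity, specificity and accuracy for binary labels (1 = price went up)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    return {
        # Share of stocks that went up that the model flagged as going up.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Share of stocks that did not go up that the model flagged as not going up.
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
        # Share of all stocks classified correctly.
        "accuracy": (tp + tn) / len(pairs) if pairs else float("nan"),
    }


# Hypothetical outcomes and predictions, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Reporting all three numbers matters when classes are imbalanced, because a model can reach high accuracy while missing most of the stocks that actually rise.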
This paper presents a Machine Learning (ML) approach to support Meningitis diagnosis in patients at a children’s hospital in Sao Paulo, Brazil. The aim is to use ML techniques to reduce the use of invasive procedures, such as cerebrospinal fluid (CSF) collection, as much as possible. In this study, we focus on predicting the probability of Meningitis given the results of blood and urine laboratory tests, together with the analysis of pain or other complaints from the patient. We tested a number of different ML algorithms: Adaptive Boosting (AdaBoost), Decision Tree, Gradient Boosting, K-Nearest Neighbors (KNN), Logistic Regression, Random Forest and Support Vector Machines (SVM). The Decision Tree algorithm performed best, with 94.56% and 96.18% accuracy on the training and testing data, respectively. These results represent a significant aid to doctors in diagnosing Meningitis as early as possible and in preventing expensive and painful procedures on some children.
In Chile, there is a lack of evidence about the impact of polyvictimization on the emergence of suicidal thoughts among children and young people. Thus, this study aims to explore the association between the episodes of polyvictimization suffered by Chilean children and young people and the manifestation of signs related to suicidal tendencies. To achieve this purpose, secondary data from the First Polyvictimization Survey on Children and Adolescents of 2017 were analyzed, and a binomial logistic regression model was applied to establish the probability that young people are experiencing suicidal ideation episodes. The main findings show that women between the ages of 13 and 15 years, who are in seventh grade and second in subsidized schools, are more likely to express suicidal ideas, which increases if they have suffered different types of victimization, particularly physical violence, psychological aggression, and sexual abuse.
Logistic regression (LR) and multivariate adaptive regression splines (MarSpline) were applied and verified for landslide susceptibility mapping in Oudka, Morocco, using a geographical information system. From a spatial database containing landslide mapping, topography, soil, hydrology and lithology data, eight landslide-related factors were calculated or extracted: elevation, slope, aspect, distance to streams, distance to roads, distance to faults, lithology and the Normalized Difference Vegetation Index (NDVI). Using these factors, landslide susceptibility indexes were calculated by the two methods. Before the calculation, the database was divided into two parts, the first for building the model and the second for validation. The results of the landslide susceptibility analysis were verified using success and prediction rates to evaluate the quality of these probabilistic models. The verification showed that the MarSpline model performed best, with a success rate (AUC = 0.963) and a prediction rate (AUC = 0.951) higher than those of the LR model (success rate AUC = 0.918, prediction rate AUC = 0.901).
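As a minimal sketch of the verification step, the code below computes an AUC on the model-building part of the data (reported above as the success rate) and on the held-out part (the prediction rate). It assumes the reported AUC values are ordinary ROC-style areas, which the abstract does not state explicitly. The labels and susceptibility scores are hypothetical.

```python
def auc(labels, scores):
    """Probability that a randomly chosen landslide cell scores higher than a
    randomly chosen non-landslide cell; ties count as one half."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical susceptibility indexes (1 = landslide cell, 0 = stable cell).
train_labels = [1, 1, 1, 0, 0, 0]
train_scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
test_labels = [1, 1, 0, 0]
test_scores = [0.7, 0.3, 0.4, 0.2]

success_rate = auc(train_labels, train_scores)    # fit on the model-building data
prediction_rate = auc(test_labels, test_scores)   # fit on the validation data
print(success_rate, prediction_rate)
```

A prediction rate close to the success rate, as in the figures reported above, suggests the model generalises to cells it was not built on.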
Hypertension is one of the important causes of morbidity and mortality in many countries, including Iran. It has been shown that hypertension is a consequence of the interaction of genetics and environment. Nutrients have important roles in controlling blood pressure. We assessed dietary habits and anthropometric status in patients with hypertension in the north of Iran, a population with distinctive, culturally specific dietary habits. This study was conducted on 127 patients with newly recognized hypertension and 120 normotensive participants. Anthropometric status was measured, demographic characteristics and medical conditions were collected with valid questionnaires, and dietary habits were assessed with a 3-day food recall (two weekdays and one weekend day). The mean age of participants was 58 ± 6.7 years. The mean intake of energy, saturated fat, vitamin D, potassium, zinc, dietary fiber, vitamin C, calcium, phosphorus, copper and magnesium was significantly lower in the hypertensive group than in the control group (p < 0.05). After adjusting for energy intake, a positive association was observed between hypertension and the intake of some dietary nutrients, including cholesterol [OR: 1.1, P: 0.001, B: 0.06], fiber [OR: 1.6, P: 0.001, B: 1.8], vitamin D [OR: 2.6, P: 0.006, B: 0.9] and zinc [OR: 1.4, P: 0.006, B: 0.3]. Logistic regression analysis showed no significant association between hypertension and weight or waist circumference. In our study, the mean intake of some nutrients was lower in hypertensive individuals than in normotensive individuals. Health training about suitable dietary habits and easier access to vitamin D supplementation in patients with hypertension are cost-effective tools to improve outcomes in Iran.
The objectives of this research are to determine the basic engineering properties of lateritic soil and to predict the impact on community members who live near the excavation pits in Amphur Pak Thor, Ratchaburi Province, in western Thailand. The research was conducted by collecting soil samples from four excavation pits for basic engineering property testing, collecting questionnaire data from 120 community members who live near the excavation pits, and applying statistical analysis. The results show that the lateritic soil can be classified as a silt soil type, which is cohesionless like loess or collapsible soil and is not suitable for use in a pavement structure for a commuter highway because it could lead to structural and functional failure in the long run. In terms of community members’ opinions, the highest impact was from dust generated by excavation activities. The logistic regression prediction of the impact on community members was at 84.32, and the model can be adapted and applied to other areas with the same context as a guideline for risk prevention and risk communication, since excavation could affect both infrastructure and the health of community members.
This study examines the effect of neighbourhood environment walkability on reported physical activity (PA) levels of students of Universiti Sains Malaysia (USM) in Malaysia. Compared with previous generations, today’s young people spend less time playing outdoors and have lower participation rates in PA. Research suggests that negative perceptions of neighbourhood walkability may be a potential barrier to adolescents’ PA. The sample consisted of 200 USM students (to 24 years old) who live outside of the main campus and engage in PA in the sport halls and sport fields of USM. The data were analysed using the t-test, binary logistic regression, and discriminant analysis techniques. The study found that youth PA was affected by neighbourhood environment walkability factors, including neighbourhood infrastructures, neighbourhood safety (crime), and recreation facilities, as well as street characteristics and neighbourhood design variables such as facades of sidewalks, roadside trees, green spaces, and aesthetics. The findings also showed that active students were influenced by street connectivity, neighbourhood infrastructures, recreation facilities, facades of sidewalks, and aesthetics, whereas the PA of students in the less active group was affected by access to destinations, neighbourhood safety (crime), and roadside trees and green spaces. These results indicate which built environment factors have more effect on youth PA, and they send a message to the public to raise awareness of the benefits of PA for youth health.
Cervical dentinal hypersensitivity (CDH) affects 8-30% of adults and nearly 85% of perio-treated patients. Various treatment schemes have been applied for treating CDH, among them fluoride application, laser irradiation, and, recently, bioglass. The purpose of this study was to investigate the influence of bioglass, copper-bromide (Cu-Br) laser irradiation and their combination on dentinal tubule occlusion as a potential treatment for CDH. Forty-five human dentin surfaces were organized into three equal groups: group A received Cu-Br laser only; group B received bioglass only; group C received bioglass followed by Cu-Br laser irradiation. Specimens were evaluated for dentinal tubule occlusion under an environmental scanning electron microscope. Treatment modality significantly affected dentinal tubule occlusion (p<0.001). Groups B and C scored higher dentinal tubule occlusion than group A. Binary logistic regression showed that bioglass application significantly (p<0.001) contributed to dentinal tubule occlusion, compared with other variables. Under the conditions used herein and within the limitations of this study, bioglass application, alone or combined with Cu-Br laser irradiation, is a superior method for producing dentinal tubule occlusion, and may lead to an effective treatment modality for CDH.
The under-5 mortality rate is high in sub-Saharan Africa, and Lesotho has one of the highest under-5 mortality rates in the world. The objective of the study is to determine the factors associated with under-5 mortality in Lesotho. The data used for this analysis come from the 2009 Lesotho Demographic and Health Survey, a nationally representative household survey. Odds ratios produced by the logistic regression models were used to measure the effect of each independent variable on the dependent variable. Female children were 38% less likely to die than male children, a statistically significant difference. Children who were breastfed for 13 to 18 months and those who were breastfed for more than 19 months were significantly less likely to die than those who were breastfed for 12 months or less. Furthermore, children of mothers who stayed in Quthing, Qacha’s Nek and Thaba Tseka ran the greatest risk of dying. The results suggested that sex of child, type of birth, breastfeeding duration, district, source of energy and marital status were significant predictors of under-5 mortality after adjusting for all variables.
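The link between a logistic regression coefficient and the odds ratios used above can be sketched as follows. The sketch assumes that "38% less likely" refers to a 38% reduction in the odds of dying, that is, an odds ratio of 0.62; the abstract does not report the coefficient itself, so the value of `beta_female` below is derived from that assumption and is not taken from the study.

```python
import math


def odds_ratio(coefficient):
    """Odds ratio implied by a logistic regression coefficient: OR = exp(B)."""
    return math.exp(coefficient)


def percent_change_in_odds(coefficient):
    """Percentage change in the odds for a one-unit increase in the predictor."""
    return (odds_ratio(coefficient) - 1.0) * 100.0


# Assumed coefficient for a female indicator (male = reference category),
# back-calculated from an assumed odds ratio of 0.62.
beta_female = math.log(0.62)

print(round(odds_ratio(beta_female), 2))              # odds ratio
print(round(percent_change_in_odds(beta_female), 1))  # percent change in odds
```

An odds ratio below 1 corresponds to a negative coefficient and lower odds than the reference category; an odds ratio above 1 corresponds to a positive coefficient and higher odds.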
Myocardial infarction is one of the leading causes of death in the world. Some of these deaths occur even before the patient reaches the hospital. Myocardial infarction occurs as a result of impaired blood supply. Because most of these deaths are due to coronary artery disease, awareness of the warning signs of a heart attack is essential. Some heart attacks are sudden and intense, but most start slowly, with mild pain or discomfort, so early detection and successful treatment of these symptoms is vital to save patients. The importance and usefulness of a system designed to assist physicians in the early diagnosis of acute heart attacks is therefore clear. The main purpose of this study is to enable patients to become better informed about their condition and to encourage them to seek professional care at an earlier stage in the appropriate situations. For this purpose, data were collected on 711 heart patients in Iranian hospitals. Twenty-eight clinical attributes that can be reported by patients were studied. Three logistic regression models were built on the basis of the 28 features to predict the risk of heart attack. The best-performing logistic regression model had a C-index of 0.955 and an accuracy of 94.9%. Severe chest pain, back pain, cold sweats, shortness of breath, nausea and vomiting were selected as the main features.
This study analyzes the innovative orientation of Croatian entrepreneurs. Innovative orientation is represented by the perceived extent to which an entrepreneur’s product, service or technology is new and no other businesses offer the same product. The sample is extracted from the GEM Croatia Adult Population Survey dataset for the years 2003-2013. We apply descriptive statistics, the t-test, the Chi-square test and logistic regression. Findings indicate that innovative orientation varies with personal, firm, meso and macro level variables, and between different stages of the entrepreneurship process. Significant predictors for both early stage and established entrepreneurs are the occupation of the entrepreneur, the size of the firm and export aspiration. In addition, fear of failure, expecting to start a new business and seeing an entrepreneurial career as a desirable choice are predictors of innovative orientation among early stage entrepreneurs.
Estimation of a proportion has many applications in economics and social studies. A common application is the estimation of the low income proportion, which gives the proportion of people in a population who are classified as poor. In this paper, we present this poverty indicator and propose using the logistic regression estimator to estimate the low income proportion. Various sampling designs are presented. Using a real data set obtained from the European Survey on Income and Living Conditions, Monte Carlo simulation studies are carried out to analyze the empirical performance of the logistic regression estimator under the sampling designs considered in this paper. The simulation results indicate that the logistic regression estimator can be more accurate than the customary estimator under these sampling designs. The stratified sampling design can also provide more accurate results.
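The idea of a logistic regression estimator of a proportion can be sketched as below. This is one common model-assisted form (fitted probabilities summed over the population plus a weighted residual correction from the sample), assumed here for illustration; the abstract does not give the exact estimator used in the paper. The population, the single auxiliary variable `pop_x`, the model coefficients and the simple random sampling design are all hypothetical.

```python
import math
import random


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def fit_logistic(xs, ys, weights, iterations=25):
    """Weighted maximum likelihood fit of P(y=1|x) = sigmoid(a + b*x),
    using Newton-Raphson with an intercept and one covariate."""
    a, b = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y, w in zip(xs, ys, weights):
            p = sigmoid(a + b * x)
            g0 += w * (y - p)          # score for the intercept
            g1 += w * (y - p) * x      # score for the slope
            v = w * p * (1.0 - p)
            h00 += v                   # information matrix entries
            h01 += v * x
            h11 += v * x * x
        det = h00 * h11 - h01 * h01
        if abs(det) < 1e-12:
            break
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


def lgreg_proportion(pop_x, sample_idx, sample_y):
    """Model-assisted estimate: population mean of fitted probabilities plus a
    design-weighted correction for the sample residuals."""
    N, n = len(pop_x), len(sample_idx)
    d = N / n  # design weight under simple random sampling without replacement
    xs = [pop_x[i] for i in sample_idx]
    a, b = fit_logistic(xs, sample_y, [d] * n)
    fitted_pop = sum(sigmoid(a + b * x) for x in pop_x)
    correction = sum(d * (y - sigmoid(a + b * x)) for x, y in zip(xs, sample_y))
    return (fitted_pop + correction) / N


# Hypothetical population: x is an auxiliary variable known for every unit,
# y = 1 marks a unit classified as poor.
random.seed(0)
N, n = 2000, 200
pop_x = [random.gauss(0.0, 1.0) for _ in range(N)]
pop_y = [1 if random.random() < sigmoid(-1.5 - 1.0 * x) else 0 for x in pop_x]
true_proportion = sum(pop_y) / N

sample_idx = random.sample(range(N), n)
sample_y = [pop_y[i] for i in sample_idx]
sample_estimate = sum(sample_y) / n  # customary estimator: sample proportion
lgreg_estimate = lgreg_proportion(pop_x, sample_idx, sample_y)
print(true_proportion, sample_estimate, lgreg_estimate)
```

The potential gain over the sample proportion comes from the auxiliary variable: when it predicts poverty status well, the fitted probabilities carry information about the units that were not sampled. A single simulated sample like this one does not demonstrate that gain; comparing estimators requires repeating the draw many times, as in the Monte Carlo studies described above.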
Cutting tools are widely used in manufacturing processes, and drilling is the most commonly used machining process. Although the drill-bits used in drilling may not be expensive, their breakage can damage the expensive workpiece being drilled and also has a major impact on productivity. Predicting drill-bit breakage is therefore important for reducing cost and improving productivity. This study uses twenty features extracted from two degradation signals, viz., thrust force and torque. The methodology involves developing and comparing decision tree, random forest, and multinomial logistic regression models for classifying and predicting drill-bit breakage using the degradation signals.
The problem of estimating a proportion has important applications in economics and, more generally, in many areas of the social sciences. A common application in economics is the estimation of the headcount index. In this paper, we define the general headcount index as a proportion. Furthermore, we introduce a new quantitative method for estimating the headcount index. In particular, we suggest using the logistic regression estimator to estimate the headcount index. Using a real data set, results derived from Monte Carlo simulation studies indicate that the logistic regression estimator can be more accurate than the traditional estimator of the headcount index.
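Treating the headcount index as a proportion can be illustrated with a minimal sketch: the share of units whose income falls below a poverty line. The strict inequality, the incomes and the poverty line below are assumptions made for illustration; the abstract does not specify them.

```python
def headcount_index(incomes, poverty_line):
    """Proportion of units whose income is strictly below the poverty line."""
    if not incomes:
        raise ValueError("incomes must not be empty")
    poor = sum(1 for income in incomes if income < poverty_line)
    return poor / len(incomes)


# Hypothetical incomes and poverty line, for illustration only.
incomes = [500, 800, 1200, 1500, 2000, 2500, 3000, 4000]
poverty_line = 1300
h = headcount_index(incomes, poverty_line)
print(h)
```

Because each unit contributes a 0/1 indicator, the index is the mean of a binary variable, which is what makes a logistic regression model a natural choice for estimating it.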
International market expansion involves a strategic market entry decision process through which a firm expands its operations from the domestic to the international domain. Entry timing choices therefore require balancing the risks of early entry against the opportunities lost through late entry into a new market. Questionnaire surveys administered to 115 Malaysian construction firms operating in 51 countries worldwide yielded a 39.1 percent response rate. Factor analysis was used to determine the most significant factors affecting the firms’ entry timing choices when penetrating the international market. A logistic regression analysis used to examine the firms’ entry timing choices indicates that the model correctly classified 89.5 percent of cases as late movers. The findings reveal that the most significant factor influencing the construction firms’ choices as late movers was the firm factor, which relates to the firm’s international experience, resources, competencies and financing capacity. The study also offers valuable information to construction firms that intend to internationalize their businesses.
Precipitation forecasting is important for avoiding natural disasters, which can cause losses in the affected area. This review paper covers three artificial intelligence techniques used in precipitation forecasting, namely logistic regression, decision trees, and random forests. These techniques are combined through a VAR model to identify the advantages and strengths of each technique in the forecasting process. The data contain variables from the rain domain. Applying artificial intelligence techniques to the rain domain makes the precipitation forecasting process easier and more systematic.