The comparisons of mycobacterial genomes have identified several Mycobacterium tuberculosis-specific genomic regions that are absent in other mycobacteria and are known as regions of differences. Due to M. tuberculosis-specificity, the peptides encoded by these regions could be useful in the specific diagnosis of tuberculosis. To explore this possibility, overlapping synthetic peptides corresponding to 39 proteins predicted to be encoded by genes present in regions of differences were tested for antibody-reactivity with sera from tuberculosis patients and healthy subjects. The results identified four immunodominant peptides corresponding to four different proteins, with three of the peptides showing significantly stronger antibody reactivity and rate of positivity with sera from tuberculosis patients than healthy subjects. The fourth peptide was recognized equally well by the sera of tuberculosis patients as well as healthy subjects. Predication of antibody epitopes by bioinformatics analyses using ABCpred server predicted multiple linear epitopes in each peptide. Furthermore, peptide sequence analysis for sequence identity using BLAST suggested M. tuberculosis-specificity for the three peptides that had preferential reactivity with sera from tuberculosis patients, but the peptide with equal reactivity with sera of TB patients and healthy subjects showed significant identity with sequences present in nob-tuberculous mycobacteria. The three identified M. tuberculosis-specific immunodominant peptides may be useful in the serological diagnosis of tuberculosis.
Production of specific antibody responses against hTSH is a cumbersome process due to the high identity between the hTSH and the other members of the glycoprotein hormone family (FSH, LH and HCG) and the high identity between the human hTSH and host animals for antibody production. Therefore, two polyclonal antibodies were purified against two recombinant proteins. Four possible ELISA tests were designed based on these antibodies. These ELISA tests were checked against hTSH and other glycoprotein hormones, and their sensitivity and specificity were assessed. Bioinformatics tools were used to analyze the immunological properties. After the immunogen region selection from hTSH protein, c terminal of B hTSH was selected and applied. Two recombinant genes, with these cut pieces (first: two repeats of C terminal of B hTSH, second: tetanous toxin+B hTSH C terminal), were designed and sub-cloned into the pET32a expression vector. Standard methods were used for protein expression, purification, and verification. Thereafter, immunizations of the white New Zealand rabbits were performed and the serums of them were used for antibody titration, purification and characterization. Then, four ELISA tests based on two antibodies were employed to assess the hTSH and other glycoprotein hormones. The results of these assessments were compared with standard amounts. The obtained results indicated that the desired antigens were successfully designed, sub-cloned, expressed, confirmed and used for in vivo immunization. The raised antibodies were capable of specific and sensitive hTSH detection, while the cross reactivity with the other members of the glycoprotein hormone family was minimum. Among the four designed tests, the test in which the antibody against first protein was used as capture antibody, and the antibody against second protein was used as detector antibody did not show any hook effect up to 50 miu/l. Both proteins have the ability to induce highly sensitive and specific antibody responses against the hTSH. One of the antibody combinations of these antibodies has the highest sensitivity and specificity in hTSH detection.
Studying DNA (deoxyribonucleic acid) sequence is useful in biological processes and it is applied in the fields such as diagnostic and forensic research. DNA is the hereditary information in human and almost all other organisms. It is passed to their generations. Earlier stage detection of defective DNA sequence may lead to many developments in the field of Bioinformatics. Nowadays various tedious techniques are used to identify defective DNA. The proposed work is to analyze and identify the cancer-causing DNA motif in a given sequence. Initially the human DNA sequence is separated as k-mers using k-mer separation rule. The separated k-mers are clustered using Self Organizing Map (SOM). Using Levenshtein distance measure, cancer associated DNA motif is identified from the k-mer clusters. Experimental results of this work indicate the presence or absence of cancer causing DNA motif. If the cancer associated DNA motif is found in DNA, it is declared as the cancer disease causing DNA sequence. Otherwise the input human DNA is declared as normal sequence. Finally, elapsed time is calculated for finding the presence of cancer causing DNA motif using clustering formation. It is compared with normal process of finding cancer causing DNA motif. Locating cancer associated motif is easier in cluster formation process than the other one. The proposed work will be an initiative aid for finding genetic disease related research.
The issues that limit application interoperability is lack of common vocabulary, common structure, application domain knowledge ontology based semantic technology provides solutions that resolves application interoperability issues. Ontology is broadly used in diverse applications such as artificial intelligence, bioinformatics, biomedical, information integration, etc. Ontology can be used to interpret the knowledge of various domains. To reuse, enrich the available ontologies and reduce the duplication of ontologies of the same domain, there is a strong need to integrate the ontologies of the particular domain. The integrated ontology gives complete knowledge about the domain by sharing this comprehensive domain ontology among the groups. As per the literature survey there is no well-defined methodology to represent knowledge of a whole domain. The current research addresses a systematic methodology for knowledge representation using multiple sub-ontologies at different levels that addresses application interoperability and enables semantic information retrieval. The current method represents complete knowledge of a domain by importing concepts from multiple sub ontologies of same and relative domains that reduces ontology duplication, rework, implementation cost through ontology reusability.
Biotechnology is a discipline which explains the use of living organisms and systems to construct a product, or we can define it as an application or technology developed to use biological systems and organisms processes for a specific use. Particularly, it includes cells and its components use for new technologies and inventions. The tools developed can be further used in diverse fields such as agriculture, industry, research and hospitals etc. The 21st century has seen a drastic development and advancement in biotechnology in India. Significant increase in Government of India’s outlays for biotechnology over the past decade has been observed. A sectoral break up of biotechnology-based companies in India shows that most of the companies are agriculture-based companies having interests ranging from tissue culture to biopesticides. Major attention has been given by the companies in health related activities and in environmental biotechnology. The biopharmaceutical, which comprises of vaccines, diagnostic, and recombinant products is the most reliable and largest segment of the Indian Biotech industry. India has developed its vaccine markets and supplies them to various countries. Then there are the bio-services, which mainly comprise of contract researches and manufacturing services. India has made noticeable developments in the field of bio industries including manufacturing of enzymes, biofuels and biopolymers. Biotechnology is also playing a crucial and significant role in the field of agriculture. Traditional methods have been replaced by new technologies that mainly focus on GM crops, marker assisted technologies and the use of biotechnological tools to improve the quality of fertilizers and soil. It may only be a small contributor but has shown to have huge potential for growth. Bioinformatics is a computational method which helps to store, manage, arrange and design tools to interpret the extensive data gathered through experimental trials, making it important in the design of drugs.
Nowadays, ontology is common in many areas like artificial intelligence, bioinformatics, e-commerce, education and many more. Ontology is one of the focus areas in the field of Information Retrieval. The purpose of an ontology is to describe a conceptual representation of concepts and their relationships within a particular domain. In other words, ontology provides a common vocabulary for anyone who needs to share information in the domain. There are several ontology domains in various fields including engineering and non-engineering knowledge. However, there are only a few available ontology for engineering knowledge. Fuzzy logic as engineering knowledge is still not available as ontology domain. In general, fuzzy logic requires step-by-step guidelines and instructions of lab experiments. In this study, we presented domain ontology for Fuzzy Logic Control (FLC) knowledge. We give Table of Content (ToC) with middle strategy based on the Uschold and King method to develop FLC ontology. The proposed framework is developed using Protégé as the ontology tool. The Protégé’s ontology reasoner, known as the Pellet reasoner is then used to validate the presented framework. The presented framework offers better performance based on consistency and classification parameter index. In general, this ontology can provide a platform to anyone who needs to understand FLC knowledge.
Development of a method to estimate gene functions is an important task in bioinformatics. One of the approaches for the annotation is the identification of the metabolic pathway that genes are involved in. Since gene expression data reflect various intracellular phenomena, those data are considered to be related with genes’ functions. However, it has been difficult to estimate the gene function with high accuracy. It is considered that the low accuracy of the estimation is caused by the difficulty of accurately measuring a gene expression. Even though they are measured under the same condition, the gene expressions will vary usually. In this study, we proposed a feature extraction method focusing on the variability of gene expressions to estimate the genes' metabolic pathway accurately. First, we estimated the distribution of each gene expression from replicate data. Next, we calculated the similarity between all gene pairs by KL divergence, which is a method for calculating the similarity between distributions. Finally, we utilized the similarity vectors as feature vectors and trained the multiclass SVM for identifying the genes' metabolic pathway. To evaluate our developed method, we applied the method to budding yeast and trained the multiclass SVM for identifying the seven metabolic pathways. As a result, the accuracy that calculated by our developed method was higher than the one that calculated from the raw gene expression data. Thus, our developed method combined with KL divergence is useful for identifying the genes' metabolic pathway.
Analyzing DNA microarray data sets is a great challenge, which faces the bioinformaticians due to the complication of using statistical and machine learning techniques. The challenge will be doubled if the microarray data sets contain missing data, which happens regularly because these techniques cannot deal with missing data. One of the most important data analysis process on the microarray data set is feature selection. This process finds the most important genes that affect certain disease. In this paper, we introduce a technique for imputing the missing data in microarray data sets while performing feature selection.
GRF, Growth regulating factor, genes encode a novel class of plant-specific transcription factors. The GRF proteins play a role in the regulation of cell numbers in young and growing tissues and may act as transcription activations in growth and development of plants. Identification of GRF genes and their expression are important in plants to performance of the growth and development of various organs. In this study, to better understanding the structural and functional differences of GRFs family, 45 GRF proteins sequences in A. thaliana, Z. mays, O. sativa, B. napus, B. rapa, H. vulgare and S. bicolor, have been collected and analyzed through bioinformatics data mining. As a result, in secondary structure of GRFs, the number of alpha helices was more than beta sheets and in all of them QLQ domains were completely in the biggest alpha helix. In all GRFs, QLQ and WRC domains were completely protected except in AtGRF9. These proteins have no trans-membrane domain and due to have nuclear localization signals act in nuclear and they are component of unstable proteins in the test tube.
Aim of this work was to study the genetic basis for oil accumulation in olive fruit via tracking DGAT2 (Diacylglycerol acyltransferase type-2) gene in three Egyptian Origen Olive cultivars namely Toffahi, Hamed and Maraki using molecular marker techniques and bioinformatics tools. Results illustrate that, firstly: specific genomic band of Maraki cultivars was identified as DGAT2 (Diacylglycerol acyltransferase type-2) and identical for this gene in Olea europaea with 100% of similarity. Secondly, differential genomic band of Maraki cultivars which produced from RAPD fingerprinting technique reflected predicted distinguished sequence which identified as DGAT2 (Diacylglycerol acyltransferase type-2) in Fragaria vesca subsp. Vesca with 76% of sequential similarity. Third and finally, specific genomic specific band of Hamed cultivars was identified as two fragments, 1- Olea europaea cultivar Koroneiki diacylglycerol acyltransferase type 2 mRNA, complete cds with two matches regions with 99% or 2- Predicted: Fragaria vesca subsp. vesca diacylglycerol O-acyltransferase 2-like (LOC101313050), mRNA with 86 % of similarity.
Tumor classification is a key area of research in the field of bioinformatics. Microarray technology is commonly used in the study of disease diagnosis using gene expression levels. The main drawback of gene expression data is that it contains thousands of genes and a very few samples. Feature selection methods are used to select the informative genes from the microarray. These methods considerably improve the classification accuracy. In the proposed method, Genetic Algorithm (GA) is used for effective feature selection. Informative genes are identified based on the T-Statistics, Signal-to-Noise Ratio (SNR) and F-Test values. The initial candidate solutions of GA are obtained from top-m informative genes. The classification accuracy of k-Nearest Neighbor (kNN) method is used as the fitness function for GA. In this work, kNN and Support Vector Machine (SVM) are used as the classifiers. The experimental results show that the proposed work is suitable for effective feature selection. With the help of the selected genes, GA-kNN method achieves 100% accuracy in 4 datasets and GA-SVM method achieves in 5 out of 10 datasets. The GA with kNN and SVM methods are demonstrated to be an accurate method for microarray based tumor classification.
Genome rearrangement is an important area in computational biology and bioinformatics. The basic problem in genome rearrangements is to compute the edit distance, i.e., the minimum number of operations needed to transform one genome into another. Unfortunately, unsigned genome rearrangement problem is NP-hard. In this study an improved ant colony optimization algorithm to approximate the edit distance is proposed. The main idea is to convert the unsigned permutation to signed permutation and evaluate the ants by using Kaplan algorithm. Two new operations are added to the standard ant colony algorithm: Replacing the worst ants by re-sampling the ants from a new probability distribution and applying the crossover operations on the best ants. The proposed algorithm is tested and compared with the improved breakpoint reversal sort algorithm by using three datasets. The results indicate that the proposed algorithm achieves better accuracy ratio than the previous methods.
MicroRNAs (miRNAs), a class of approximately 22 nucleotide long non coding RNAs which play critical role in different biological processes. The mature microRNA is usually 19–27 nucleotides long and is derived from a bigger precursor that folds into a flawed stem-loop structure. Mature micro RNAs are involved in many cellular processes that encompass development, proliferation, stress response, apoptosis, and fat metabolism by gene regulation. Resent finding reveals that certain viruses encode their own miRNA that processed by cellular RNAi machinery. In recent research indicate that cellular microRNA can target the genetic material of invading viruses. Cellular microRNA can be used in the virus life cycle; either to up regulate or down regulate viral gene expression Computational tools use in miRNA target prediction has been changing drastically in recent years. Many of the methods have been made available on the web and can be used by experimental researcher and scientist without expert knowledge of bioinformatics. With the development and ease of use of genomic technologies and computational tools in the field of microRNA biology has superior tremendously over the previous decade. This review attempts to give an overview over the genome wide approaches that have allow for the discovery of new miRNAs and development of new miRNA target prediction tools and databases.
Microarrays are made it possible to simultaneously monitor the expression profiles of thousands of genes under various experimental conditions. It is used to identify the co-expressed genes in specific cells or tissues that are actively used to make proteins. This method is used to analysis the gene expression, an important task in bioinformatics research. Cluster analysis of gene expression data has proved to be a useful tool for identifying co-expressed genes, biologically relevant groupings of genes and samples. In this work K-Means algorithms has been applied for clustering of Gene Expression Data. Further, rough set based Quick reduct algorithm has been applied for each cluster in order to select the most similar genes having high correlation. Then the ACV measure is used to evaluate the refined clusters and classification is used to evaluate the proposed method. They could identify compact clusters with feature selection method used to genes are selected.
Gamboge disorder (GD) or fruit damage by the yellow sap is a major problem in mangosteen. Mangosteen plants varied in the level of GD, from very low or non GD to low, moderate and high GD. However it was difficult to differentiate between GD and non GD plants because evaluation of the disorder is strongly influenced by environment. In this study we investigated the usefulness of primer designed from bioinformatics related to cell wall strength, termed as MCWS, to predict GD. Plant materials used were 28 mangosteen plants selected based on percentage of GD categorized as high, moderate, low and very low or non GD. The result showed that the specific DNA fragments were absent in the high GD accessions. The MCWS marker suggests as a novel polymorphic marker for GD in mangosteen as well as a marker for detect variability in mangosteen as apomictic plant.
Cancer classification to their corresponding cohorts has been key area of research in bioinformatics aiming better prognosis of the disease. High dimensionality of gene data has been makes it a complex task and requires significance data identification technique in order to reducing the dimensionality and identification of significant information. In this paper, we have proposed a novel approach for classification of oral cancer into metastasis positive and negative patients. We have used significance analysis of microarrays (SAM) for identifying significant genes which constitutes gene signature. 3 different gene signatures were identified using SAM from 3 different combination of training datasets and their classification accuracy was calculated on corresponding testing datasets using k-Nearest Neighbour (kNN), Fuzzy C-Means Clustering (FCM), Support Vector Machine (SVM) and Backpropagation Neural Network (BPNN). A final gene signature of only 9 genes was obtained from above 3 individual gene signatures. 9 gene signature-s classification capability was compared using same classifiers on same testing datasets. Results obtained from experimentation shows that 9 gene signature classified all samples in testing dataset accurately while individual genes could not classify all accurately.
The Far From Most Strings Problem (FFMSP) is to obtain a string which is far from as many as possible of a given set of strings. All the input and the output strings are of the same length, and two strings are said to be far if their hamming distance is greater than or equal to a given positive integer. FFMSP belongs to the class of sequences consensus problems which have applications in molecular biology. The problem is NP-hard; it does not admit a constant-ratio approximation either, unless P = NP. Therefore, in addition to exact and approximate algorithms, (meta)heuristic algorithms have been proposed for the problem in recent years. On the other hand, in the recent years, hybrid algorithms have been proposed and successfully used for many hard problems in a variety of domains. In this paper, a new metaheuristic algorithm, called Constructive Beam and Local Search (CBLS), is investigated for the problem, which is a hybridization of constructive beam search and local search algorithms. More specifically, the proposed algorithm consists of two phases, the first phase is to obtain several candidate solutions via the constructive beam search and the second phase is to apply local search to the candidate solutions obtained by the first phase. The best solution found is returned as the final solution to the problem. The proposed algorithm is also similar to memetic algorithms in the sense that both use local search to further improve individual solutions. The CBLS algorithm is compared with the most recent published algorithm for the problem, GRASP, with significantly positive results; the improvement is by order of magnitudes in most cases.
Discovering new biological knowledge from the highthroughput biological data is a major challenge to bioinformatics today. To address this challenge, we developed a new approach for protein classification. Proteins that are evolutionarily- and thereby functionally- related are said to belong to the same classification. Identifying protein classification is of fundamental importance to document the diversity of the known protein universe. It also provides a means to determine the functional roles of newly discovered protein sequences. Our goal is to predict the functional classification of novel protein sequences based on a set of features extracted from each protein sequence. The proposed technique used datasets extracted from the Structural Classification of Proteins (SCOP) database. A set of spectral domain features based on Fast Fourier Transform (FFT) is used. The proposed classifier uses multilayer back propagation (MLBP) neural network for protein classification. The maximum classification accuracy is about 91% when applying the classifier to the full four levels of the SCOP database. However, it reaches a maximum of 96% when limiting the classification to the family level. The classification results reveal that spectral domain contains information that can be used for classification with high accuracy. In addition, the results emphasize that sequence similarity measures are of great importance especially at the family level.