International Science Index


Mining Genes Relations in Microarray Data Combined with Ontology in Colon Cancer Automated Diagnosis System

Abstract: The MATCH project [1] entails the development of an automatic diagnosis system that aims to support the treatment of colon cancer by discovering mutations that occur in tumour suppressor genes (TSGs) and contribute to the development of cancerous tumours. The system is built on (a) colon cancer clinical data and (b) biological information that will be derived by data mining techniques from genomic and proteomic sources. The core mining module will consist of popular, well-tested hybrid feature extraction methods and new combined algorithms designed especially for the project. Elements of rough sets, evolutionary computing, cluster analysis, self-organizing maps and association rules will be used to discover the annotations between genes and their influence on tumours [2]-[11]. The methods used to process the data have to address its high complexity, potential inconsistency and missing values. They must integrate all the useful information necessary to answer the expert's question. For this purpose, the system has to learn from data, or allow a domain specialist to specify interactively, the part of the knowledge structure it needs to answer a given query. The program should also take into account the importance (rank) of the particular parts of the data it analyses and adjust the algorithms it uses accordingly.
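As an illustration of how the rough set element mentioned above can cope with inconsistent data, the following is a minimal sketch of Pawlak-style lower and upper approximations. It is not the project's implementation: the sample identifiers, the gene names `geneA` and `geneB`, the discretised expression levels and the function names are all hypothetical and chosen only for the example.

```python
from collections import defaultdict


def indiscernibility_classes(table, attributes):
    """Group object ids whose values agree on every attribute in `attributes`."""
    classes = defaultdict(set)
    for obj_id, row in table.items():
        key = tuple(row[a] for a in attributes)
        classes[key].add(obj_id)
    return list(classes.values())


def approximations(table, attributes, target):
    """Return the (lower, upper) approximation of `target` w.r.t. `attributes`."""
    lower, upper = set(), set()
    for block in indiscernibility_classes(table, attributes):
        if block <= target:   # block lies entirely inside the target concept
            lower |= block
        if block & target:    # block overlaps the target concept
            upper |= block
    return lower, upper


# Hypothetical toy data: discretised expression levels of two made-up genes.
samples = {
    "s1": {"geneA": "high", "geneB": "low"},
    "s2": {"geneA": "high", "geneB": "low"},
    "s3": {"geneA": "low", "geneB": "low"},
    "s4": {"geneA": "low", "geneB": "high"},
}
tumour = {"s1", "s3"}  # assumed set of samples labelled as tumour

lower, upper = approximations(samples, ["geneA", "geneB"], tumour)
boundary = upper - lower  # samples that the attributes cannot classify for certain
```

In this toy table, samples s1 and s2 have identical attribute values but different labels, so they fall into the boundary region rather than the lower approximation. This is one way inconsistent records can be represented explicitly instead of being discarded.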
[2] Pawlak Z. (1982) Rough sets. International Journal of Computer and Information Sciences, 11(5):341-356.
[3] Pawlak Z. and Slowinski R. (1994) Rough set approach to multiattribute decision analysis. European Journal of Operational Research, 72(3):443-459.
[4] Slezak D. (2005) Association Reducts: A Framework for Mining Multiattribute Dependencies. ISMIS 2005: 354-363.
[5] Wroblewski J. (1996) Theoretical Foundations of Order-Based Genetic Algorithms. Fundam. Inform. 28(3-4): 423-430.
[6] Wroblewski J., Slezak D. (2003) Order Based Genetic Algorithms for the Search of Approximate Entropy Reducts. RSFDGrC 2003: 308-311.
[7] Yao H., Hamilton H.J., Butz C.J. (2004) A Foundational Approach to Mining Itemset Utilities from Databases. SDM 2004.
[8] Yao J.T., Yao Y.Y., and Zhao, Y. (2005) Foundations of classification, in: Lin, T.Y., Ohsuga, S., Liau, C.J. and Hu, X. (Eds), Foundations and Novel Approaches in Data Mining, Springer, Berlin, pp. 75-97.
[9] Yao Y.Y., Zhong, N. and Zhao, Y. (2004) A three-layered conceptual framework of data mining, Proceedings of ICDM'04 Workshop of Foundation of Data Mining, 215-221.
[10] Ziarko, W. (1989) A technique for discovering and analysis of cause-effect relationships in empirical data. International Joint Conference on Artificial Intelligence, Proceedings of the Workshop on Knowledge Discovery in Databases, Detroit, p.390-396.
[11] Ziarko, W. (1989) Determination of locally optimal set of features for representation of implicit knowledge. Proceedings of International Conference on Computing and Information, Toronto, North Holland, p.433-438.
[12] Baskin C., García-Sastre A., Tumpey T. (2004) Integration of Clinical Data, Pathology, and cDNA Microarrays in Influenza Virus-Infected Pigtailed Macaques. Journal of Virology, 78(19):10420-10432.
[13] Casey R. M. (2005) Bioinformatics Data Integration. Business Intelligence Network
[14] Pasquier, C. et al. (2004) THEA: ontology-driven analysis of microarray data. Bioinformatics 20(16):2636-2643.
[15] Radetzki, U., Bode, T., Witterstein, G., Gnasa et al. (2003) A Service-Centric Computing Environment for Heterogeneous Biological Databases and Methods. In R. Spang, P. Beziat, and M. Vingron (eds.): Currents in Computational Molecular Biology (RECOMB 2003), pp. 25-26, April 2003, Berlin, Germany.
[16] Burger, M., Graepel, T., Obermayer, K.: Self-organizing maps: Generalizations and new optimization techniques. Neurocomputing 20 (1998) pp. 173-190.
[17] Kohonen, T.: Self-organized formation of topologically correct feature maps. Biological Cybernetics 43 (1982) pp. 59-69.
[18] Gruzdz, A., Ihnatowicz, A., Slezak, D.: Interactive gene clustering - A case study of breast cancer microarray data. Information Systems Frontiers (2006) 8:21-27.