International Science Index


Comparison of Domain and Hydrophobicity Features for the Prediction of Protein-Protein Interactions using Support Vector Machines


Protein domain structure has been widely used as the most informative sequence feature for computationally predicting protein-protein interactions. However, a recent study reported a very high accuracy of 94% using a hydrophobicity feature. In this study, we therefore compare and verify the usefulness of protein domain structure and hydrophobicity properties as sequence features. Using Support Vector Machines (SVM) as the learning system, our results indicate that both features achieved an accuracy of nearly 80%. Furthermore, domain structure had a receiver operating characteristic (ROC) score of 0.8480 with a running time of 34 seconds, while hydrophobicity had a ROC score of 0.8159 with a running time of 20,571 seconds (5.7 hours). These results indicate that protein-protein interactions can be predicted from domain structure with reliable accuracy and acceptable running time.
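The two metrics reported above, accuracy and ROC score, can be illustrated with a minimal Python sketch. It assumes that the ROC score is the area under the ROC curve, computed here as the fraction of (interacting, non-interacting) pairs that the classifier ranks correctly, with ties counted as one half. The function names and the toy labels and scores are hypothetical and are not taken from the study.

```python
def accuracy(labels, predictions):
    """Fraction of examples whose predicted class equals the true class."""
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


def roc_score(labels, scores):
    """Area under the ROC curve via pairwise ranking.

    labels: 1 for an interacting pair, 0 for a non-interacting pair.
    scores: classifier decision values; higher means "more likely interacting".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # Count positive/negative pairs ranked correctly; a tie counts as 0.5.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data (made up for illustration only).
    labels = [1, 1, 0, 0]
    scores = [0.9, 0.4, 0.6, 0.1]
    predictions = [1 if s >= 0.5 else 0 for s in scores]
    print("accuracy:", accuracy(labels, predictions))
    print("ROC score:", roc_score(labels, scores))
```

The pairwise form is quadratic in the number of examples, which is acceptable for a small illustration; a rank-based computation would scale better on a large interaction data set.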
