Abstract
Identifying protein-protein interactions (PPIs) is critical for understanding the cellular function of proteins and the machinery of a proteome. PPI data derived from high-throughput technologies are often incomplete and noisy. Therefore, it is important to develop computational methods and high-quality interaction datasets for predicting PPIs. A sequence-based method is proposed that combines correlation coefficient (CC) transformation with a support vector machine (SVM). CC transformation not only accounts for the neighboring effect within a protein sequence but also describes the level of CC between two protein sequences. A gold standard positives (interacting) dataset, MIPS Core, and a gold standard negatives (non-interacting) dataset, GO-NEG, of the yeast Saccharomyces cerevisiae were mined to objectively evaluate the method and attenuate bias. The SVM model combined with CC transformation yielded the best performance, with an accuracy of 87.94% on the gold standard positives and gold standard negatives datasets. The MATLAB source code and the datasets are available from the authors on request.
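The abstract does not give the CC transformation formula, so the following is only a rough sketch of the general idea, not the authors' method. It assumes each residue is encoded with the Kyte-Doolittle hydropathy index, computes the Pearson correlation between a sequence's property profile and a lagged copy of itself (one way to capture a neighboring effect), and concatenates the features of two proteins into a vector that could be passed to an SVM. The function names, the choice of property scale, and the `max_lag` parameter are all illustrative assumptions.

```python
from math import sqrt

# Kyte-Doolittle hydropathy index per amino acid (an assumed encoding;
# the abstract does not say which physicochemical properties are used).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant profile
    return cov / sqrt(var_x * var_y)


def lagged_cc_features(seq, max_lag=5):
    """Correlate the property profile with itself shifted by 1..max_lag residues.

    Hypothetical helper: requires len(seq) > max_lag + 1 and only the
    20 standard amino acid letters.
    """
    profile = [HYDROPATHY[aa] for aa in seq]
    return [
        pearson(profile[:-lag], profile[lag:])
        for lag in range(1, max_lag + 1)
    ]


def pair_features(seq_a, seq_b, max_lag=5):
    """Fixed-length feature vector for a candidate interacting pair."""
    return lagged_cc_features(seq_a, max_lag) + lagged_cc_features(seq_b, max_lag)
```

Because the vector length depends only on `max_lag`, proteins of different lengths map to inputs of the same dimension, which is what a standard SVM classifier needs.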
Original language | English (US) |
---|---|
Pages (from-to) | 891-899 |
Number of pages | 9 |
Journal | Amino Acids |
Volume | 38 |
Issue number | 3 |
State | Published - Mar 2010 |
Externally published | Yes |
Keywords
- Correlation coefficient
- Gold standard negatives dataset
- Gold standard positives dataset
- Protein sequence
- Protein-protein interactions
- Support vector machine
ASJC Scopus subject areas
- Biochemistry
- Clinical Biochemistry
- Organic Chemistry