TY - GEN
T1 - Neighborhood rough set model based gene selection for multi-subtype tumor classification
AU - Wang, Shulin
AU - Li, Xueling
AU - Zhang, Shanwen
PY - 2008
Y1 - 2008
N2 - Multi-subtype tumor diagnosis based on gene expression profiles is promising for clinical applications. Accordingly, a great deal of research on tumor classification based on gene expression profiles has been conducted, in which various machine learning approaches have been applied to construct tumor classification models that improve classification performance as much as possible. To achieve this goal, extracting features or finding informative genes that have good classification ability is crucial. We propose a novel gene selection approach that uses the Kruskal-Wallis rank sum test to rank all genes and then applies an algorithm based on the neighborhood rough set model for gene reduction, yielding gene subsets with fewer genes and greater classification ability. Experiments on a small round blue cell tumor (SRBCT) dataset show that our approach can achieve very high classification accuracy with only three or four genes, as evaluated by three classifiers: support vector machines, K-nearest neighbor, and the neighborhood classifier.
AB - Multi-subtype tumor diagnosis based on gene expression profiles is promising for clinical applications. Accordingly, a great deal of research on tumor classification based on gene expression profiles has been conducted, in which various machine learning approaches have been applied to construct tumor classification models that improve classification performance as much as possible. To achieve this goal, extracting features or finding informative genes that have good classification ability is crucial. We propose a novel gene selection approach that uses the Kruskal-Wallis rank sum test to rank all genes and then applies an algorithm based on the neighborhood rough set model for gene reduction, yielding gene subsets with fewer genes and greater classification ability. Experiments on a small round blue cell tumor (SRBCT) dataset show that our approach can achieve very high classification accuracy with only three or four genes, as evaluated by three classifiers: support vector machines, K-nearest neighbor, and the neighborhood classifier.
KW - Gene expression profiles
KW - K-nearest neighbor
KW - Neighborhood classifier
KW - Support vector machines
KW - Tumor classification
UR - http://www.scopus.com/inward/record.url?scp=56549111659&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=56549111659&partnerID=8YFLogxK
U2 - 10.1007/978-3-540-87442-3_20
DO - 10.1007/978-3-540-87442-3_20
M3 - Conference contribution
AN - SCOPUS:56549111659
SN - 3540874402
SN - 9783540874409
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 146
EP - 158
BT - Advanced Intelligent Computing Theories and Applications
T2 - 4th International Conference on Intelligent Computing, ICIC 2008
Y2 - 15 September 2008 through 18 September 2008
ER -