Abstract
Machine learning is widely applied to molecular tumor classification based on gene expression profiles, but the sample imbalance problem is often overlooked. This paper proposes a subclass-weighted neighborhood classifier to address imbalanced sample sets, together with a novel neighborhood rough set model that selects informative genes to improve classification performance. Experiments on three publicly available tumor datasets demonstrate that the proposed method is effective both on imbalanced datasets with an obscure boundary between two subtypes and for informative gene selection, achieving higher cross-validation accuracy with far fewer tumor-related genes.
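The abstract does not spell out how the subclass weighting works. As a rough illustration only, the sketch below assumes one common scheme: every training sample within a radius `delta` of the query votes for its class, and each vote is scaled by the inverse of that class's size so that a small subtype is not outvoted simply for having fewer samples. The function name `weighted_neighborhood_predict`, the inverse-class-size weights, the nearest-sample fallback, and the toy one-dimensional data are all assumptions made for this example, not the paper's actual method.

```python
import math
from collections import Counter, defaultdict


def weighted_neighborhood_predict(train_X, train_y, x, delta):
    """Predict the label of x from training samples within distance delta.

    Assumption (not from the paper): each neighbor's vote is weighted by
    1 / (size of its class), a simple way to compensate for imbalance.
    """
    class_sizes = Counter(train_y)
    distances = [math.dist(sample, x) for sample in train_X]

    # Accumulate class-weighted votes from samples inside the neighborhood.
    votes = defaultdict(float)
    for d, label in zip(distances, train_y):
        if d <= delta:
            votes[label] += 1.0 / class_sizes[label]

    if not votes:
        # Empty neighborhood: fall back to the single nearest sample.
        nearest = min(range(len(train_X)), key=distances.__getitem__)
        return train_y[nearest]
    return max(votes, key=votes.get)


# Hypothetical toy data: six samples of subtype "A", two of subtype "B".
toy_X = [(0.0,), (0.2,), (0.4,), (0.6,), (0.8,), (1.0,), (1.3,), (1.6,)]
toy_y = ["A", "A", "A", "A", "A", "A", "B", "B"]

# The query has two "A" neighbors and one "B" neighbor within delta, so an
# unweighted vote would favor "A"; the weights are 2/6 for "A" and 1/2 for "B".
prediction = weighted_neighborhood_predict(toy_X, toy_y, (1.1,), 0.35)
```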
Original language | English (US) |
---|---|
Pages (from-to) | 259-273 |
Number of pages | 15 |
Journal | Journal of Circuits, Systems and Computers |
Volume | 19 |
Issue number | 1 |
State | Published - Feb 2010 |
Externally published | Yes |
Keywords
- Gene expression profiles
- Imbalanced dataset
- Kruskal-Wallis rank sum test
- Molecular tumor classification
- Neighborhood rough set model
- Weighted neighborhood classifier
ASJC Scopus subject areas
- Hardware and Architecture
- Electrical and Electronic Engineering