SENS PubMed Publication Search
Exploration of unsupervised feature selection methods to predict chronological age of individuals by utilising CpG dinucleotides from whole blood.
Conf Proc IEEE Eng Med Biol Soc. 2017 Jul;2017:3652-3655. doi: 10.1109/EMBC.2017.8037649
Sarac F, Seker H, Bouridane A
Abstract:
Identification of the age of individuals from epigenetic biomarkers can reveal vital information for criminal investigation, disease prevention, and extension of life. DNA methylation changes are highly associated with chronological age and the process of disease development. Computational methods such as clustering, feature selection and regression can be utilised to construct a quantitative model of aging. In this study, we utilised 473034 CpG biomarkers from whole blood of 656 individuals aged 19 to 101 to construct predictive models, and we treated the development of this age-predictive model as an extremely high-dimensional regression problem, a type of problem that is relatively understudied. Unlike semi-supervised and supervised feature selection methods, unsupervised feature selection methods are generally good at removing irrelevant features that can act as noise. Therefore, along with the entire feature set, four different unsupervised feature selection methods (USFSMs) are considered for the quantitative prediction of human ages. Since USFSMs are independent of any predictive method, support vector regression is then used to evaluate the prediction performance of each unsupervised feature selection method. We proposed a novel k-means based unsupervised feature selection method to predict human ages by utilising CpG dinucleotides. Experimental results have validated the effectiveness of the proposed method, as the optimum number of CpG dinucleotides is found to be only 41, which corresponds to only 0.0087% of the entire feature space. To the best of our knowledge, this is the first study that presents an exploration and comprehensive comparison of USFSMs in very high-dimensional regression problems, particularly in the epigenetic biomedical domain for the prediction of chronological age from changes in DNA methylation.
PMID: 29060690
Tags: biomarkers, epigenetics, methylation
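
The abstract above does not spell out how the k-means based feature selection works, so the following is only a minimal sketch of one common way such a method can be built, not the authors' algorithm. It assumes that feature columns (one per CpG site) are clustered with plain k-means and that the feature closest to each centroid is kept as that cluster's representative. The function name `kmeans_select_features`, its parameters, and the toy data are all hypothetical; the downstream support vector regression step is omitted.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_select_features(X, k, n_iter=50, seed=0):
    """Pick up to k representative feature indices from X (rows = samples).

    Assumption: features are clustered by their value profile across
    samples, and the feature nearest each centroid is kept.
    """
    n_features = len(X[0])
    # Transpose so that every feature becomes one point to cluster.
    cols = [[row[j] for row in X] for j in range(n_features)]
    rng = random.Random(seed)
    centroids = [list(cols[j]) for j in rng.sample(range(n_features), k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each feature joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(col, centroids[c]))
            for col in cols
        ]
        if new_labels == labels:
            break  # assignments are stable, so k-means has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [cols[j] for j in range(n_features) if labels[j] == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(v) / len(members) for v in zip(*members)]
    selected = []
    for c in range(k):
        members = [j for j in range(n_features) if labels[j] == c]
        if members:
            selected.append(
                min(members, key=lambda j: sq_dist(cols[j], centroids[c]))
            )
    return sorted(selected)


# Toy data (hypothetical): features 0-2 share one profile, 3-5 another.
a = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
b = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
X = [
    [a[i], a[i] + 0.01, a[i] - 0.01, b[i], b[i] + 0.01, b[i] - 0.01]
    for i in range(6)
]
selected = kmeans_select_features(X, k=2)
print(selected)  # one index from each redundant group
```

On the toy data, six features collapse to two, one per group of near-duplicates. In the setting the abstract describes, the selected columns would then be passed to a regressor such as support vector regression to judge how well the reduced set predicts age.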