|
引入遗传因子的骨密度机器学习回归模型研究 |
Study on machine learning regression model of bone mineral density with genetic factors |
|
DOI:10.3969/j.issn.1006.7108.2022.10.010 |
Keywords: machine learning; osteoporosis; bone mineral density; maximal information coefficient; sequential feature selection |
Funding: National Natural Science Foundation of China (31301092) |
|
Abstract: |
Objective: To build a machine learning regression model of bone mineral density (BMD) that combines genetic factors with clinical risk factors, and to identify the optimal combination of features affecting individual susceptibility to osteoporosis. Methods: The maximal information coefficient (MIC) and sequential floating forward selection (SFFS) were used as a two-stage feature selection method to select the optimal feature subset, and a BMD regression model was built with random forest. Results: In ten-fold cross-validation on a dataset of 2,263 Caucasian samples, the two-stage feature selection method combined with random forest achieved its lowest root mean square error (RMSE), 0.093598 g/cm3, with a subset of 51 SNP loci and 6 clinical features; this was 5.36% lower than the RMSE obtained with clinical risk factors alone. Comparison experiments with other feature selection methods and regression models confirmed the good stability of the proposed model. Conclusion: The proposed method can detect hidden interactions among features and identify the optimal feature combination, supporting better prediction and diagnosis of complex diseases. |
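The second selection stage named in the abstract, sequential floating forward selection, can be illustrated with a minimal sketch. This is not the authors' implementation: the function `sffs`, the feature names, and `toy_score` are hypothetical and chosen only for illustration. In the study the score would presumably be derived from the cross-validated RMSE of a random forest (for example, its negative, so that higher is better), and the first stage (MIC filtering) is omitted here. The toy score rewards two SNPs only when they appear together, which is the kind of hidden interaction a floating search can recover and a purely greedy forward search can miss.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Score = Callable[[Sequence[str]], float]


def sffs(features: Sequence[str], score: Score, k_max: int) -> Dict[int, Tuple[float, List[str]]]:
    """Sequential floating forward selection (higher score is better).

    Returns, for each subset size reached, the best (score, subset) found.
    """
    selected: List[str] = []
    best: Dict[int, Tuple[float, List[str]]] = {}
    while len(selected) < k_max:
        candidates = [f for f in features if f not in selected]
        if not candidates:
            break
        # Forward step: add the single feature that helps most.
        f_add = max(candidates, key=lambda f: score(selected + [f]))
        selected = selected + [f_add]
        s = score(selected)
        if len(selected) not in best or s > best[len(selected)][0]:
            best[len(selected)] = (s, list(selected))
        # Floating step: drop features while doing so beats the best
        # subset previously recorded at the smaller size.
        while len(selected) > 2:
            f_rm = max(selected, key=lambda f: score([g for g in selected if g != f]))
            reduced = [g for g in selected if g != f_rm]
            s_red = score(reduced)
            if s_red > best[len(reduced)][0]:
                selected = reduced
                best[len(reduced)] = (s_red, list(reduced))
            else:
                break
    return best


# Hypothetical features and scores, for illustration only.
FEATURES = ["age", "weight", "snp1", "snp2"]
MAIN_EFFECT = {"age": 5, "weight": 3, "snp1": 1, "snp2": 1}


def toy_score(subset: Sequence[str]) -> float:
    """Toy score: sum of main effects plus a bonus when both SNPs are present."""
    chosen = set(subset)
    total = sum(MAIN_EFFECT[f] for f in chosen)
    if {"snp1", "snp2"} <= chosen:
        total += 6  # interaction that no single SNP reveals on its own
    return total


if __name__ == "__main__":
    for size, (value, subset) in sorted(sffs(FEATURES, toy_score, 4).items()):
        print(size, value, subset)
```

In this toy run, plain forward selection would settle on `age`, `weight`, `snp1` at size 3 (score 9). Once `snp2` is added, the floating step drops `weight` and records `age`, `snp1`, `snp2` (score 13) as the better three-feature subset.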