Hendri Karisma

DNA Methylation Model for Lung Cancer Detection Using Machine Learning

Status: draft (not yet processed by a journal or conference) · Target: IJAAS

Abstract

Lung cancer has a relatively high incidence worldwide, with approximately 2.5 million cases in 2022 according to the Global Cancer Observatory. In Indonesia, there were 34,339 lung cancer deaths out of 66,271 cases. The most common screening method is the low-dose CT scan, but its accuracy is limited, with a false-positive rate of 22%–93%. DNA methylation has emerged as a promising alternative biomarker: several studies report relatively high accuracy in detecting and diagnosing lung cancer non-invasively.

In this study, DNA methylation data were modeled for early detection of lung cancer, using more than 450,000 CpG sites from various genes on 23 human chromosomes. The data come from NCBI GEO series GSE66836 (Department of Cancer Genetics, Institute for Cancer Research, Oslo University Hospital) and comprise 164 lung cancer tumor samples and 19 normal samples.
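The sample counts above imply a strongly imbalanced dataset, which matters when splitting the data and when reading accuracy figures. The following is a minimal sketch, not the procedure used in the paper: it builds a label list from the stated counts, performs a stratified split, and computes the accuracy a classifier would reach by always predicting the majority class. The function name `stratified_split`, the 20% test fraction, and the seed are assumptions made for illustration.

```python
import random

# Sample counts as stated for GSE66836 in the text above.
N_TUMOR, N_NORMAL = 164, 19


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split sample indices so each class keeps roughly the same share
    in the training and test sets. The test fraction is an assumption."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Keep at least one sample of every class in the test set.
        n_test = max(1, round(len(idx) * test_fraction))
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


# 1 = tumor, 0 = normal
labels = [1] * N_TUMOR + [0] * N_NORMAL
train_idx, test_idx = stratified_split(labels)

# Accuracy of a trivial classifier that always predicts the majority class.
majority_baseline = max(N_TUMOR, N_NORMAL) / (N_TUMOR + N_NORMAL)

print(f"train={len(train_idx)} test={len(test_idx)}")
print(f"majority-class baseline accuracy={majority_baseline:.3f}")
```

With 164 tumor and 19 normal samples, the majority-class baseline is 164/183, or about 0.896, so accuracy is usually read together with per-class metrics such as sensitivity and specificity on a dataset like this.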

Models & results

Classification models were built using three machine-learning methods:

| Method | Accuracy | Dominant gene(s) |
| --- | --- | --- |
| Decision Tree | 89% | RAB3B (2 CpG sites) |
| XGBoost | 91% | RAB3B (1 CpG site), AC006156.5/FAM197Y1 (1 CpG site) |
| Artificial Neural Network | 94% | all CpG sites; top-5 genes: OTX1, HOOK2, MCIDAS, CDHR5, SCT |
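The tree-based results above rest on only one or two CpG sites, which means the classification reduces to thresholding methylation levels at those sites. The sketch below illustrates that idea with a single-threshold rule (a decision stump) on one site. It is an assumption-laden illustration, not the paper's model: the beta values are synthetic, the site is hypothetical, and `best_stump` is a name introduced here.

```python
def best_stump(values, labels):
    """Return (threshold, polarity, accuracy) of the best single-threshold rule.

    polarity = 1  predicts tumor when value >  threshold
    polarity = -1 predicts tumor when value <= threshold
    """
    distinct = sorted(set(values))
    # Candidate thresholds: midpoints between consecutive distinct values.
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    best = (None, 1, 0.0)
    for t in candidates:
        for polarity in (1, -1):
            preds = [int(v > t) if polarity == 1 else int(v <= t) for v in values]
            acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
            if acc > best[2]:
                best = (t, polarity, acc)
    return best


# Synthetic beta values (methylation level, 0..1) for one hypothetical CpG
# site. These are NOT taken from GSE66836.
beta = [0.82, 0.77, 0.91, 0.68, 0.85, 0.74, 0.21, 0.30, 0.18, 0.40]
is_tumor = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]

threshold, polarity, accuracy = best_stump(beta, is_tumor)
print(f"threshold={threshold:.2f} polarity={polarity} accuracy={accuracy:.2f}")
```

A full decision tree repeats this search recursively over all features, and a boosted ensemble such as XGBoost combines many shallow trees, so a model dominated by a single site behaves much like the rule shown here.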

Read the full paper (PDF) and download the LaTeX source from the Publications page.
