SomaticCombiner: improving the performance of somatic variant calling based on evaluation tests and a consensus approach

M Wang, W Luo, K Jones, X Bian, R Williams, H Higson, D Wu, B Hicks, M Yeager, B Zhu
Scientific Reports, 2020 - nature.com
Abstract
It is challenging to identify somatic variants from high-throughput sequence reads due to tumor heterogeneity, sub-clonality, and sequencing artifacts. In this study, we evaluated the performance of eight primary somatic variant callers and multiple ensemble methods using both real and synthetic whole-genome sequencing, whole-exome sequencing, and deep targeted sequencing datasets with the NA12878 cell line. The test results showed that a simple consensus approach can significantly improve performance even with a limited number of callers, and that it is more robust and stable than machine-learning-based ensemble approaches. To fully exploit multiple callers, we also developed a software package, SomaticCombiner, that combines the output of multiple callers and integrates a new variant allelic frequency (VAF) adaptive majority voting approach, which maintains sensitive detection of variants with low VAFs.
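
The abstract does not give the voting rule itself, so the following is only a minimal sketch of what a VAF-adaptive majority vote could look like, not SomaticCombiner's actual implementation. It assumes that variants at or above some VAF cutoff need a strict majority of callers, while low-VAF variants are accepted with fewer votes. The names `Candidate`, `required_votes`, and `vaf_adaptive_vote`, and the values `low_vaf_cutoff=0.10` and `low_vaf_votes=2`, are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A candidate somatic variant and the callers that reported it."""
    chrom: str
    pos: int
    ref: str
    alt: str
    vaf: float            # variant allelic frequency in the tumor sample
    callers: frozenset    # names of the callers that reported this variant


def required_votes(vaf, n_callers, low_vaf_cutoff=0.10, low_vaf_votes=2):
    """Return how many callers must agree before a variant is accepted.

    Both thresholds are illustrative assumptions, not values from the paper.
    """
    majority = n_callers // 2 + 1  # strict majority of the callers that were run
    if vaf < low_vaf_cutoff:
        # Relax the requirement for low-VAF variants so they are not lost,
        # but never demand more than a plain majority would.
        return min(low_vaf_votes, majority)
    return majority


def vaf_adaptive_vote(candidates, n_callers, **thresholds):
    """Keep the candidates whose caller support meets the VAF-dependent bar."""
    return [
        c for c in candidates
        if len(c.callers) >= required_votes(c.vaf, n_callers, **thresholds)
    ]
```

With five callers under these assumed thresholds, a variant at VAF 0.40 needs three supporting callers, whereas a variant at VAF 0.05 is kept with two. A plain majority vote would discard the second variant, which is the loss of sensitivity that a VAF-adaptive rule is meant to avoid.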