SOAPdenovo2

Ruibang Luo;Binghang Liu;Yinlong Xie;Zhenyu Li;Weihua Huang;Jianying Yuan;Guangzhu He;Yanxiang Chen;Qi Pan;Yunjie Liu;Jingbo Tang;Gengxiong Wu;Hao Zhang;Yujian Shi;Yong Liu;Chang Yu;Bo Wang;Yao Lu;Changlei Han;David W. Cheung;Siu Ming Yiu;Shaoliang Peng;Zhu Xiaoqian;Guangming Liu;湘科 廖;Yingrui Li;焕明 杨;Jian Wang;Tak Wah Lam;军 王

BGI HK Ltd.;The University of Hong Kong;South China University of Technology;National University of Defense Technology

Publication date: 2012-12-27

Journal: GigaScience

Language: English

URL: http://www.scopus.com/inward/record.url?scp=84942887758&partnerID=8YFLogxK

Abstract

Background: De novo genome assembly from next-generation sequencing (NGS) short reads is increasingly common; however, several major challenges must still be overcome for it to be efficient and accurate. SOAPdenovo has been successfully applied to assemble many published genomes, but it still needs improvement in continuity, accuracy and coverage, especially in repeat regions.

Findings: To overcome these challenges, we have developed its successor, SOAPdenovo2, whose new algorithm design reduces memory consumption in graph construction, resolves more repeat regions in contig assembly, increases coverage and length in scaffold construction, improves gap closing, and is optimized for large genomes.

Conclusions: Benchmarks using the Assemblathon1 and GAGE datasets showed that SOAPdenovo2 greatly surpasses its predecessor SOAPdenovo and is competitive with other assemblers in both assembly length and accuracy. We also provide an updated assembly of the 2008 Asian (YH) genome using SOAPdenovo2. The contig and scaffold N50 of the YH genome were ~20.9 kbp and ~22 Mbp, respectively, which are 3-fold and 50-fold longer than in the first published version. Genome coverage increased from 81.16% to 93.91%, and memory consumption at the point of peak usage was ~2/3 lower.

Keywords

Assembly
Contig
Error correction
Gap-filling
Genome
Scaffold

Related subject areas

Computer Science
Computer Science Applications
Medicine
Health Informatics

Fingerprint

Engineering & Materials Science

Genes

Data storage equipment

Scaffolds

Medicine & Life Sciences

Memory

Genome

Benchmarking

Datasets

Journal metrics

Scopus metrics

Year  CiteScore  SJR    SNIP
2013  3.3        1.561  0.789
2014  4.1        5.565  2.458
2015  6.5        4.727  1.793
2016  9.9        5.068  1.881
2017  9.2        5.022  1.857
2018  8.1        4.726  2.143
2019  6.4        2.639  1.69
2020  8.4
