Introduction

The human genome is arguably the most complete mammalian reference assembly, yet more than 160 euchromatic gaps remain and aspects of its structural variation remain poorly understood ten years after its completion. To identify missing sequence and genetic variation, here we sequence and analyse a haploid human genome (CHM1) using single-molecule, real-time DNA sequencing. We close or extend 55% of the remaining interstitial gaps in the human GRCh37 reference genome, 78% of which carried long runs of degenerate short tandem repeats, often several kilobases in length, embedded within (G+C)-rich genomic regions. We resolve the complete sequence of 26,079 euchromatic structural variants at the base-pair level, including inversions, complex insertions and long tracts of tandem repeats. Most have not been previously reported, with the greatest increases in sensitivity occurring for events less than 5 kilobases in size. Compared to the human reference, we find a significant insertional bias (3:1) in regions corresponding to complex insertions and long short tandem repeats. Our results suggest a greater complexity of the human genome in the form of variation of longer and more complex repetitive DNA that can now be largely resolved with the application of this longer-read sequencing technology.

Publications

  1. Resolving the complexity of the human genome using single-molecule sequencing.
    Chaisson MJ, Huddleston J, Dennis MY, Sudmant PH, Malig M, Hormozdiari F, Antonacci F, Surti U, Sandstrom R, Boitano M, Landolin JM, Stamatoyannopoulos JA, Hunkapiller MW, Korlach J, Eichler EE, 2015-01-01 - Nature

Credits

  1. Mark J P Chaisson
    Developer

    Department of Genome Sciences, University of Washington School of Medicine, United States of America

  2. John Huddleston
    Developer

    Department of Genome Sciences, University of Washington School of Medicine, United States of America

  3. Megan Y Dennis
    Developer

    Department of Genome Sciences, University of Washington School of Medicine, United States of America

  4. Peter H Sudmant
    Developer

    Department of Genome Sciences, University of Washington School of Medicine, United States of America

  5. Maika Malig
    Developer

    Department of Genome Sciences, University of Washington School of Medicine, United States of America

  6. Fereydoun Hormozdiari
    Developer

    Department of Genome Sciences, University of Washington School of Medicine, United States of America

  7. Francesca Antonacci
    Developer

    Dipartimento di Biologia, Università degli Studi di Bari 'Aldo Moro', Italy

  8. Urvashi Surti
    Developer

    Department of Pathology, University of Pittsburgh, United States of America

  9. Richard Sandstrom
    Developer

    Department of Genome Sciences, University of Washington School of Medicine, United States of America

  10. Matthew Boitano
    Developer

    Pacific Biosciences of California, Inc., United States of America

  11. Jane M Landolin
    Developer

    Pacific Biosciences of California, Inc., United States of America

  12. John A Stamatoyannopoulos
    Developer

    Department of Genome Sciences, University of Washington School of Medicine, United States of America

  13. Michael W Hunkapiller
    Developer

    Pacific Biosciences of California, Inc., United States of America

  14. Jonas Korlach
    Developer

    Pacific Biosciences of California, Inc., United States of America

  15. Evan E Eichler
    Investigator

    Department of Genome Sciences, University of Washington School of Medicine, United States of America

Summary

Accession: BT002229
Tool Type: Application
Category:
Platforms: Linux/Unix
Technologies: C++, Perl
User Interface: Terminal Command Line
Download Count: 0
Country/Region: United States of America
Submitted By: Evan E Eichler