Standard reference Genome-in-a-Bottle (GIAB) samples HG001-HG007


Last update: June 07, 2022


Description:

The reference data set was introduced in the paper “Cost-efficient whole genome-sequencing using novel mostly natural sequencing-by-synthesis chemistry and open fluidics platform,” which includes variant calling by a UG-adapted version of GATK.

An additional variant calling analysis of the same dataset using a UG-adapted version of DeepVariant was described in the whitepaper “Adapting Google DeepVariant to Ultima Genomics Reads for Improved Variant Calling.”


Methods:

The seven standard GIAB reference samples HG001-HG007 were sequenced on a UG 100™ sequencer with a read length of ~300 bp. The data were base-called with base-calling pipeline version APL4.6, filtered by read quality (rq≤0.7), and randomly down-sampled to 40X (post de-duplication).


For variant-calling evaluation, we excluded homopolymer regions with length ≥11. Since low-complexity genomic regions tend to amplify inefficiently, we isolated the sequencing accuracy by further excluding selected low-complexity, tandem-repeat, and low-mappability regions, while still retaining 98.2-98.5% of the original GIAB high-confidence regions (HCRs). The exclusion BED files are included below.
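
The retained fraction quoted above is a ratio of base counts between two BED files. The following is a minimal sketch of that arithmetic, not the authors' evaluation workflow. It assumes you already have the original HCR BED and a second BED with the excluded regions subtracted (for example, produced with a tool such as `bedtools subtract -a hcr.bed -b ug_lcr.bed`), and that intervals in each file do not overlap. The function names and the file names other than ug_lcr.bed are hypothetical.

```shell
# Sum the bases covered by a BED file (chrom, start, end; 0-based, half-open).
# Assumes intervals are already merged, i.e. non-overlapping.
bed_total_bp() {
  awk -F'\t' '!/^(#|track|browser)/ && NF >= 3 { total += $3 - $2 }
              END { print total + 0 }' "$1"
}

# Percentage of HCR bases that remain after exclusion.
# $1: original HCR BED (hypothetical file name)
# $2: HCR BED with the excluded regions already subtracted (hypothetical)
retained_pct() {
  awk -v kept="$(bed_total_bp "$2")" -v all="$(bed_total_bp "$1")" \
    'BEGIN { if (all == 0) { print "NA"; exit }
             printf "%.1f\n", 100 * kept / all }'
}

# Toy data: 1000 HCR bases, of which a 15-base stretch has been excluded.
tmp=$(mktemp -d)
printf 'chr1\t0\t1000\n' > "$tmp/hcr.bed"
printf 'chr1\t0\t600\nchr1\t615\t1000\n' > "$tmp/hcr_minus_lcr.bed"
retained_pct "$tmp/hcr.bed" "$tmp/hcr_minus_lcr.bed"   # prints 98.5
```

On real data, the two inputs would be the GIAB HCR BED for a given sample and the same file after subtracting ug_lcr.bed.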


Included data files:

  • GIAB HG001-7 aligned data (CRAM) - available for download from AWS, see below
  • GIAB HG001-7 GATK variant calls (VCF)
  • GIAB HG001-7 DV variant calls (VCF)
  • HMER_11 exclusion BED file - [ug_lcr.HMER11.bed]
  • Full UG-LCR exclusion file (BED) - [ug_lcr.bed]
  • Readme file - Steps to reproduce the variant calling evaluation


CRAM and VCF download instructions:

The dataset of seven GIAB samples (HG001-HG007) is available in an S3 bucket on AWS (total size: 444GB).

You can download the full dataset using the AWS CLI:


$ mkdir ultima-selected-1k-genomes
$ aws s3 cp s3://ultima-selected-1k-genomes ultima-selected-1k-genomes/ --recursive --no-sign-request
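
Since the full dataset is 444GB, it may be useful to fetch a single sample first. The sketch below only prints an `aws s3 cp` command so it can be reviewed before running. The `--exclude`, `--include`, and `--dryrun` options are standard AWS CLI flags, but the `*HG002*` filename pattern is an assumption about how objects in the bucket are named, and the helper function name is hypothetical.

```shell
# Bucket name taken from the download instructions above.
BUCKET="s3://ultima-selected-1k-genomes"

# To check the actual object names first, one could run:
#   aws s3 ls "$BUCKET/" --recursive --human-readable --summarize --no-sign-request

# Print (do not run) a download command restricted to one sample.
# $1: sample ID, e.g. HG002   $2: local destination directory
# ASSUMPTION: object keys in the bucket contain the sample ID.
sample_download_cmd() {
  printf 'aws s3 cp %s %s/ --recursive --exclude "*" --include "*%s*" --no-sign-request --dryrun\n' \
    "$BUCKET" "$2" "$1"
}

sample_download_cmd HG002 ultima-selected-1k-genomes
```

With `--dryrun`, the AWS CLI lists the transfers it would perform without copying anything; drop that flag once the listed files look right.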

The CRAM reference can be downloaded here.


VCF only download instructions:

You can download just the VCF files for this dataset using the AWS CLI:


$ mkdir ultima-selected-1k-genomes-vcf-only
$ aws s3 cp s3://ultima-selected-1k-genomes-vcf-only ultima-selected-1k-genomes-vcf-only/ --recursive --no-sign-request
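
After the download completes, a quick sanity check is to count the variant records in each VCF. This is a minimal sketch that assumes the files are gzip- or bgzip-compressed and that passing calls carry `PASS` in the FILTER column; both are assumptions about this dataset, and the function names and toy file are hypothetical.

```shell
# Count all variant records (non-header lines) in a compressed VCF.
count_vcf_records() {
  gzip -cd "$1" | awk '!/^#/ { n++ } END { print n + 0 }'
}

# Count records whose FILTER column (column 7) is PASS.
count_pass_records() {
  gzip -cd "$1" | awk -F'\t' '!/^#/ && $7 == "PASS" { n++ } END { print n + 0 }'
}

# Toy VCF with two records, one of which passes filters.
tmp=$(mktemp -d)
{
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf 'chr1\t100\t.\tA\tG\t50\tPASS\t.\n'
  printf 'chr1\t200\t.\tC\tT\t10\tLowQual\t.\n'
} | gzip -c > "$tmp/toy.vcf.gz"

count_vcf_records "$tmp/toy.vcf.gz"    # prints 2
count_pass_records "$tmp/toy.vcf.gz"   # prints 1
```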

Previous versions:

None
