Updated by Jelena - 16:00 02/Sep/2015

CaptureC test data




This is C57BL/6 (mouse) data from the "30 genes" capture set.

There are 30 different capture sites. Three of them are given below, in the format of the CaptureC oligo coordinate file.

The smaller files are R1 and R2 from a MiSeq run, and the larger files are R1 and R2 from a HiSeq run, for the same sample.

The sample contains captures from several genes. Alpha globin and mitoferrin are described here.

The alpha globin locus is duplicated, which leads to a "duplication" in the oligo coordinate file.
There is only one probe, but its sequence occurs exactly twice in the genome, so the oligo has to be written twice.

The alpha globin oligo coordinate file is :

Hba1 11 32182969 32183821 11 32181969 32184821 1 A
Hba2 11 32195804 32196638 11 32194804 32197638 2 A

(The last 2 columns don't matter here, as a SNP-specific run is very rarely needed.)

Here is a single-probe "normal gene" capture, the mitoferrin locus on chromosome 14 :

Slc25A37 14 69902454 69903469 14 69901454 69904469 9 A

So, you can feed this into the pipeline as a single oligo coordinate file like this (tab-delimited, with NO empty line at the end of the file) :

Hba1 11 32182969 32183821 11 32181969 32184821 1 A
Hba2 11 32195804 32196638 11 32194804 32197638 2 A
Slc25A37 14 69902454 69903469 14 69901454 69904469 9 A
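
The formatting rules above (tab-delimited, no empty line) are easy to break when editing by hand, so here is a minimal Python sketch for writing and checking such a file. The function names are made up for illustration and are not part of the pipeline. The column meanings in the comment are an assumption: in the three rows above, the second coordinate pair is exactly 1000 bases wider on each side than the first, which would fit "capture fragment" followed by "exclusion zone", but this README does not state it. The sketch ends the last row with a single newline and treats only truly blank lines as errors; adjust this if the pipeline expects otherwise.

```python
# Rows copied from the text above.
# Assumed column meaning (not confirmed by this README):
# name, chr, start, end, chr, wider start, wider end, SNP position, SNP base
ROWS = [
    ("Hba1", "11", 32182969, 32183821, "11", 32181969, 32184821, 1, "A"),
    ("Hba2", "11", 32195804, 32196638, "11", 32194804, 32197638, 2, "A"),
    ("Slc25A37", "14", 69902454, 69903469, "14", 69901454, 69904469, 9, "A"),
]


def format_oligo_file(rows):
    """Return tab-delimited text, one line per row, with no blank lines."""
    lines = ["\t".join(str(field) for field in row) for row in rows]
    return "\n".join(lines) + "\n"


def check_oligo_text(text):
    """Return a list of problems found in oligo coordinate file text."""
    problems = []
    lines = text.split("\n")
    # A single trailing newline leaves one empty string at the end; drop it.
    if lines and lines[-1] == "":
        lines.pop()
    for number, line in enumerate(lines, start=1):
        if line.strip() == "":
            problems.append(f"line {number}: empty line")
            continue
        fields = line.split("\t")
        if len(fields) != 9:
            problems.append(
                f"line {number}: expected 9 tab-separated fields, got {len(fields)}"
            )
    return problems
```

To use it, write format_oligo_file(ROWS) to a file of your choice, or read an existing oligo file and pass its contents to check_oligo_text; an empty list means no formatting problems were found.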



MiSeq samples :

30gmC1_S6_L001_R1_001.fastq

30gmC1_S6_L001_R2_001.fastq



HiSeq samples :

30ghC1_S6_1.fastq

30ghC1_S6_2.fastq
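
Since the R1 and R2 files of each run belong together, a quick sanity check before running the pipeline is to confirm that both files of a pair hold the same number of reads. The sketch below is only an illustration with made-up function names. It assumes plain (uncompressed) FASTQ with exactly four lines per read, and it does not check that the read names match.

```python
def count_fastq_reads(path):
    """Count reads in an uncompressed FASTQ file (assumes 4 lines per read)."""
    with open(path) as handle:
        line_count = sum(1 for _ in handle)
    if line_count % 4 != 0:
        raise ValueError(f"{path}: line count {line_count} is not a multiple of 4")
    return line_count // 4


def pairs_match(r1_path, r2_path):
    """Return True if the R1 and R2 files contain the same number of reads."""
    return count_fastq_reads(r1_path) == count_fastq_reads(r2_path)


# File names as listed in this README.
PAIRS = [
    ("30gmC1_S6_L001_R1_001.fastq", "30gmC1_S6_L001_R2_001.fastq"),  # MiSeq
    ("30ghC1_S6_1.fastq", "30ghC1_S6_2.fastq"),  # HiSeq
]
```

For example, calling pairs_match(*PAIRS[0]) in the folder holding the test data checks the MiSeq pair.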