Updated by Jelena - 12:00 25/Jun/2018

CaptureC test data


This data is C57bl6 data from the "30 genes" capture set.

genome : mm9 or mm10 (i.e. mouse), restriction enzyme DpnII


There are 30 different capture sites; 3 genes out of these are given below (in the format of the CaptureC oligo file).

The smaller files are R1 and R2 from a MiSeq run, and the larger files are R1 and R2 from a HiSeq run of the same sample.

There are captures from several genes in the sample - here I mention alpha globin and mitoferrin.

The alpha globin locus is duplicated, which leads to a "duplication" in the oligo coordinate file.
So - we have only one probe, but its target occurs exactly twice in the genome, so the oligo has to be written twice (once per copy).

Its oligo coordinate file is (TAB-delimited, no empty line at the end of the file) :

Hba1   11      32182969        32183821        11      32181969        32184821        1       A
Hba2   11      32195804        32196638        11      32194804        32197638        2       A
(The last 2 columns don't matter, of course, as a SNP-specific run is very rarely needed.)

Here is a single-probe "normal gene" capture - the mitoferrin locus on chromosome 14 :
Slc25A37        14      69902454        69903469        14      69901454        69904469        9       A

So, you can feed this into the pipeline as a single oligo coordinate file like this (TAB-delimited, NO empty line at the end of the file) :
Hba1   11      32182969        32183821        11      32181969        32184821        1       A
Hba2   11      32195804        32196638        11      32194804        32197638        2       A
Slc25A37        14      69902454        69903469        14      69901454        69904469        9       A
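
If the file is assembled by hand, it is easy to end up with spaces instead of TABs or with a stray empty line. Below is a minimal Python sketch (not part of the CaptureC pipeline) that builds the three-line file above and checks it. The function names are made up for this example, and it assumes that a single newline ending the last record is fine and that only an additional empty line after it is a problem.

```python
# Rows copied from the oligo coordinate file above (9 columns each).
ROWS = [
    ("Hba1", 11, 32182969, 32183821, 11, 32181969, 32184821, 1, "A"),
    ("Hba2", 11, 32195804, 32196638, 11, 32194804, 32197638, 2, "A"),
    ("Slc25A37", 14, 69902454, 69903469, 14, 69901454, 69904469, 9, "A"),
]


def format_oligo_file(rows):
    """Join fields with TABs and rows with newlines (hypothetical helper)."""
    lines = ["\t".join(str(field) for field in row) for row in rows]
    # Assumption: one newline terminating the last record is acceptable;
    # no further empty line is written after it.
    return "\n".join(lines) + "\n"


def check_oligo_text(text):
    """Return a list of problems; an empty list means the text looks fine."""
    problems = []
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines = lines[:-1]  # drop the empty piece after the final newline
    for number, line in enumerate(lines, start=1):
        if line.strip() == "":
            problems.append(f"line {number}: empty line")
        elif len(line.split("\t")) != 9:
            problems.append(f"line {number}: expected 9 TAB-separated fields")
    return problems


if __name__ == "__main__":
    text = format_oligo_file(ROWS)
    print(check_oligo_text(text))          # no problems expected
    print(check_oligo_text(text + "\n"))   # reports the extra empty line
```

To check an existing file, read its contents and pass them to check_oligo_text; to write a new one, write the output of format_oligo_file to the file name of your choice.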


MiSeq samples :

30gmC1_S6_L001_R1_001.fastq

30gmC1_S6_L001_R2_001.fastq



HiSeq samples :

30ghC1_S6_1.fastq

30ghC1_S6_2.fastq