Updated by Jelena - 12:00 25/Jun/2018
CaptureC test data
This is C57BL/6 mouse data from the "30 genes" capture set.
Genome: mm9 or mm10 (mouse); restriction enzyme: DpnII.
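DpnII recognises the 4-bp site GATC and cuts immediately before it (^GATC), so Capture-C fragments are the stretches between consecutive GATC sites. As a rough illustration only (not the pipeline's own digestion code), a simplified top-strand in-silico digest could look like the sketch below. The function name `dpnii_fragments` and its `offset` parameter are hypothetical, and the sketch ignores the 4-nt overhang and the reverse strand.

```python
from __future__ import annotations

import re

# DpnII recognition site; the enzyme cuts immediately 5' of it (^GATC).
DPNII_SITE = "GATC"


def dpnii_fragments(seq: str, offset: int = 1) -> list[tuple[int, int]]:
    """Return (start, end) 1-based inclusive coordinates of DpnII fragments.

    `offset` (hypothetical) is the genomic coordinate of seq[0], so a local
    sequence can be mapped back onto a chromosome.
    """
    seq = seq.upper()
    # 0-based indices where a GATC site begins (GATC cannot overlap itself).
    cuts = [m.start() for m in re.finditer(DPNII_SITE, seq)]
    bounds = [0] + cuts + [len(seq)]
    fragments = []
    for left, right in zip(bounds, bounds[1:]):
        # Skip the empty piece produced when the sequence starts with GATC.
        if right > left:
            fragments.append((left + offset, right - 1 + offset))
    return fragments
```

Each fragment after the first starts with the GATC site, mirroring where the top strand is cut.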
There are 30 different capture sites; three of these genes are given below, in the format of the CaptureC oligo file.
The smaller files are R1 and R2 from a MiSeq run, and the larger files are R1 and R2 from a HiSeq run, both from the same sample.
The sample contains captures from several genes; here I describe alpha globin and mitoferrin.
The alpha globin locus is duplicated, which leads to a "duplication" in the oligo coordinate file: we have only one probe, but its target occurs exactly twice in the genome, so the oligo has to be written twice (once per copy, as Hba1 and Hba2).
Its oligo coordinate file (TAB-delimited, no empty line at the end of the file) is:
Hba1 11 32182969 32183821 11 32181969 32184821 1 A
Hba2 11 32195804 32196638 11 32194804 32197638 2 A
(The last two columns don't matter here, as SNP-specific runs are rarely needed.)
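The oligo rows above can also be generated programmatically. This is a minimal sketch, assuming the nine columns are: name, capture chromosome, capture start, capture end, exclusion chromosome, exclusion start, exclusion end, SNP position, SNP base. In the example rows the exclusion coordinates sit exactly 1000 bp either side of the capture fragment, so the sketch uses a hypothetical `flank=1000` default; `make_oligo_row`, `format_oligo_lines`, `write_oligo_file` and `EXAMPLE_ROWS` are illustrative names, not part of the pipeline. It reads "no empty line at the end" conservatively and writes no newline after the last row.

```python
from __future__ import annotations

# Assumed 9-column layout: name, capture chr, capture start, capture end,
# exclusion chr, exclusion start, exclusion end, SNP position, SNP base.


def make_oligo_row(name: str, chrom: int | str, start: int, end: int,
                   flank: int = 1000, snp_pos: int = 1, snp_base: str = "A") -> list:
    """Build one oligo row; exclusion zone = capture fragment +/- `flank` bp.

    The SNP columns default to dummy values, as SNP-specific runs are rare.
    """
    return [name, chrom, start, end,
            chrom, start - flank, end + flank,
            snp_pos, snp_base]


def format_oligo_lines(rows: list[list]) -> str:
    """Join fields with TABs and rows with newlines, no newline after the last row."""
    return "\n".join("\t".join(str(field) for field in row) for row in rows)


def write_oligo_file(path: str, rows: list[list]) -> None:
    # newline="\n" stops Windows from translating "\n" into "\r\n".
    with open(path, "w", newline="\n") as handle:
        handle.write(format_oligo_lines(rows))


# The three captures from this section. Alpha globin is one probe that
# occurs twice in the genome, so it appears as two rows (Hba1, Hba2).
EXAMPLE_ROWS = [
    make_oligo_row("Hba1", 11, 32182969, 32183821, snp_pos=1),
    make_oligo_row("Hba2", 11, 32195804, 32196638, snp_pos=2),
    make_oligo_row("Slc25A37", 14, 69902454, 69903469, snp_pos=9),
]
```

Calling `write_oligo_file("oligos.txt", EXAMPLE_ROWS)` would write the same three rows as the combined oligo file given below; the file name is only an example.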
Here is a single-probe "normal gene" capture, the mitoferrin locus on chromosome 14:
Slc25A37 14 69902454 69903469 14 69901454 69904469 9 A
So you can feed all three into the pipeline as a single oligo coordinate file like this (TAB-delimited, NO empty line at the end of the file):
Hba1 11 32182969 32183821 11 32181969 32184821 1 A
Hba2 11 32195804 32196638 11 32194804 32197638 2 A
Slc25A37 14 69902454 69903469 14 69901454 69904469 9 A
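Because copy-pasting can silently turn TABs into spaces or add a trailing empty line, it may help to check the file before starting a run. The following is a hedged sketch, not a pipeline feature: `check_oligo_text` is a hypothetical helper, and its rules (nine TAB-separated columns, integer coordinates with start < end, capture fragment inside the exclusion zone on the same chromosome, unique oligo names, no empty lines including at the end) are assumptions drawn from the example rows above.

```python
from __future__ import annotations

EXPECTED_COLUMNS = 9  # assumed layout of the oligo coordinate file


def check_oligo_text(text: str) -> list[str]:
    """Return a list of problems found in oligo-file text (empty list = OK)."""
    problems = []
    lines = text.split("\n")
    for number, line in enumerate(lines, start=1):
        # A trailing "\n" produces an empty last element, flagged here.
        if line.strip() == "":
            problems.append(f"line {number}: empty line (not allowed)")
            continue
        fields = line.rstrip("\r").split("\t")
        if len(fields) != EXPECTED_COLUMNS:
            problems.append(f"line {number}: expected {EXPECTED_COLUMNS} "
                            f"TAB-separated columns, got {len(fields)}")
            continue
        name, chrom, start, end, ex_chrom, ex_start, ex_end = fields[:7]
        try:
            start, end, ex_start, ex_end = map(int, (start, end, ex_start, ex_end))
        except ValueError:
            problems.append(f"line {number} ({name}): coordinates must be integers")
            continue
        if start >= end:
            problems.append(f"line {number} ({name}): capture start must be < end")
        # Assumed rule: the exclusion zone surrounds the capture fragment.
        if ex_chrom != chrom or not (ex_start <= start and end <= ex_end):
            problems.append(f"line {number} ({name}): capture fragment "
                            f"not inside exclusion zone")
    # Assumed rule: each row (e.g. each alpha globin copy) needs its own name.
    names = [line.split("\t")[0] for line in lines if line.strip()]
    duplicates = sorted({n for n in names if names.count(n) > 1})
    if duplicates:
        problems.append(f"duplicate oligo names: {', '.join(duplicates)}")
    return problems
```

For example, `check_oligo_text(open("oligos.txt").read())` returning an empty list means none of these checks failed; if the rows were pasted with spaces instead of TABs, each line would be reported as having a single column.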
MiSeq samples:
HiSeq samples: