At the beginning of every CCseqBasic run,
a restriction enzyme (RE) digested form of the reference genome is generated.
Generating this digested genome takes about 15 min.
To save some run time, you can store these RE-digested genome files in a folder,
to be re-used every time the pipeline is run.
To save your RE-digested reference genome, run the pipeline normally, for example :
CCseqBasic5.sh \
-c /t1-data/user/telenius/capturesiteREfragments.txt \
-s C57captureTest \
--pf /public/telenius/CAPTUREC_DATA/C57captureTest_run4 \
--genome mm9 \
--chunkmb 1012 \
--R1 /t1-data/user/telenius/R1_001.fastq \
--R2 /t1-data/user/telenius/R2_001.fastq
To turn on saving the RE-digested genome, add the flag
--saveGenomeDigest
to yield a run command like :
CCseqBasic5.sh \
-c /t1-data/user/telenius/capturesiteREfragments.txt \
-s C57captureTest \
--pf /public/telenius/CAPTUREC_DATA/C57captureTest_run4 \
--genome mm9 \
--chunkmb 1012 \
--R1 /t1-data/user/telenius/R1_001.fastq \
--R2 /t1-data/user/telenius/R2_001.fastq \
--saveGenomeDigest
This command will generate an RE-digested genome coordinate file called
genome_dpnII_coordinates.txt
(or genome_nlaIII_coordinates.txt or genome_hindIII_coordinates.txt, if you were running with --nla or --hind).
Here are all the supported restriction enzymes :
RESTRICTION ENZYME SETTINGS
--dpn (default) : dpnII is the RE of the experiment
--nla : nlaIII is the RE of the experiment
--hind : hindIII is the RE of the experiment
The genome_dpnII_coordinates.txt needs to be stored in a special folder mentioned in the CCseqBasic config file
conf/genomeBuildSetup.sh in the line :
# #############################################################################
# GENOME DIGEST FILES for dpnII and nlaIII (optional - but makes runs faster)
# #############################################################################
# To turn this off, set :
# CaptureDigestPath="NOT_IN_USE"
CaptureDigestPath="/home/molhaem2/telenius/CCseqBasic/digests"
The folder structure of this 'digests' folder needs to be :
/home/molhaem2/telenius/CCseqBasic/digests
|-- dpnII
|   |-- hg18.txt
|   |-- hg19.txt
|   |-- mm10.txt
|   `-- mm9.txt
|-- hindIII
|   `-- hg19.txt
`-- nlaIII
    |-- hg18.txt
    `-- mm9.txt
So, rename each generated coordinate file after its genome build
(for example, genome_dpnII_coordinates.txt from an mm9 run becomes dpnII/mm9.txt),
and place it into this folder structure.
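The rename-and-place step can be scripted. Below is a minimal sketch, not part of CCseqBasic itself: install_digest is a hypothetical helper, and it assumes that you know where the pipeline wrote the coordinate file and that the target folder is the one set as CaptureDigestPath in conf/genomeBuildSetup.sh.

```shell
# install_digest COORDS_FILE GENOME DIGEST_ROOT
# Hypothetical helper (not shipped with CCseqBasic): copies
# genome_<enzyme>_coordinates.txt to DIGEST_ROOT/<enzyme>/GENOME.txt
install_digest() {
    digest_src="$1"    # e.g. genome_dpnII_coordinates.txt
    digest_genome="$2" # e.g. mm9
    digest_root="$3"   # assumed to equal CaptureDigestPath

    # Derive the enzyme name from the file name.
    digest_base="$(basename "${digest_src}")"
    digest_enzyme="${digest_base#genome_}"
    digest_enzyme="${digest_enzyme%_coordinates.txt}"

    # Refuse files that do not follow the expected naming pattern.
    if [ "genome_${digest_enzyme}_coordinates.txt" != "${digest_base}" ]; then
        echo "Unexpected file name: ${digest_base}" >&2
        return 1
    fi

    mkdir -p "${digest_root}/${digest_enzyme}" &&
        cp "${digest_src}" "${digest_root}/${digest_enzyme}/${digest_genome}.txt"
}
```

For the mm9 example run above, the call would be : install_digest genome_dpnII_coordinates.txt mm9 /home/molhaem2/telenius/CCseqBasic/digests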
A ready-made digests folder, in the correct structure, is provided below
for the following genomes and restriction enzymes :
|-- dpnII
|   |-- hg18.txt
|   |-- hg19.txt
|   |-- mm10.txt
|   `-- mm9.txt
|-- hindIII
|   `-- hg19.txt
`-- nlaIII
    |-- hg18.txt
    `-- mm9.txt
These can be downloaded from here :
Its md5sum is :
017f3cbc29d14a0b0326c14a22b2403b digests.tar.gz
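To check a downloaded archive against this checksum, something like the following sketch may help. It assumes the GNU coreutils md5sum tool is available (on macOS the equivalent tool is named md5 and prints a different format); verify_md5 is a hypothetical helper name.

```shell
# verify_md5 FILE EXPECTED_MD5
# Hypothetical helper: returns success only if FILE has the expected md5 checksum.
# Assumes GNU coreutils md5sum, which prints "<checksum>  <file name>".
verify_md5() {
    md5_actual="$(md5sum "$1" | cut -d ' ' -f 1)"
    [ "${md5_actual}" = "$2" ]
}
```

With the archive in the current directory, you could then unpack it only when the checksum matches : verify_md5 digests.tar.gz 017f3cbc29d14a0b0326c14a22b2403b && tar -xzf digests.tar.gz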
The md5sum as a file :