At the beginning of every CCseqBasic run,
a restriction enzyme (RE) digested form of the reference genome is generated.
Generating this digest takes about 15 minutes.
To save some run time, you can store these RE-digested genome files in a folder,
to be reused every time the pipeline is run.
To save your RE-cut reference genome, run the pipeline normally, for example:
CCseqBasic5.sh \
    -c /t1-data/user/telenius/capturesiteREfragments.txt \
    -s C57captureTest \
    --pf /public/telenius/CAPTUREC_DATA/C57captureTest_run4 \
    --genome mm9 \
    --chunkmb 1012 \
    --R1 /t1-data/user/telenius/R1_001.fastq \
    --R2 /t1-data/user/telenius/R2_001.fastq

To turn on saving the RE-cut genome, add the flag

--saveGenomeDigest

to yield a run command like:

CCseqBasic5.sh \
    -c /t1-data/user/telenius/capturesiteREfragments.txt \
    -s C57captureTest \
    --pf /public/telenius/CAPTUREC_DATA/C57captureTest_run4 \
    --genome mm9 \
    --chunkmb 1012 \
    --R1 /t1-data/user/telenius/R1_001.fastq \
    --R2 /t1-data/user/telenius/R2_001.fastq \
    --saveGenomeDigest

This command will generate an RE-digested genome coordinate file called
genome_dpnII_coordinates.txt
(or genome_nlaIII_coordinates.txt or genome_hindIII_coordinates.txt, if you ran with --nla or --hind).
Here are all the supported restriction enzymes:
RESTRICTION ENZYME SETTINGS
--dpn (default) : dpnII is the RE of the experiment
--nla : nlaIII is the RE of the experiment
--hind : hindIII is the RE of the experiment

The genome_dpnII_coordinates.txt file needs to be stored in a special folder, which is set in the CCseqBasic config file

conf/genomeBuildSetup.sh

in the line:

# #############################################################################
# GENOME DIGEST FILES for dpnII and nlaIII (optional - but makes runs faster)
# #############################################################################
# To turn this off, set :
# CaptureDigestPath="NOT_IN_USE"
CaptureDigestPath="/home/molhaem2/telenius/CCseqBasic/digests"

The folder structure of this 'digests' folder needs to be:
/home/molhaem2/telenius/CCseqBasic/digests
|-- dpnII
|   |-- hg18.txt
|   |-- hg19.txt
|   |-- mm10.txt
|   `-- mm9.txt
|-- hindIII
|   `-- hg19.txt
`-- nlaIII
    |-- hg18.txt
    `-- mm9.txt
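
To see which enzyme and genome combinations a digests folder already covers, a small helper along these lines may be useful. It is only a sketch: list_digests is a hypothetical function, not part of CCseqBasic, and it assumes nothing beyond the <enzyme>/<genome>.txt layout described above.

```shell
# Hypothetical helper (not part of CCseqBasic):
# print one "enzyme<TAB>genome" line per digest file found under a digests folder.
# Assumes the layout <digest_root>/<enzyme>/<genome>.txt.
list_digests() {
    digest_root="$1"
    for f in "$digest_root"/*/*.txt; do
        # If the glob matched nothing, the pattern is left unexpanded: skip it.
        [ -e "$f" ] || continue
        enzyme=$(basename "$(dirname "$f")")   # folder name, e.g. dpnII
        genome=$(basename "$f" .txt)           # file name without .txt, e.g. mm9
        printf '%s\t%s\n' "$enzyme" "$genome"
    done
}
```

It would be called with the folder set as CaptureDigestPath, for example list_digests /home/molhaem2/telenius/CCseqBasic/digests .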
So, rename each generated coordinate file after its genome build (for example, a dpnII digest of mm9 becomes dpnII/mm9.txt),
and place it in the matching enzyme subfolder of this structure.
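
The rename-and-place step can be sketched as a small shell function. This is an illustration under stated assumptions, not part of CCseqBasic: install_digest is a hypothetical helper, the enzyme is inferred from the genome_<enzyme>_coordinates.txt file name, and the location of the generated file is passed in by the caller.

```shell
# Hypothetical helper (not part of CCseqBasic):
# copy a generated digest file into <digest_root>/<enzyme>/<genome>.txt.
install_digest() {
    digest_file="$1"   # e.g. path to genome_dpnII_coordinates.txt
    genome="$2"        # e.g. mm9
    digest_root="$3"   # the folder set as CaptureDigestPath

    # Derive the enzyme from the file name genome_<enzyme>_coordinates.txt
    base=$(basename "$digest_file")
    enzyme=${base#genome_}
    enzyme=${enzyme%_coordinates.txt}

    # Accept only the three enzymes the pipeline supports
    case "$enzyme" in
        dpnII|nlaIII|hindIII) ;;
        *) echo "Unrecognised digest file name: $base" >&2; return 1 ;;
    esac

    mkdir -p "$digest_root/$enzyme" &&
        cp "$digest_file" "$digest_root/$enzyme/$genome.txt"
}
```

For the mm9 run above, the call would look like install_digest genome_dpnII_coordinates.txt mm9 /home/molhaem2/telenius/CCseqBasic/digests . The sketch copies rather than moves the file, so the original stays in place.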
A ready-made digests folder, in the correct structure, is provided below
for the following genomes and restriction enzymes:
|-- dpnII
|   |-- hg18.txt
|   |-- hg19.txt
|   |-- mm10.txt
|   `-- mm9.txt
|-- hindIII
|   `-- hg19.txt
`-- nlaIII
    |-- hg18.txt
    `-- mm9.txt
These can be downloaded from here:
The md5sum of the archive is:
017f3cbc29d14a0b0326c14a22b2403b digests.tar.gz
The md5sum is also available as a file:
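
To check a downloaded archive against the checksum listed above before unpacking it, something like the following could be used. It is a minimal sketch: verify_md5 is a hypothetical helper, and it assumes the GNU coreutils md5sum command is available on the system.

```shell
# Hypothetical helper: compare a file's MD5 checksum with an expected value.
# Returns 0 on a match, 1 on a mismatch or if the file cannot be read.
# Assumes the md5sum command (GNU coreutils) is installed.
verify_md5() {
    file="$1"
    expected="$2"
    # md5sum prints "<checksum>  <file name>"; keep only the checksum
    actual=$(md5sum "$file" 2>/dev/null | cut -d' ' -f1)
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file (got '$actual', expected '$expected')" >&2
        return 1
    fi
}
```

For the archive above, the call would be verify_md5 digests.tar.gz 017f3cbc29d14a0b0326c14a22b2403b , after which the archive can be unpacked with tar -xzf digests.tar.gz .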