~ re-use the RE digested genome files

At the beginning of every CCseqBasic run,
an RE-digested form of the reference genome is generated.

Generating this digest takes about 15 minutes.

To save that run time, you can store the RE-digested genome files in a folder,
to be re-used every time the pipeline is run.

To save your RE-digested reference genome, start from your normal run command, for example :

CCseqBasic5.sh \
    -c /t1-data/user/telenius/capturesiteREfragments.txt \
    -s C57captureTest \
    --pf /public/telenius/CAPTUREC_DATA/C57captureTest_run4 \
    --genome mm9 \
    --chunkmb 1012 \
    --R1 /t1-data/user/telenius/R1_001.fastq \
    --R2 /t1-data/user/telenius/R2_001.fastq

To turn on saving of the RE-digested genome, add the flag

--saveGenomeDigest

This yields a run command like :

CCseqBasic5.sh \
    -c /t1-data/user/telenius/capturesiteREfragments.txt \
    -s C57captureTest \
    --pf /public/telenius/CAPTUREC_DATA/C57captureTest_run4 \
    --genome mm9 \
    --chunkmb 1012 \
    --R1 /t1-data/user/telenius/R1_001.fastq \
    --R2 /t1-data/user/telenius/R2_001.fastq \
    --saveGenomeDigest

This command will generate an RE-digested genome coordinate file called

genome_dpnII_coordinates.txt

( or genome_nlaIII_coordinates.txt or genome_hindIII_coordinates.txt , if you ran with --nla or --hind )

Here are all the supported restriction enzymes :

RESTRICTION ENZYME SETTINGS
--dpn  (default) : dpnII is the RE of the experiment
--nla  : nlaIII is the RE of the experiment
--hind : hindIII is the RE of the experiment
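
The flag-to-file-name mapping above can be written out as a small lookup. This is only a sketch: digest_filename_for_flag is a hypothetical helper, not part of CCseqBasic, and it assumes that leaving the enzyme flag out behaves like --dpn (the default listed above).

```shell
# Hypothetical helper (not part of CCseqBasic): print the digest file name
# that a run with the given enzyme flag is expected to produce.
digest_filename_for_flag() {
    case "$1" in
        --dpn|"") echo "genome_dpnII_coordinates.txt" ;;   # default enzyme
        --nla)    echo "genome_nlaIII_coordinates.txt" ;;
        --hind)   echo "genome_hindIII_coordinates.txt" ;;
        *)        echo "Unknown enzyme flag: $1" >&2; return 1 ;;
    esac
}
```

For example, digest_filename_for_flag --nla prints genome_nlaIII_coordinates.txt.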

The genome_dpnII_coordinates.txt file needs to be stored in a dedicated folder, which is set in the CCseqBasic config file

conf/genomeBuildSetup.sh

in the line that sets CaptureDigestPath :

# #############################################################################
# GENOME DIGEST FILES for dpnII and nlaIII (optional - but makes runs faster)
# #############################################################################

# To turn this off, set :
# CaptureDigestPath="NOT_IN_USE"

CaptureDigestPath="/home/molhaem2/telenius/CCseqBasic/digests"

The folder structure of this 'digests' folder needs to be :

/home/molhaem2/telenius/CCseqBasic/digests
|-- dpnII
|   |-- hg18.txt
|   |-- hg19.txt
|   |-- mm10.txt
|   `-- mm9.txt
|-- hindIII
|   `-- hg19.txt
`-- nlaIII
    |-- hg18.txt
    `-- mm9.txt
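
To see whether a digest is already in place for a given enzyme and genome, you can test for the file directly. A minimal sketch, assuming the <enzyme>/<genome>.txt layout shown above; has_digest is a hypothetical helper, and how the pipeline itself performs the lookup is not shown here.

```shell
# Hypothetical helper (not part of CCseqBasic): succeed if a non-empty
# digest file exists at <digest_dir>/<enzyme>/<genome>.txt
has_digest() {
    digest_dir="$1"   # the folder set as CaptureDigestPath
    enzyme="$2"       # dpnII, nlaIII or hindIII
    genome="$3"       # e.g. mm9
    [ -s "$digest_dir/$enzyme/$genome.txt" ]
}

# Example use (the path is the one from the config excerpt above):
# if has_digest /home/molhaem2/telenius/CCseqBasic/digests dpnII mm9; then
#     echo "digest found"
# else
#     echo "no digest yet, consider running once with --saveGenomeDigest"
# fi
```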

So, rename each generated genome_<enzyme>_coordinates.txt file to <genome>.txt
( for example, genome_dpnII_coordinates.txt from an mm9 run becomes dpnII/mm9.txt ),
and place it into the folder structure.
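
The rename-and-place step can be done with a short helper like the one below. This is a sketch, not part of CCseqBasic: install_digest is a hypothetical name, and the location of the generated coordinate file inside your run folder is not stated here, so the source path is left for you to fill in.

```shell
# Hypothetical helper (not part of CCseqBasic): copy a generated digest
# into the digests folder under the name <enzyme>/<genome>.txt
install_digest() {
    src="$1"          # path to genome_<enzyme>_coordinates.txt from your run
    digest_dir="$2"   # the folder set as CaptureDigestPath
    enzyme="$3"       # dpnII, nlaIII or hindIII
    genome="$4"       # e.g. mm9

    # Refuse to install a missing or empty file
    if [ ! -s "$src" ]; then
        echo "Digest file missing or empty: $src" >&2
        return 1
    fi
    mkdir -p "$digest_dir/$enzyme" || return 1
    cp "$src" "$digest_dir/$enzyme/$genome.txt"
}

# Example use (the source path is a placeholder):
# install_digest ./genome_dpnII_coordinates.txt \
#     /home/molhaem2/telenius/CCseqBasic/digests dpnII mm9
```

The helper copies rather than moves, so the original file stays in the run folder.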

A ready-made digests folder, in the correct structure, is provided
for the following genomes and restriction enzymes :

|-- dpnII
|   |-- hg18.txt
|   |-- hg19.txt
|   |-- mm10.txt
|   `-- mm9.txt
|-- hindIII
|   `-- hg19.txt
`-- nlaIII
    |-- hg18.txt
    `-- mm9.txt

It can be downloaded from here :

digests.tar.gz

Its md5sum is :

017f3cbc29d14a0b0326c14a22b2403b  digests.tar.gz

The md5sum is also available as a file :

md5sum_digests_targz.txt
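
After downloading, it is worth checking the archive against the md5sum before unpacking it. A minimal sketch, assuming GNU md5sum and tar are available; verify_and_unpack is a hypothetical helper, and whether the archive unpacks into a top-level 'digests' folder is an assumption to check on your own copy.

```shell
# Hypothetical helper (not part of CCseqBasic): unpack the archive only if
# its md5sum matches the expected value.
verify_and_unpack() {
    archive="$1"        # e.g. digests.tar.gz
    expected_md5="$2"   # e.g. 017f3cbc29d14a0b0326c14a22b2403b
    dest="$3"           # folder to unpack into

    # md5sum prints "<hash>  <file>"; keep only the hash
    actual_md5=$(md5sum "$archive" | cut -d ' ' -f 1)
    if [ "$actual_md5" != "$expected_md5" ]; then
        echo "md5sum mismatch for $archive" >&2
        return 1
    fi
    mkdir -p "$dest" && tar -xzf "$archive" -C "$dest"
}

# Example use (the destination folder is a placeholder):
# verify_and_unpack digests.tar.gz 017f3cbc29d14a0b0326c14a22b2403b ./unpacked
```

Once unpacked, point CaptureDigestPath in conf/genomeBuildSetup.sh at the folder that contains the dpnII, hindIII and nlaIII subfolders.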