Page updated by Jelena Telenius - 17:00 28/Nov/2018

CCseqBasic

~ re-use the RE digested genome files




At the beginning of every CCseqBasic run,
an RE digested (restriction enzyme digested) form of the reference genome is generated.

Generating this digested genome takes about 15 minutes.

To save some run time, you can store these RE digested genome files in a folder,
to be re-used every time the pipeline is run.

To save your reference RE cut genome, run the pipeline normally, for example :

CCseqBasic5.sh \
    -c /t1-data/user/telenius/capturesiteREfragments.txt \
    -s C57captureTest \
    --pf /public/telenius/CAPTUREC_DATA/C57captureTest_run4 \
    --genome mm9 \
    --chunkmb 1012 \
    --R1 /t1-data/user/telenius/R1_001.fastq \
    --R2 /t1-data/user/telenius/R2_001.fastq

To turn on saving the RE cut genome, add the flag

--saveGenomeDigest

to yield a run command like :

CCseqBasic5.sh \
    -c /t1-data/user/telenius/capturesiteREfragments.txt \
    -s C57captureTest \
    --pf /public/telenius/CAPTUREC_DATA/C57captureTest_run4 \
    --genome mm9 \
    --chunkmb 1012 \
    --R1 /t1-data/user/telenius/R1_001.fastq \
    --R2 /t1-data/user/telenius/R2_001.fastq \
    --saveGenomeDigest

This command will generate an RE digested genome coordinate file called

genome_dpnII_coordinates.txt

(or genome_nlaIII_coordinates.txt or genome_hindIII_coordinates.txt, if you were running with --nla or --hind).

Here are all the supported restriction enzymes :

RESTRICTION ENZYME SETTINGS
--dpn  (default) : dpnII is the RE of the experiment
--nla  : nlaIII is the RE of the experiment
--hind : hindIII is the RE of the experiment

The genome_dpnII_coordinates.txt file needs to be stored in the folder named in the CCseqBasic config file

conf/genomeBuildSetup.sh

on the CaptureDigestPath line :

# #############################################################################
# GENOME DIGEST FILES for dpnII and nlaIII (optional - but makes runs faster)
# #############################################################################

# To turn this off, set :
# CaptureDigestPath="NOT_IN_USE"

CaptureDigestPath="/home/molhaem2/telenius/CCseqBasic/digests"

The 'digests' folder needs to have the following structure :

/home/molhaem2/telenius/CCseqBasic/digests
|-- dpnII
|   |-- hg18.txt
|   |-- hg19.txt
|   |-- mm10.txt
|   `-- mm9.txt
|-- hindIII
|   `-- hg19.txt
`-- nlaIII
    |-- hg18.txt
    `-- mm9.txt
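
Before starting a run, it can be useful to check whether a stored digest already exists for your genome and enzyme. The sketch below assumes, based only on the layout above, that the digest for a run is found at <CaptureDigestPath>/<enzyme>/<genome>.txt. The function names digest_path and has_digest are hypothetical helpers, not part of CCseqBasic.

```shell
# Hypothetical helpers (not part of CCseqBasic).
# Assumption: digests live at <digests folder>/<enzyme>/<genome>.txt,
# as suggested by the folder layout above.

# Print the expected location of a digest file.
# Arguments: <digests folder> <enzyme> <genome>
digest_path() {
    printf '%s/%s/%s.txt\n' "$1" "$2" "$3"
}

# Succeed only if the digest file exists and is not empty.
# Arguments: <digests folder> <enzyme> <genome>
has_digest() {
    [ -s "$(digest_path "$1" "$2" "$3")" ]
}

# Example with the folder from the config excerpt above:
if has_digest /home/molhaem2/telenius/CCseqBasic/digests dpnII mm9; then
    echo "Stored dpnII digest found for mm9"
else
    echo "No stored dpnII digest for mm9 - consider running with --saveGenomeDigest"
fi
```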

So, rename each generated coordinate file to <genome>.txt (for example, genome_dpnII_coordinates.txt from an mm9 run becomes mm9.txt),
and place it in the subfolder of its enzyme (here dpnII).
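
The rename and copy step can be sketched as a small shell function. This is a minimal sketch, not part of CCseqBasic : the function name install_digest is hypothetical, and the location of the generated coordinate file inside your run folder is not stated on this page, so it is passed in as an argument.

```shell
# Hypothetical helper (not part of CCseqBasic): copy a generated digest
# into the digests folder under the name <enzyme>/<genome>.txt.
# Arguments: <generated coordinate file> <digests folder> <enzyme> <genome>
install_digest() {
    src="$1"; digest_root="$2"; enzyme="$3"; genome="$4"

    # Refuse to install a missing or empty file.
    if [ ! -s "$src" ]; then
        echo "Digest file missing or empty: $src" >&2
        return 1
    fi

    # Create the enzyme subfolder if needed, then copy under the genome name.
    mkdir -p "$digest_root/$enzyme" || return 1
    cp "$src" "$digest_root/$enzyme/$genome.txt"
}

# Example call (the source path is a placeholder for wherever your run wrote the file):
# install_digest /path/to/genome_dpnII_coordinates.txt \
#     /home/molhaem2/telenius/CCseqBasic/digests dpnII mm9
```

Copying rather than moving leaves the original file in the run folder untouched.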

A ready-made digests folder, in the correct structure, is provided below
for the following genomes and restriction enzymes :

|-- dpnII
|   |-- hg18.txt
|   |-- hg19.txt
|   |-- mm10.txt
|   `-- mm9.txt
|-- hindIII
|   `-- hg19.txt
`-- nlaIII
    |-- hg18.txt
    `-- mm9.txt

These can be downloaded from here :

  • digests.tar.gz

    Its md5sum is :

    017f3cbc29d14a0b0326c14a22b2403b  digests.tar.gz
    

    The md5sum as a file :

  • md5sum_digests_targz.txt
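
After downloading, the archive can be checked against the md5sum given above before unpacking it. A minimal sketch, assuming the GNU md5sum tool is available (as on most Linux systems). The function name verify_md5 is a hypothetical helper, not part of CCseqBasic.

```shell
# Hypothetical helper (not part of CCseqBasic): succeed only if the
# md5sum of a file matches the expected value.
# Arguments: <file> <expected md5sum>
verify_md5() {
    file="$1"; expected="$2"
    # md5sum prints "<checksum>  <filename>"; keep only the checksum.
    actual=$(md5sum "$file" | awk '{print $1}')
    [ "$actual" = "$expected" ]
}

# Example: unpack the archive only if the checksum matches.
# if verify_md5 digests.tar.gz 017f3cbc29d14a0b0326c14a22b2403b; then
#     tar -xzf digests.tar.gz
# else
#     echo "Checksum mismatch - download digests.tar.gz again" >&2
# fi
```

If md5sum_digests_targz.txt is in the standard md5sum output format (an assumption), running md5sum -c md5sum_digests_targz.txt in the download folder should perform the same check.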