CaptureC analysis pipeline website

CaptureC analysis codes (CCanalyser, blat filtering, and statistics) were originally coded by James Davies.
Developed into a more mature, pipelined format by Jelena Telenius.
Statistical analysis modified by Marieke Oudelaar and Damien Downes.

Alpha testers (read "the heroes") : Joke Van Bemmel, Duantida Songdej, Matthew Gosden, Mira Kassouf, Lars Hanssen, Ross Thorne
Beta testers : Jessica Davies, Anna Sanniti, Nigel Roberts, Nicolas Servant

***************************************************************************************

General documentation :

Analysis workflow :

Generate oligo coordinate file
Run pipeline
Interpret your results
Normalise your tracks (compare between active and inactive tissue)
Pool your samples (check that all your replicates look the same)
Statistical analysis (get p-values for your comparison between active and inactive tissue)

***************************************************************************************

Pipeline run instructions

updated by Jelena - 16:50 08/Oct/2018

CITING the pipeline in your publications
If you use the pipeline in your publication - add Jelena Telenius to the acknowledgements.
If you have received a lot of help and development work for your analysis - you should consider authorship :)

See below for :

Current release
Example run script
Example run command
Pipeline manuals

CM5 - CCseqBasic5 ("First multithreaded CC5") Parallel runs. Released 08Oct2018. Current development version.

/t1-data/data/hugheslab/jelenatools/CCseqBasic/CM5/pipeRainbow.sh --help

/t1-data/data/hugheslab/jelenatools/CCseqBasic/CM5/pipeRainbow.sh -h

CS5 - CCseqBasic5 ("Stable portable CC5") Fully portable. Released 17Nov2017. Only receiving minor updates and bug fixes.

/t1-data/data/hugheslab/jelenatools/CCseqBasic/CS5/pipe.sh --help

/t1-data/data/hugheslab/jelenatools/CCseqBasic/CS5/pipe.sh -h

Bug fixes and updates (after release)

Current issues - known bugs and development targets !


To compare old samples to newly run samples - run in a backwards-compatible manner :
  
To run CC3 "as before" - run CS5 --CCversion CS3 --strandSpecificDuplicates
To run CC4 "as before" - run CS5 --CCversion CS4 --strandSpecificDuplicates

This re-introduces the old bugs into the new pipeline, so you can readily compare.
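
A minimal sketch of such a backwards-compatible call, printed rather than executed so you can check it first. The function name print_legacy_cmd is hypothetical (not part of the pipeline), and the paths, sample name and genome are simply copied from the serial example run command on this page - replace them with your own :

```shell
# Print (do not run) a serial CS5 command in backwards-compatible mode.
# print_legacy_cmd is a hypothetical helper, not part of the pipeline.
print_legacy_cmd() {
    legacy_version="$1"
    # Only the two legacy modes named above are accepted here.
    case "$legacy_version" in
        CS3|CS4) ;;
        *) echo "ERROR: expected CS3 or CS4, got '$legacy_version'" >&2
           return 1 ;;
    esac
    # Unquoted here-document : \\ prints a single backslash,
    # and $legacy_version is expanded.
    cat <<EOF
/t1-data/data/hugheslab/jelenatools/CCseqBasic/CS5/pipe.sh \\
    -o /t1-data/user/telenius/oligoDpnFragments.txt \\
    -s C57captureTest \\
    --pf /public/telenius/CAPTUREC_DATA/C57captureTest_run4 \\
    --genome mm9 \\
    --chunkmb 1012 \\
    --R1 /t1-data/user/telenius/R1_001.fastq \\
    --R2 /t1-data/user/telenius/R2_001.fastq \\
    --CCversion $legacy_version \\
    --strandSpecificDuplicates
EOF
}
```

For example, print_legacy_cmd CS3 > run.sh would write the CC3-compatible command into a run file.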

LIST OF ALL RELEASES

Recommended :
run CS5 with default settings.
  
CURRENT

CM5        : PARALLEL RUNS, RAINBOW VISUALISATIONS : current development version (fancy, potentially unstable)
CB5        : current development version (fancy, potentially unstable)
CS5        : current stable version

BUG-FIXED

CF5        : major bug fixes release for CB3a and CB4a pipes (released Nov2017)

OUTDATED

CB3a CB4a  : same as CC3 and CC4, but still receiving bug fixes.
CC3 CC4    : outdated, no longer updated. Contain more bugs than CB3a and CB4a.

Which version do I want to run ?

Guide with fewer technical details : version guide

Guide with more technical details : versionsdetails

Check if the new development version has features you would need :

/t1-data/data/hugheslab/jelenatools/CCseqBasic/CB5/pipe.sh --help

/t1-data/data/hugheslab/jelenatools/CCseqBasic/CB5/pipe.sh -h

1) Current fixes and issues

Bug fixes and updates (all releases)

Current issues - known bugs and development targets !

Bug fixes and updates (old site)

2) Example run script

Example run script for setting up your run (serial runs - CF5, CS5, CB5 pipelines) : run.sh
Example run script for setting up your run (parallel runs - CM5 pipeline) : run.sh

Start your run in an empty folder (if you start several pipeline runs in the same folder, you risk overwriting files).
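
One way to guard against this is to let a small helper create the run folder and refuse to continue if the folder already has content. This is only a sketch; prepare_run_dir and the folder name in the usage line are hypothetical names, not part of the pipeline :

```shell
# Create a run folder, but refuse to continue if it already has content.
# prepare_run_dir is a hypothetical helper, not part of the pipeline.
prepare_run_dir() {
    local run_dir="$1"
    mkdir -p "$run_dir" || return 1
    # ls -A lists everything except . and .. ;
    # any output means the folder is not empty.
    if [ -n "$(ls -A "$run_dir")" ]; then
        echo "ERROR: $run_dir is not empty - choose a fresh folder for this run" >&2
        return 1
    fi
    echo "OK: $run_dir is empty"
}
```

Usage could look like : prepare_run_dir ./myNewRun && cd ./myNewRun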

3) Example run command

Example run command - if you would rather type the command directly than use the run file above :

SERIAL RUNS ( CF5, CS5, CB5 pipelines )

/t1-data/data/hugheslab/jelenatools/CCseqBasic/CS5/pipe.sh \
    -o /t1-data/user/telenius/oligoDpnFragments.txt \
    -s C57captureTest \
    --pf /public/telenius/CAPTUREC_DATA/C57captureTest_run4 \
    --genome mm9 \
    --chunkmb 1012 \
    --R1 /t1-data/user/telenius/R1_001.fastq \
    --R2 /t1-data/user/telenius/R2_001.fastq \
    --CCversion CF5


PARALLEL RUNS ( CM5-rainbow pipeline )

/t1-data/data/hugheslab/jelenatools/CCseqBasic/CM5/pipeRainbow.sh \
    -o /t1-data/user/telenius/oligoDpnFragments.txt \
    -s C57captureTest \
    -p 4 \
    --pf /public/telenius/CAPTUREC_DATA/C57captureTest_run4 \
    --BLATforREUSEfolderPath /t1-data/wherever/you/have/this/F4_blatPloidyFilteringLog_whatever/BlatPloidyFilterRun/BLAT_PLOIDY_FILTERED_OUTPUT \
    --genome mm9 \
    --chunkmb 1012 \
    --CCversion CM5 \
    --wobblyEndBinWidth 20

Parallel runs also need PIPE_fastqPaths.txt to set the fastq locations :

PIPE_fastqPaths.txt

/t1-data/user/whatever/file_read1.fastq.gz   /t1-data/user/whatever/file_read2.fastq.gz
/t1-data/user/whatever/file2_read1.fastq.gz  /t1-data/user/whatever/file2_read2.fastq.gz

or

file_read1.fastq.gz   file_read2.fastq.gz   /t1-data/user/whatever
file2_read1.fastq.gz  file2_read2.fastq.gz  /t1-data/user/whatever
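
If your fastq files follow a regular naming pattern, the first format above can be generated rather than typed by hand. The sketch below assumes files named *_read1.fastq.gz and *_read2.fastq.gz (as in the example lines) and writes the two columns tab-separated. The tab is an assumption - the examples above only show that the columns are separated by whitespace. make_fastq_paths is a hypothetical helper name, not part of the pipeline :

```shell
# Write one line per read pair : <read1 path> TAB <read2 path>
# make_fastq_paths is a hypothetical helper, not part of the pipeline.
# Assumes files are named *_read1.fastq.gz and *_read2.fastq.gz.
make_fastq_paths() {
    local fastq_dir="$1"
    local out_file="${2:-PIPE_fastqPaths.txt}"
    local r1 r2
    : > "$out_file"                      # start from an empty file
    for r1 in "$fastq_dir"/*_read1.fastq.gz; do
        [ -e "$r1" ] || continue         # the pattern matched nothing
        r2="${r1%_read1.fastq.gz}_read2.fastq.gz"
        if [ ! -e "$r2" ]; then
            echo "WARNING: no read2 file for $r1 - skipped" >&2
            continue
        fi
        printf '%s\t%s\n' "$r1" "$r2" >> "$out_file"
    done
}
```

For example : make_fastq_paths /t1-data/user/whatever PIPE_fastqPaths.txt - then check the file by eye before starting the run.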

More details of running CM5-rainbow in parallel

Make an empty folder, and go into that folder.

Save the above command as run.sh in that empty folder.

Make sure that there is at least ONE EMPTY LINE at the end of this file (one or more "enters" at the end).

Change user permissions for the file : chmod u=rwx run.sh

Then submit the run by typing : qsub -cwd -o qsub.out -e qsub.err -N runName < ./run.sh

You can see if the job is running by typing : qstat | grep yourUsername

The output data will be stored in this same folder. To find the data hub address, type : tail qsub.out

( See the --help of the pipeline command (above) , and manuals (below) to see what the parameters -o , -s , --pf etc. mean ! )
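
The two file-preparation steps above (empty line at the end of run.sh, user permissions) can also be scripted. A minimal sketch, reading the empty-line requirement as "the last line of the file is blank"; prepare_run_script is a hypothetical helper name, not part of the pipeline :

```shell
# Make a run script end with a blank line and give its owner rwx permissions.
# prepare_run_script is a hypothetical helper, not part of the pipeline.
prepare_run_script() {
    local script="$1"
    [ -f "$script" ] || { echo "ERROR: $script not found" >&2; return 1; }
    # Command substitution strips a trailing newline, so a non-empty result
    # means the last byte of the file is not a newline yet.
    if [ -n "$(tail -c 1 "$script")" ]; then
        printf '\n' >> "$script"
    fi
    # If the last line still contains text, add one more newline
    # so that the file ends with a blank line.
    if [ -n "$(tail -n 1 "$script")" ]; then
        printf '\n' >> "$script"
    fi
    chmod u=rwx "$script"
}
```

Run it as : prepare_run_script run.sh - and then submit with the qsub command given above. Running it twice is harmless, as it only appends when the blank line is missing.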

4) Pipeline manuals

There is no full manual yet (as of 10/Jan/2017), but Jelena is writing one - it should be out around March 2017 !
Some already-existing documentation is listed below.

How to run CM5 in parallel (incomplete manual)

These are instructions to interpret output, and better understand what the pipeline run does :
Interpret and troubleshoot your results
- these instructions are a little outdated for the CB4 and CC4 pipelines (the red and orange graphs are not explained there).

These are instructions to interpret the output report counters :
Interpret your report file

These are instructions to run the steps of the pipeline one-by-one : they give a clear outline of which tools form the "backbone" of the pipeline
User manual for CaptureC analysis without pipeline
- these instructions are a little outdated for the CB4 and CC4 pipelines (the red and orange graphs are not explained there) !

More documentation in this site :
Interpret your results